path: root/tests/012
Commit message | Author | Age | Files | Lines
...
* iter-begin: string range support. | Kaz Kylheku | 2021-08-22 | 1 | -0/+42
  Ranges like "AAA".."ZZZ" are now possible.
  * lib.c (seq_iter_get_range_str, seq_iter_peek_range_str, seq_iter_get_rev_range_str): New static functions.
  (seq_iter_init_with_info): Support string ranges via above new functions. Range direction test is now done with less and equal rather than lt and gt.
  * tests/012/iter.tl: New file.
  * txr.1: Documented.
* windows: skip test requiring full Unicode. | Kaz Kylheku | 2021-08-07 | 1 | -0/+3
  * tests/012/cont.tl: Exit before the test case that contains characters outside of the BMP, if (sizeof wchar) is less than 4.
* tests: fix undefined variable warning. | Kaz Kylheku | 2021-08-03 | 1 | -1/+3
  * tests/012/oop.tl: Adjust one recently added test case to eliminate undefined variable warning.
* oop: fix infelicity in new* and lnew* macros. | Kaz Kylheku | 2021-07-31 | 1 | -0/+14
  * stdlib/struct.tl (sys:new-expander): If the argument of new* or lnew* is dwim, then treat that as an expression, rather than as a boa-style construction.
  * tests/012/oop.tl: Tests for new* focusing on this issue.
  * txr.1: Documented.
* tests: multiple evaluation issue in amb. | Kaz Kylheku | 2021-07-30 | 1 | -2/+2
  This issue doesn't affect the tests. This is for the benefit of someone who happens to be copy-and-pasting the amb implementation from here.
  * tests/012/cont.tl (amb): This function has an issue in that it calls the continuation (future calculation) and then if that succeeds, it normally returns the value. This means that the future is executed again. In the case of N amb expressions, the successful future is executed 2**N times. What amb must do is this: call the continuation and capture the value. If the value is successful, then that is the master return value; just return that from amb-scope, bypassing the second re-execution of the future.
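The doubling described above can be reproduced outside TXR. The following is a minimal Python sketch, not the code from tests/012/cont.tl: continuations are modelled as plain callables, and the names amb_buggy, amb_fixed and run are hypothetical.

```python
def amb_buggy(choices, future):
    # Probes the future, then "returns normally", which runs the future again.
    for choice in choices:
        if future(choice) is not None:
            return future(choice)
    return None


def amb_fixed(choices, future):
    # Calls the future once, captures its value, and returns that value directly.
    for choice in choices:
        result = future(choice)
        if result is not None:
            return result
    return None


def run(amb, depth):
    """Nest `depth` amb calls; return (result, number of runs of the innermost future)."""
    runs = 0

    def final(values):
        nonlocal runs
        runs += 1
        return values

    def level(i, values):
        if i == depth:
            return final(values)
        return amb([True], lambda v: level(i + 1, values + (v,)))

    result = level(0, ())
    return result, runs


print(run(amb_buggy, 3)[1], run(amb_fixed, 3)[1])
```

With three nested amb calls the buggy variant runs the innermost future eight times, while the fixed variant runs it once.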
* tests: longer test for delimited continuations. | Kaz Kylheku | 2021-07-30 | 1 | -0/+10
  * tests/012/cont.tl: New test case. This aborts prior to recent gc fixes.
* subtypep: handle struct type objects. | Kaz Kylheku | 2021-07-27 | 1 | -0/+48
  The subtypep function has poor requirements, handling only type symbols. Let's extend it to handle structure type objects.
  * lib.c (subtypep): In all cases when an argument is considered to be a possible structure symbol, and thus subject to find_struct_type, consider whether it already is a struct type, and just take it as-is.
  * tests/012/type.tl: New tests.
  * txr.1: Updated.
* op: set nested flag in correct context. | Kaz Kylheku | 2021-07-19 | 1 | -0/+11
  * stdlib/op.tl (sys:op-meta-p): Return an extended Boolean value: a true result is an integer indicating the depth of the variable. For instance @1 is depth 0, @@1 is depth 1 and so on.
  (sys:find-parent): New function.
  (sys:op-alpha-rename): When processing a nested meta, do not set the nested flag in the immediate parent. Use find-parent to go up to the correct level to which the meta belongs and set the flag there.
  * tests/012/op.tl: New test cases which depend on this.
* op: fix bug in do. | Kaz Kylheku | 2021-07-19 | 1 | -0/+8
  The June 30 commit 09e70c914ca83b5c7405aa633864db49f27efa05, subject "op: refactor do handling", introduced a regression breaking the tags.tl program.
  An implicit argument gets inserted twice:
    [[(do op list @1)] 'x] -> (x x) ;; incorrect/weird
  This was spotted by Paul A. Patience while working on extending tags.tl for Emacs.
  It's not exactly a regression because the original behavior is not documented or tested, and has issues; we simply cannot roll back the commit; a proper fix is required.
  How the above call is now supposed to work is that:
  - the @1 parameter belongs to the op, not to the do.
  - the do therefore has no explicitly given parameters of its own.
  - therefore the do inserts its parameter.
  In other words (do op list @1) is formally equivalent to (do op list @1 @@1). Both levels of function indirection require an argument:
    [[(do op list @1) 'x] 'y] -> (y x)
    [[(do op list @1 @@1) 'x] 'y] -> (y x)
  * stdlib/op.tl (sys:op-ctx): The structure gets a new slot, nested, which is a flag indicating whether unprocessed nested metas occur. This is critically needed because the sys:op-alpha-rename passes which are called with do-nested-metas being false do not insert nested metas into the gens list; they transform them and leave them in the syntax. Yet we must make decisions based on their presence. Concretely, we must be able to tell that (do op list @@1) has a meta against the outer (do ...), while we are just processing the do.
  (sys:op-alpha-rename): When replacing a nested meta syntax with the macro invocation, we set the nested flag of the parent context true.
  (sys:op-expand): Bring back the do-gen; we need it. We cannot simply insert @1 into the syntax, because that is not lexically transparent. If we add @1 to (do op ...) then that @1 is interpreted as belonging to the op, not to the do. We must also check the new Boolean flag nested to properly detect whether we have metas, including unexpanded nested metas.
  * tests/012/op.tl: New test cases combining (do op ...).
* tests: fix stack overflow test case for old gmake. | Kaz Kylheku | 2021-07-14 | 1 | -2/+7
  * tests/012/stack.tl: The (if stack-limited ...) test is not correct because even if getrlimit indicates an unlimited stack, we impose a default limit, and so (get-stack-limit) returns an integer value. The idea here was to try to skip this test case when the stack usage is unlimited, which happens under older versions of GNU make, before posix_spawn was introduced. Instead, let's execute this test case only if we have setrlimit. In the forked child, we try to impose a small stack limit that will give us the stack overflow crash we are testing for. The objective of the test case is to validate that when (set-stack-limit 0) is called, the child will abort due to a signal, rather than (recur) returning :so.
* lib: tests for keep-if, remove-if, separate. | Kaz Kylheku | 2021-07-10 | 1 | -0/+42
  * tests/012/seq.tl: New tests.
* subtypep: handle COBJ inheritance. | Kaz Kylheku | 2021-07-09 | 1 | -0/+20
  * lib.c (class_from_sym): New static function.
  (subtypep): Remove special case handling of stream versus stdio-stream. If the two types are not both structures, then check whether they are both cobj classes. If so, check if they are in an inheritance relationship via the cobj_hash.
  (cobj_populate_hash): Map each symbol to a fixnum integer which gives the class handle's position in the cobj_class table.
  (cobj_class_exists): Style: compare to nil instead of 0.
  (obj_init): Do not call cobj_populate_hash here, it is far too early: only a couple of COBJ types exist at this point. Moreover, hash_init has not been called so hash_cls and hash_iter_cls still have null symbols.
  (init): Call cobj_populate_hash here, as the last step.
  * tests/012/type.tl: New file.
* with-resources: undocumented nil skip behavior. | Kaz Kylheku | 2021-07-07 | 1 | -0/+18
  Paul A. Patience discovered the hidden "feature" of with-resources, that the three-argument form of the binding (var init cleanup) causes the with-resources form to terminate if init returns nil. The (var init) syntax doesn't generate this logic.
  * stdlib/with-resources.tl (with-resources): Do not emit the when form unless <= 265 compatibility is in effect.
  * tests/012/oop-mac.tl: New file.
  * txr.1: Compat note added.
* genman, lib, tests: use defvarl where possible. | Paul A. Patience | 2021-07-05 | 3 | -5/+5
  * genman.txr (dupes, tagnum): Replace defvar with defvarl.
  * stdlib/doc-lookup.tl (os-symbol): Same.
  * tests/011/macros-3.tl (x): Same.
  * tests/011/mandel.txr (x-centre, y-centre, width, i-max, j-max, n) (r-max, pixel-size, x-offset, y-offset): Same.
  (file, colour-max): Delete (unused) variables.
  * tests/012/circ.tl (x): Replace defvar with defvarl.
  * tests/012/stack.tl (stack-limited): Same.
  * tests/012/struct.tl (s): Same.
  * tests/013/maze.tl (vi, pa, sc): Delete variables. Use function arguments instead.
  (usage): Fix typo.
  * tests/014/dgram-stream.tl (family): Rename to...
  (*family*): ...this.
  * tests/014/socket-basic.tl (socktype): Rename to...
  (*socktype*): ...this.
  (%iters%): Replace defvar with defvarl.
* stack-limit: bug: not handling RLIM_INFINITY. | Kaz Kylheku | 2021-07-04 | 1 | -6/+9
  * gc.c (gc_init): We must check rlim_cur for the RLIM_INFINITY value indicating unlimited stack, and not misuse this value as a limit number, otherwise hilarity ensues. This reproduced on an older platform with make 3.81, which calls setrlimit to bring about an unlimited stack, passed on to child processes. Because of this txr segfaulted, as a consequence of a false positive.
  * tests/012/stack.tl (stack-limited): New variable which indicates whether there is a stack limit. If there isn't, we avoid running the fork-based test case. Also, we set the stack limit to 32768 so we have a limit against which to run some of the tests.
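The sentinel check described for gc_init can be illustrated with Python's standard resource module (Unix only). This is a sketch under assumptions, not the gc.c code: the function name effective_stack_limit and the fallback value are made up, and what to do in the unlimited case is a policy choice that this sketch resolves with an arbitrary default.

```python
import resource

# Hypothetical fallback; not the value TXR actually uses.
DEFAULT_STACK_LIMIT = 16 * 1024 * 1024


def effective_stack_limit(soft_limit, default=DEFAULT_STACK_LIMIT):
    """Return a usable byte count, never the RLIM_INFINITY sentinel itself."""
    if soft_limit == resource.RLIM_INFINITY:
        # "Unlimited" is a marker value, not a number of bytes.
        return default
    return soft_limit


soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
print(effective_stack_limit(soft))
```

Treating the sentinel as an ordinary number is the kind of mistake the commit describes: any comparison of stack depth against it becomes meaningless.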
* compiler: add failing inline lambda tests. | Kaz Kylheku | 2021-07-03 | 1 | -0/+10
  * tests/012/lambda.tl: Add tests where apply list supplies : values to optional params, which must trigger defaulting.
* tests: support for compiled test forms. | Kaz Kylheku | 2021-07-03 | 1 | -0/+5
  * tests/common.tl (*compile-test*): New variable.
  (vtest): Compile cases via compile-toplevel if *compile-test* is true, catching compile-time exceptions.
  * tests/012/lambda.tl: Set *compile-test* true and repeat file.
* tests: include constp test in compile case. | Kaz Kylheku | 2021-07-02 | 1 | -1/+1
  * tests/012/compile.tl: Add const.tl file.
* tests: simplify file name handling in compile test. | Kaz Kylheku | 2021-07-02 | 1 | -5/+5
  * tests/012/compile.tl: Remove suffixes from name list, and simplify code.
* lambda: tests. | Kaz Kylheku | 2021-07-02 | 1 | -0/+88
  * tests/012/lambda.tl: New file.
* constantp: tests. | Kaz Kylheku | 2021-07-02 | 1 | -0/+23
  * tests/012/const.tl: New file.
* op: bug in do: must insert @1 into unexpanded form. | Kaz Kylheku | 2021-06-29 | 1 | -0/+5
  In the case when the do syntax has no metavariables, and it expands as-is without the addition of a symbol in the tail position, we are doing something wrong: we are adding the @1 into the expanded version of the form, rather than the original. For instance:
    1> (expand '(do pop a))
    (lambda (#:arg-1-0017 . #:arg-rest-0016) (prog1 (car a) (sys:setq a (cdr a)) #:arg-1-0017))
  Here, the @1 was inserted into the (prog1 ...) form which is the expansion of pop. This is incorrect; it must be inserted into the original (pop a) syntax as (pop a @1).
  * op.tl (op-expand): In this case when there are no metas and no do-gen that can be replaced by @1 via symacrolet, go back to the original args syntax, add the arg1 meta into that syntax, and process it from the beginning through parallel expansion steps.
  * tests/012/op.tl: Couple of tests added.
* tests: reduce time spent in stack overflow test. | Kaz Kylheku | 2021-06-26 | 1 | -0/+1
  * tests/012/stack2.txr: This test case can prove its point in a much smaller stack limit than the one derived from the system default. Let's cut it to 32 kilobytes.
* signals: disable stack overflow in handler. | Kaz Kylheku | 2021-06-24 | 1 | -0/+11
  * signal.c (sig_handler): For a is_cpu_exception signal, we temporarily disable the stack limit. It might be executing on the sigaltstack buffer, which is almost certainly below the stack limit.
  * tests/012/stack.tl: New test case. We raise a SIGSEGV and check that in the handler, the stack limit is disabled, and that we can execute code.
* txr: stack protection in pattern language. | Kaz Kylheku | 2021-06-24 | 2 | -0/+9
  * txr.c (do_match_line, match_files): call gc_stack_check on entry.
  * tests/012/stack2.txr: New file.
  * tests/012/stack2.expected: New file.
* Test for stack overflow protection. | Kaz Kylheku | 2021-06-24 | 1 | -0/+31
  * tests/012/stack.tl: New file.
  * tests/common.tl (mvtest): New macro.
* lib: rmismatch tests and bugfix. | Kaz Kylheku | 2021-06-22 | 1 | -0/+39
  * lib.c (rmismatch): when left is an empty string or vector, and right is nil: we must return -1 not zero.
  * tests/012/seq.tl: More rmismatch tests.
* lib: optimize mismatch, rmismatch for strings. | Kaz Kylheku | 2021-06-22 | 1 | -0/+49
  * lib (mismatch, rmismatch): If the arguments are strings or literals, other than lazy strings, keyfun is identity, and equality is by character identity, the operation can be done with an efficient loop over the wchar_t strings.
  * tests/012/seq.tl: Tests for string case of mismatch, via starts-with function. Test mismatch via ends-with, and also directly for vectors and strings.
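To make the relationship between mismatch and starts-with concrete, here is a small Python sketch of a character-by-character mismatch loop and a prefix test built on it. The return conventions (None when the sequences are equal, the length of the shorter one when it is a prefix of the longer) are assumptions of this sketch rather than a quotation of the lib.c behaviour.

```python
def mismatch(left, right):
    """Index of the first differing position, or None if the sequences are equal."""
    common = min(len(left), len(right))
    for i in range(common):
        if left[i] != right[i]:
            return i
    # No difference within the common part: equal, or one is a prefix of the other.
    return None if len(left) == len(right) else common


def starts_with(prefix, seq):
    """True if seq begins with prefix, expressed in terms of mismatch."""
    m = mismatch(prefix, seq)
    return m is None or m == len(prefix)


print(mismatch("abc", "abd"), starts_with("ab", "abc"))
```

The loop touches each character at most once and needs no key function or generic equality call, which is the property the string fast path exploits.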
* Dubious new functions cxr/cyr. | Kaz Kylheku | 2021-06-21 | 1 | -0/+14
  * lib.c (cxr, cyr): New functions.
  * lib.h (cxr, cyr): Declared.
  * eval.c (eval_init): Intrinsics cxr and cyr registered.
  * tests/012/cadr.tl: New file.
  * txr.1: Documented.
  * share/txr/stdlib/doc-syms.tl: Updated.
* read/get-json: reject trailing junk in string input. | Kaz Kylheku | 2021-06-20 | 1 | -0/+52
  * parser.c (lisp_parse_impl): If parsing from string, check for trailing junk and diagnose. JSON parsing doesn't use lookahead because it doesn't have a.b syntax, so the recent_tok gives the last token that actually went into the syntax, and not a lookahead token. So in the case of JSON, we call yylex to see if there is any trailing token.
  * tests/010/json.tl: Extend get-json tests to more kinds of objects, and then replicate with trailing whitespace and trailing junk to provide coverage for these cases.
  * tests/012/parse.t: Slew of new read tests and iread also.
  * txr.1: Documented.
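The "parse one value, then insist that nothing but whitespace follows" policy can be sketched with Python's standard json module, whose JSONDecoder.raw_decode returns the parsed value together with the index where parsing stopped. This is an analogy only; the function name parse_exactly_one is hypothetical and the code says nothing about how parser.c is structured.

```python
import json


def parse_exactly_one(text):
    """Parse a single JSON value from text; reject anything after it except whitespace."""
    stripped = text.lstrip()  # raw_decode does not skip leading whitespace
    value, end = json.JSONDecoder().raw_decode(stripped)
    rest = stripped[end:]
    if rest.strip():
        raise ValueError(f"trailing junk after JSON value: {rest!r}")
    return value


print(parse_exactly_one('[1, 2]  '))
```

Trailing whitespace is accepted, while an input such as '{"a": 1}x' raises ValueError.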
* op: tests, and fix (op progn ...) situation | Kaz Kylheku | 2021-06-17 | 1 | -0/+71
  * share/txr/stdlib/op.tl (op-expand): For the sake of special processing applied to support the lop operator, the code assumes that the expanded syntax-2 is a list with at least two elements, such that we can do (cddr syntax-2). This is not true for instance in (op progn).
  * tests/012/op.tl: New file.
* expander: bug: atoms in quasiliteral. | Kaz Kylheku | 2021-06-15 | 1 | -0/+15
  Via macros, atoms can sneak into a quasiliteral which then blow up because they get treated as strings without being converted. Example:
    (defmacro two () 2)
    `@(two)xyz` -> ;; error
  The expansion produces the invalid form, in which the 2 is subsequently treated as a string.
    (sys:quasi 2 "xyz")
  On the other hand, symbol macros don't have this problem:
    (defsymacro two 2)
    `@{two}xyz` -> "2xyz"
  The reason is that the (sys:var two) syntax will expand to (sys:var 2), and not 2. The straightforward, consistent fix is to ensure that the first case will also go to (sys:var 2).
  * eval.c (expand_quasi): If the expanded form is an atom which is not a bindable symbol, wrap it in a sys:var.
  * tests/012/quasi.tl: Test cases added. Also adding a compilation test for this file, cribbed from patmatch.tl.
* tests: remove *stderr* to *stdnull* redirection. | Kaz Kylheku | 2021-06-11 | 1 | -2/+1
  The recent commit 225ff2fa2fdb9e5169db5e2c06dc3b0053b775bb titled "errors: avoid premature release of deferred warnings." obviates the need for dealing with noise when detecting errors from test cases.
  * patmatch.tl: Remove macro-time-let around several test cases.
  * tests/012/ifa.tl: Likewise.
  * tests/common.tl (macro-time-let): Macro removed.
* reduce-left: rewrite using seq_iter. | Kaz Kylheku | 2021-06-09 | 1 | -0/+10
  * lib.c (reduce_left): Use sequence iteration instead of list operations.
  * txr.1: Add a note to the documentation.
* expander: expand must only ignore unbound warnings. | Kaz Kylheku | 2021-06-07 | 1 | -4/+4
  The expand function must not muffle all deferred warnings. That causes the problem that a form like (inc var a.bar) fails to produce a warning due to bar not being the slot of any structure. The expand function must only muffle warnings about undefined functions and variables.
  * eval.c (muffle_unbound_warning): New static function.
  (no_warn_expand): Use muffle_unbound_warning as handler, rather than uw_muffle_warning.
  * tests/012/struct.tl: Fix two test cases here which test the expand function using a form that references a nonexistent slot. These now generate a warning, so we use the slot name b rather than d, which is defined.
  * txr.1: Documented change to expand.
* tests: fix vtest being hindrance to error finding. | Kaz Kylheku | 2021-05-25 | 1 | -1/+1
  * tests/common.tl (vtest): Only if the expected expression is :error or (quote :error) do we wrap the expansion and evaluation of the test expression with exception handling, because only then do we expect an error. When the test expression is anything else, we don't intercept any errors, and so problems in test cases are easier to debug now.
  * tests/012/struct.tl: In one case we must initialize the *gensym-counter* to 4 to compensate for the change in vtest to get the same gensym numbers in the output.
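The design choice in vtest (intercept exceptions only when an error is the expected outcome) can be sketched in a few lines of Python. This is an illustration of the idea, not a port of the macro in tests/common.tl; the ERROR sentinel and the thunk-based interface are hypothetical.

```python
ERROR = object()  # stand-in for the :error marker


def vtest(thunk, expected):
    """Return True if calling thunk produces the expected outcome."""
    if expected is ERROR:
        # Only here is an exception the anticipated result, so only here is it caught.
        try:
            thunk()
        except Exception:
            return True
        return False
    # Any exception raised here propagates with its original traceback.
    return thunk() == expected


print(vtest(lambda: 1 + 1, 2), vtest(lambda: 1 / 0, ERROR))
```

Because the non-error path has no handler, a broken test expression fails loudly at the point of the problem instead of being reported as a mere mismatch.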
* window-map: add tests, improve doc, add examples. | Kaz Kylheku | 2021-05-25 | 1 | -0/+19
  * tests/012/seq.tl: New tests.
  * txr.1: Improve documentation of window-map's :wrap and :reflect. Add examples.
* window-map: broken :wrap and :reflect. | Kaz Kylheku | 2021-05-25 | 1 | -0/+33
  * lib.c (window_map_list): Rewrite :wrap and :reflect support. The main issue with these is that they only sample items from the front of the input list and generate both flanks of the boundary from that prefix; :reflect is additionally buggy due to applying nreverse to a sub which can return the original sequence.
  * tests/012/seq.tl: Some test coverage for window-map.
* parser: bug: handling of lex state in pushback tokens. | Kaz Kylheku | 2021-05-12 | 1 | -0/+6
  This is fairly obscure. A repro test case is a file which contains:
    3"foo"
  When the 3 is parsed, the " is also scanned as a lookahead token, and when that happens, the lexer shifts into the STRLIT state. At that point the parse job finishes for that top-level form. The next time the parser is called, it will prime the token stream by pushing the " token into it. But, the lex state is not put into the STRLIT state. The result is that the parser obtains the " token, and then foo is lexically analyzed in the wrong state as a symbol. A syntax error occurs: symbol token in the middle of a string literal, instead of just a sequence of LITCHAR tokens, as expected.
  What we can do is associate a lex state with pushback tokens. If a pushback token has a nonzero lex state which is different from the current YYSTATE, then when that pushback token is consumed, we push that state also.
  * parser.h (struct yy_token): New member, yy_lex_state.
  * parser.c (parser_common_init): Initialize the new yy_lex_state member of every token member of the parser structure.
  * parser.l (yylex): When feeding a pushed token to the parser, if that token has a nonzero state, and the state is different from YYSTATE, we push that state. So for instance a pushed back " token will carry the STRLIT state, which is different from the NESTED state that will be in effect at the start of the parse job, and so it will be pushed, as if the " character had been scanned. Also, when we call the real yylex_impl, when we are storing the recently seen token in recent_tok, we also store the current YYSTATE along with it. That's how tokens get associated with a state. The artificial tokens that are used for priming parsing like SECRET_ESCAPE_E are never associated with a nonzero state.
  * tests/012/syntax.tl: Some test cases that didn't pass before this.
  * lex.yy.c.shipped: Regenerated.
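The rule for consuming a pushed-back token can be modelled compactly. The Python sketch below is a toy model under assumptions: the state numbering, the Token and Lexer classes, and the method names are hypothetical, and only the pushback path is implemented. It shows a token carrying a lex state, and that state being pushed when it is nonzero and differs from the current one.

```python
from dataclasses import dataclass, field

# Hypothetical numbering; 0 is reserved for "no state recorded".
NESTED, STRLIT = 1, 2


@dataclass
class Token:
    text: str
    lex_state: int = 0


@dataclass
class Lexer:
    state_stack: list = field(default_factory=lambda: [NESTED])
    pushback: list = field(default_factory=list)

    @property
    def state(self):
        return self.state_stack[-1]

    def next_token(self):
        if self.pushback:
            tok = self.pushback.pop()
            # Restore the state the token was scanned in, as if it had just been lexed.
            if tok.lex_state != 0 and tok.lex_state != self.state:
                self.state_stack.append(tok.lex_state)
            return tok
        raise NotImplementedError("scanning fresh input is outside this sketch")


lexer = Lexer()
lexer.pushback.append(Token('"', STRLIT))
lexer.next_token()
print(lexer.state == STRLIT)
```

An artificial priming token with lex_state 0 leaves the state stack untouched, matching the description of tokens like SECRET_ESCAPE_E.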
* parser: #; tests and bugfixes. | Kaz Kylheku | 2021-05-06 | 1 | -0/+20
  This is motivated by the recent crash regression in the #; comment out mechanism. The parser doesn't have adequate coverage in the test suite.
  * tests/012/syntax.tl: New file, for testing syntax. A problem was found: #;.expr did not work inside a list, only at top level. It required a space before the dot.
  * parser.y (listacc): A couple of productions to handle hash-semicolon immediately followed by a dot without any whitespace, and then by an expression.
  * y.tab.c.shipped: Regenerated.
* buf: bugfix: int-buf, uint-buf refer to alloc size. | Kaz Kylheku | 2021-05-04 | 1 | -0/+4
  * buf.c (int_buf, uint_buf): Refer to the buffer length b->len rather than the underlying allocation size b->size. Referring to b->size will not only produce the wrong value when it is larger than len, but b->size can be null for a borrowed buffer, producing a crash.
  * tests/012/buf.tl: Tests.
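The difference between a buffer's logical length and its allocation size is easy to demonstrate. The Python sketch below uses a hypothetical Buf class with an over-allocated backing store; the big-endian byte order is an assumption of the sketch, not a statement about buf.c.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    data: bytearray  # whole allocation; may be longer than the logical length
    length: int      # number of bytes that actually belong to the buffer


def uint_buf(buf):
    # Correct: convert only the first `length` bytes.
    return int.from_bytes(bytes(buf.data[:buf.length]), "big")


def uint_buf_wrong(buf):
    # Mistake being illustrated: converting the entire allocation.
    return int.from_bytes(bytes(buf.data), "big")


b = Buf(bytearray(b"\x01\x02\x00\x00"), 2)
print(uint_buf(b), uint_buf_wrong(b))
```

Here the two spare bytes of capacity shift the wrong result left by sixteen bits.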
* mapcar*: fix broken. | Kaz Kylheku | 2021-04-29 | 1 | -0/+6
  * eval.c (lazy_mapcar_func): We must capture the return value of iter_step, since we refer to it in the next statement, expecting it to have stepped. This bug causes a behavior as if the original list had an extra nil.
  * tests/012/lazy.tl: Tests. Poor test coverage is why this sort of thing comes up and bites us.
* compile/eval: new operator, mac-env-param-bind. | Kaz Kylheku | 2021-04-21 | 1 | -0/+5
  mac-env-param-bind is like mac-param-bind but also allows the value for the :env parameter to be specified.
  * eval.c (op_mac_env_param_bind_s): New symbol variable.
  (op_mac_env_param_bind): New static function.
  (do_expand): Handle mac_env_param_bind_s.
  (eval_init): Initialize symbol variable and register macro.
  * share/txr/stdlib/compiler.tl (compiler compile): Add case for mac-env-param-bind.
  (compiler comp-mac-env-param-bind): New method.
  * share/txr/stdlib/doc-syms.tl: Updated with new hashes for tree-bind and mac-param-bind, and inclusion of mac-env-param-bind.
  * tests/012/binding.tl: New file.
  * txr.1: Documented.
* tests: disable some UTF-8 tests on 16 bit wchar_t. | Kaz Kylheku | 2021-04-20 | 1 | -8/+9
  * tests/012/parse.tl: All the tests in this file blow up on systems that don't have a full-blown character type.
* compile/eval: print compiler error on *stderr*. | Kaz Kylheku | 2021-04-19 | 1 | -1/+2
  * share/txr/stdlib/error.tl (compile-error): Print the error message on *stderr*, like we do with warnings. This allows the programming environment to pick up the error message and navigate to that line accordingly. The error message is also output by the unhandled exception logic but with a prefix that prevents parsing by the tooling. To avoid sending double error messages to the interactive user, we only issue the *stderr* message if *load-recursive* is true.
  * tests/common.tl (macro-time-let): New macro. This lets us bind special variables around the macro-expansion of the body, which is useful when expansion-time logic reacts to values of special variables.
  * tests/012/ifa.tl: Use macro-time-let to suppress *stderr* around the expansion of the erroneous ifa form. We now need this because the error situation spits out a message on *stderr*, in addition to throwing.
* tests: use fixed regsub in compile test. | Kaz Kylheku | 2021-04-13 | 1 | -1/+1
  * tests/012/compile.tl: Simplify code with regsub.
* tests: implicitly generate empty .expected files. | Kaz Kylheku | 2021-04-12 | 13 | -2/+4
  * Makefile (%.expected): New implicit rule. Whenever a test requires a .expected file, if it is missing, we create an empty one. This file will be treated as an intermediate by GNU Make, which means that it will be deleted when make terminates.
  * tests/012/compile.tl: Some of the .tl files no longer have an .expected file, so we have to test for that in the catenating logic.
  * tests/008/call-2.expected,
  * tests/008/no-stdin-hang.expected,
  * tests/011/macros-3.expected,
  * tests/011/patmatch.expected,
  * tests/012/aseq.expected,
  * tests/012/ashwin.expected,
  * tests/012/compile.tl,
  * tests/012/cont.expected,
  * tests/012/defset.expected,
  * tests/012/ifa.expected,
  * tests/012/oop-seq.expected,
  * tests/012/parse.expected,
  * tests/012/quasi.expected,
  * tests/012/quine.expected,
  * tests/012/seq.expected,
  * tests/012/struct.expected,
  * tests/012/stslot.expected,
  * tests/014/dgram-stream.expected,
  * tests/014/in6addr-str.expected,
  * tests/014/inaddr-str.expected,
  * tests/014/socket-basic.expected,
  * tests/015/awk-fconv.expected,
  * tests/015/split.expected,
  * tests/015/trim.expected,
  * tests/016/arith.expected,
  * tests/016/ud-arith.expected,
  * tests/017/ffi-misc.expected,
  * tests/018/chmod.expected: Empty file deleted.
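The catenating logic mentioned for compile.tl has to tolerate tests whose .expected file no longer exists. A minimal Python sketch of that idea follows; the function name catenate_expected and its arguments are hypothetical, and a missing file is simply treated as contributing no output.

```python
from pathlib import Path


def catenate_expected(directory, names):
    """Join the contents of NAME.expected for each name, skipping files that are absent."""
    parts = []
    for name in names:
        path = Path(directory) / f"{name}.expected"
        if path.exists():
            parts.append(path.read_text())
        # An absent file stands for empty expected output, so nothing is appended.
    return "".join(parts)


# Prints an empty string when the directory has no .expected files for these names.
print(repr(catenate_expected(".", ["no-such-test-a", "no-such-test-b"])))
```

Treating "missing" and "empty" as the same thing keeps the combined expectation identical to what it was before the empty files were deleted.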
* compiler: new test case. | Kaz Kylheku | 2021-04-11 | 1 | -0/+12
  * tests/012/compile.tl (new-file): Compiles a select set of .tl files in the same directory. The compile.expected file is dynamically created from catenating the .expected files corresponding to those .tl files; the output is expected to be the same from compiling those files as from interpreting them.
* parser: allow non-UTF-8 bytes in literals and regexes. | Kaz Kylheku | 2021-04-08 | 1 | -0/+6
  * parser.l (grammar): Just like we do in SREGEX, allow an arbitrary byte in REGEX, mapping it to the DCxx range. Do the same inside string literals of all types.
  * lex.yy.c.shipped: Updated.
  * tests/012/parse.tl: New tests.
* parser: allow funny UTF-8 in regexes and literals. | Kaz Kylheku | 2021-04-08 | 2 | -0/+7
  The main idea in this commit is to change a behavior of the lexer, and take advantage of it in the parser. Currently, the lexer recognizes a {UANYN} pattern in two places. That pattern matches a UTF-8 character. The lexeme is passed to the decoder, which is expected to produce exactly one wide character. If the UTF-8 is bad (for instance, a code in the surrogate pair range U+DCxx) then the decoder will produce multiple characters. In that case, these rules return ERRTOK instead of a LITCHAR or REGCHAR. The idea is: why don't we just return those characters as a TEXT token? Then we can just incorporate that into the literal or regex.
  * parser.l (grammar): If a UANYN lexeme decodes to multiple characters instead of the expected one, then produce a TEXT token instead of complaining about invalid UTF-8 bytes.
  * parser.y (regterm): Recognize a TEXT item as a regterm, converting its string value to a compound node in the regex AST, so it will be correctly treated as a fixed pattern.
  (chrlit): If a hash-backslash is followed by a TEXT token, which can happen now, that is invalid; we diagnose that as invalid UTF-8.
  (quasi_item): Remove TEXT rule, because the litchars constituent now generates TEXT.
  (litchars, restlistchar): Recognize TEXT item, similarly to regterm.
  * tests/012/parse.tl: New file.
  * tests/012/parse.expected: Likewise.
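The "one character or several" decision can be imitated with Python's surrogateescape error handler, which maps each undecodable byte to a code point in the U+DC80..U+DCFF range. Using it as a stand-in for the TXR decoder is an assumption of this sketch; the token names come from the commit message, and classify_lexeme is a hypothetical helper.

```python
def classify_lexeme(raw: bytes):
    """Decode one lexeme; a single character is a LITCHAR, several become TEXT."""
    text = raw.decode("utf-8", errors="surrogateescape")
    if len(text) == 1:
        return ("LITCHAR", text)
    return ("TEXT", text)


# A UTF-8-style encoding of the surrogate U+DC80 is invalid, so each byte is escaped.
kind, text = classify_lexeme(b"\xed\xb2\x80")
print(kind, len(text))
```

Encoding the resulting text again with the same error handler reproduces the original bytes, so nothing is lost by accepting the lexeme as text.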