txr - TXR: A data munging language.

Commit message	Author	Age	Files	Lines
...
*	math: binary-integer conv tests for #x-8000...	Kaz Kylheku	2021-05-21	1	-0/+21
	* tests/016/arith.tl: Test providing coverage for the most negative two's complement integer, #x-800...00, in various sizes. The 64-bit cases are failing.
*	math: add some tests related to integer conversion.	Kaz Kylheku	2021-05-21	1	-0/+50
	* tests/016/arith.tl: Add tests covering the fixnum/bignum knee, and ffi operations of various sizes that provide coverage of various conversion routines.
*	parser: bug: handling of lex state in pushback tokens.	Kaz Kylheku	2021-05-12	1	-0/+6
	This is fairly obscure. A repro test case is a file which contains: 3"foo" When the 3 is parsed, the " is also scanned as a lookahead token, and when that happens, the lexer shifts into the STRLIT state. At that point the parse job finishes for that top-level form. The next time the parser is called, it primes the token stream by pushing the " token into it. But the lex state is not set to STRLIT. The result is that the parser obtains the " token, and then foo is lexically analyzed in the wrong state, as a symbol. A syntax error occurs: a symbol token in the middle of a string literal, instead of just a sequence of LITCHAR tokens, as expected. What we can do is associate a lex state with pushback tokens. If a pushback token has a nonzero lex state which is different from the current YYSTATE, then when that pushback token is consumed, we push that state also. * parser.h (struct yy_token): New member, yy_lex_state. * parser.c (parser_common_init): Initialize the new yy_lex_state member of every token member of the parser structure. * parser.l (yylex): When feeding a pushed token to the parser, if that token has a nonzero state, and the state is different from YYSTATE, we push that state. So for instance a pushed-back " token will carry the STRLIT state, which is different from the NESTED state that will be in effect at the start of the parse job, and so it will be pushed, as if the " character had been scanned. Also, when we call the real yylex_impl and store the recently seen token in recent_tok, we also store the current YYSTATE along with it. That's how tokens get associated with a state. The artificial tokens that are used for priming parsing, like SECRET_ESCAPE_E, are never associated with a nonzero state. * tests/012/syntax.tl: Some test cases that didn't pass before this. * lex.yy.c.shipped: Regenerated.
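A minimal C sketch of the pushback idea, assuming a toy lexer with a flex-like start-condition stack. All names here (struct tok, struct lexer, feed_pushback, the LS_* states) are hypothetical stand-ins, not the actual parser.l/parser.h code: a token records the lex state that was current after it was scanned, and feeding it back re-enters that state if it differs from the current one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer start conditions; 0 means "no state recorded". */
enum lex_state { LS_NONE = 0, LS_NESTED = 1, LS_STRLIT = 2 };

/* Toy token: a type code plus the lex state that was current right after
   the token was scanned (a " token leaves the lexer in LS_STRLIT). */
struct tok {
  int type;
  enum lex_state state;
};

/* Toy lexer holding a small start-condition stack, in the spirit of
   flex's yy_push_state. stack[depth - 1] is the current state. */
struct lexer {
  enum lex_state stack[16];
  size_t depth;
};

static enum lex_state cur_state(const struct lexer *lx)
{
  assert(lx->depth > 0);
  return lx->stack[lx->depth - 1];
}

static void push_state(struct lexer *lx, enum lex_state s)
{
  assert(lx->depth < sizeof lx->stack / sizeof lx->stack[0]);
  lx->stack[lx->depth++] = s;
}

/* Hand a pushed-back token to the parser. If it carries a nonzero state
   that differs from the current one, push that state first, as if the
   token's characters had just been scanned. Priming tokens carry
   LS_NONE and leave the state alone. */
static int feed_pushback(struct lexer *lx, const struct tok *t)
{
  if (t->state != LS_NONE && t->state != cur_state(lx))
    push_state(lx, t->state);
  return t->type;
}
```

With the lexer starting in LS_NESTED, feeding back a " token tagged LS_STRLIT leaves the lexer in LS_STRLIT, so a following foo would be scanned as literal characters rather than as a symbol.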
*	tree: let tree-iter be iterable via generic iteration.	Kaz Kylheku	2021-05-12	1	-0/+5
	* lib.c (seq_iter_init_with_info): Recognize tree_iter object, and treat it using the tree iterator function. * tests/010/tree.tl: Test case for tree subrange iteration with collect-each. * txr.1: Updated.
*	tree: streamline iteration; provide high limit.	Kaz Kylheku	2021-05-11	1	-7/+7
	Getting rid of tree-begin-at and tree-reset-at. Now tree-begin takes two optional parameters, for specifying the low and high ends of the range. * tree.c (struct tree_diter): New members, tree and highkey. We need tree due to requiring access to the less function. If the iterator has no highkey, the iterator itself is stored in that member to indicate this. (tree_iter_mark): Mark the tree and highkey. (tree_begin): Take optional lowkey and highkey arguments, initializing the iterator accordingly. (tree_begin_at): Function removed. (copy_tree_iter, replace_tree_iter): Copy tree and highkey members. The latter requires special handling due to the funny convention for indicating highkey absence. (tree_reset): Take optional lowkey and highkey arguments, configuring these in the iterator being reset. (tree_reset_at): Function removed. (tree_next, tree_peek): Implement highkey semantics. (sub_tree): Simplified: from and to arguments are just passed through to tree_begin, and there is no need for a separate loop which enforces the upper limit, that now being handled by the iterator itself. (tree_init): Update registrations of tree-begin and tree-reset; remove tree-begin-at and tree-reset-at intrinsics. * tree.h (tree_begin_at, tree_reset_at): Declarations removed. (tree_begin, tree_reset): Declarations updated. * lib.c (seq_iter_rewind, seq_iter_init_with_info, where, populate_obj_hash): Default new optional arguments in tree_begin and tree_reset calls. * parser.c (circ_backpatch): Likewise. * tests/010/tree.tl: Affected cases updated. * txr.1: Documentation updated. * share/txr/stdlib/doc-syms.tl: Regenerated.
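A minimal C sketch of a bounded in-order iterator over a binary search tree, assuming int keys, an explicit path stack, an inclusive low key and an exclusive high key (the exclusivity is an assumption of this sketch, chosen to match sub-style ranges; the TXR manual is authoritative). struct node, struct iter, iter_begin and iter_next are hypothetical names, not tree.c's API. Positioning at the low key pushes only ancestors whose keys are not below it; the high key is enforced by the iterator itself, so a caller like sub-tree needs no separate limiting loop.

```c
#include <assert.h>
#include <stddef.h>

struct node { int key; struct node *left, *right; };

#define MAXDEPTH 64

struct iter {
  const struct node *path[MAXDEPTH]; /* nodes still to be visited */
  size_t depth;
  int has_high;                      /* nonzero if highkey applies */
  int highkey;                       /* exclusive upper bound */
};

static void push(struct iter *it, const struct node *n)
{
  assert(it->depth < MAXDEPTH);
  it->path[it->depth++] = n;
}

/* Push n and its left spine. With has_low set, nodes below lowkey are
   skipped by stepping right instead of being pushed. */
static void descend(struct iter *it, const struct node *n,
                    int has_low, int lowkey)
{
  while (n != NULL) {
    if (has_low && n->key < lowkey) {
      n = n->right;                  /* n and its left subtree are too low */
    } else {
      push(it, n);
      n = n->left;
    }
  }
}

static void iter_begin(struct iter *it, const struct node *root,
                       int has_low, int lowkey, int has_high, int highkey)
{
  it->depth = 0;
  it->has_high = has_high;
  it->highkey = highkey;
  descend(it, root, has_low, lowkey);
}

/* Return the next node in key order, or NULL when done. */
static const struct node *iter_next(struct iter *it)
{
  const struct node *n;

  if (it->depth == 0)
    return NULL;
  n = it->path[it->depth - 1];
  if (it->has_high && n->key >= it->highkey) {
    it->depth = 0;                   /* reached the upper bound */
    return NULL;
  }
  it->depth--;
  descend(it, n->right, 0, 0);       /* right subtree is all above n */
  return n;
}
```

On a balanced tree holding 1 through 7, iter_begin with low key 3 and high key 6 visits 3, 4 and 5; with neither bound it visits all seven keys in order.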
*	tree: support indexing and range extraction.	Kaz Kylheku	2021-05-11	1	-0/+18
	* lib.c (do_generic_funcall): Support tree object invocation with one or two arguments via sub and ref. (sub): Implement for trees via sub_tree. (ref): Implement for trees via tree_lookup. * tree.c (sub_tree): New function. (tree_init): Register sub-tree intrinsic. * tree.h (sub_tree): Declared. * tests/010/tree.tl: New tests. * txr.1: Documented: DWIM bracket syntax on trees, sub and ref support for trees, sub-tree function. * share/txr/stdlib/doc-syms.tl: Regenerated.
*	tree: replace-tree-iter function.	Kaz Kylheku	2021-05-11	1	-0/+11
	* tree.c (replace_tree_iter): New function. (tree_init): Register replace-tree-iter intrinsic. * tree.h (replace_tree_iter): Declared. * share/txr/stdlib/doc-syms.tl: Updated. * txr.1: Documented. * tests/010/tree.tl: New test case.
*	tree: copy-tree-iter function.	Kaz Kylheku	2021-05-10	1	-0/+9
	* lib.c (copy): Handle tree_iter_s via copy_tree_iter. * tree.c (copy_tree_iter): New function. (tree_init): copy-tree-iter intrinsic registered. * tree.h (copy_tree_iter): Declared. * tests/010/tree.tl: New test case. * txr.1: Documented. * share/txr/stdlib/doc-syms.tl: Updated.
*	diff/isec: reset hash/tree iter instead of making new.	Kaz Kylheku	2021-05-10	2	-0/+11
	* lib.c (seq_iter_rewind): Use hash_reset and tree_reset to rewind the existing iterator rather than allocating a new one. * tests/010/hash.tl: New file, covering uni, diff and isec for hash tables. * tests/010/tree.tl: New tests.
*	tree: new tree-peek function.	Kaz Kylheku	2021-05-09	1	-0/+8
	* tree.c (tn_peek_next): New static function. (tree_peek): New function. (tree_init): Register tree-peek intrinsic. * tree.h (tree_peek): Declared. * txr.1: Documented. * tests/010/tree.c: Work tree-peek into existing test case. * share/txr/stdlib/doc-syms.tl: Updated.
*	tree: new make_similar_tree function.	Kaz Kylheku	2021-05-09	1	-0/+8
	* tree.c (make_similar_tree): New function. (tree_init): Register make-similar-tree intrinsic. * tree.h (make_similar_tree): Declared. * tests/010/tree.tl: New tests. * txr.1: Documented.
*	parser: #; tests and bugfixes.	Kaz Kylheku	2021-05-06	1	-0/+20
	This is motivated by the recent crash regression in the #; comment-out mechanism. The parser doesn't have adequate coverage in the test suite. * tests/012/syntax.tl: New file, for testing syntax. A problem was found: #;.expr did not work inside a list, only at top level; it required a space before the dot. * parser.y (listacc): A couple of productions to handle hash-semicolon immediately followed by a dot without any whitespace, and then by an expression. * y.tab.c.shipped: Regenerated.
*	matcher: new "each-match family" of macros.	Kaz Kylheku	2021-05-04	1	-0/+53
	* lisplib.c (match_set_entries): New autoload symbols: each-match, append-matches, keep-matches, each-match-product, append-match-products, keep-match-products. * share/txr/stdlib/doc-syms.tl: Updated. * share/txr/stdlib/match.tl (each-match-expander): New function. (each-match, append-matches, keep-matches, each-match-product, append-match-products, keep-match-products): New macros. * tests/011/patmatch.tl: New tests covering each macro, far from exhaustively. * txr.1: Documented.
*	format: ~x/~X specifiers support buffers.	Kaz Kylheku	2021-05-04	1	-0/+29
	* buf.c (buf_hex): New function. * buf.h (buf_hex): Declared. * stream.c (formatv): Support printing of buffers in hex via temporary buffer containing hex characters, similarly to how bignums are handled. * tests/018/format.tl: New file, providing some coverage over new and affected code.
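A minimal C sketch of the kind of byte-to-hex conversion that ~x/~X on a buffer needs, assuming two digits per byte, most significant nibble first, and a flag choosing lowercase (~x) or uppercase (~X) digits as with integers. hex_bytes is a hypothetical helper, not the signature of buf.c's buf_hex.

```c
#include <assert.h>
#include <stddef.h>

/* Render len bytes of data as hex digits into out, two digits per byte,
   high nibble first, followed by a terminating NUL. out must have room
   for 2 * len + 1 chars. upper selects A-F instead of a-f. */
static void hex_bytes(const unsigned char *data, size_t len,
                      char *out, int upper)
{
  const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
  size_t i;

  assert(out != NULL);
  for (i = 0; i < len; i++) {
    out[2 * i] = digits[data[i] >> 4];
    out[2 * i + 1] = digits[data[i] & 0x0F];
  }
  out[2 * len] = '\0';
}
```

Producing the digits into a temporary character buffer first, as the commit does for buffers and bignums, lets the ordinary field-width and padding logic apply to the finished string.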
*	buf: bugfix: int-buf, uint-buf refer to alloc size.	Kaz Kylheku	2021-05-04	1	-0/+4
	* buf.c (int_buf, uint_buf): Refer to the buffer length b->len rather than the underlying allocation size b->size. Referring to b->size will not only produce the wrong value when it is larger than len, but b->size can be null for a borrowed buffer, producing a crash. * tests/012/buf.tl: Tests.
*	rel-path: bugfixes.	Kaz Kylheku	2021-05-03	1	-1/+6
	* share/txr/stdlib/copy-file.tl: When removing .. components, a dotdot must only cancel a preceding non-dotdot. We must check not only that the out stack is not empty but also that the top element isn't dotdot. Also, eliminate empty components, like the documentation says. Lastly, we must check for the impossible cases, when the from path uses .. components that are impossible to navigate backwards to form a relative path. * tests/018/rel-path.tl: Test cases added. * txr.1: Updated with additional descriptions, fixes and examples.
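A minimal C sketch of the component-stack rule described above, assuming '/' separators and relative paths only. normalize_components is a hypothetical helper, not the copy-file.tl code; dropping "." components is an extra assumption of this sketch, and the "impossible" from-path check is not modeled. A ".." pops the stack only when the top element is a real component; otherwise it is kept.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXCOMP 64

static int is_dotdot(const char *s, size_t len)
{
  return len == 2 && s[0] == '.' && s[1] == '.';
}

/* Normalize path into out (size outsz), joining kept components with '/'.
   Returns the number of components kept, or -1 on overflow. */
static int normalize_components(const char *path, char *out, size_t outsz)
{
  const char *comp[MAXCOMP];
  size_t clen[MAXCOMP];
  int n = 0, i;
  size_t pos = 0;
  const char *p = path;

  assert(path != NULL && out != NULL && outsz > 0);

  while (*p != '\0') {
    const char *start = p;
    size_t len;

    while (*p != '\0' && *p != '/')
      p++;
    len = (size_t) (p - start);
    if (*p == '/')
      p++;

    if (len == 0 || (len == 1 && start[0] == '.'))
      continue;                      /* empty or "." component: drop it */
    if (is_dotdot(start, len) && n > 0 &&
        !is_dotdot(comp[n - 1], clen[n - 1])) {
      n--;                           /* ".." cancels a preceding real component */
      continue;
    }
    if (n == MAXCOMP)
      return -1;
    comp[n] = start;                 /* keep it, including uncancellable ".." */
    clen[n] = len;
    n++;
  }

  out[0] = '\0';
  for (i = 0; i < n; i++) {
    size_t need = clen[i] + (i > 0);
    if (pos + need + 1 > outsz)
      return -1;
    if (i > 0)
      out[pos++] = '/';
    memcpy(out + pos, comp[i], clen[i]);
    pos += clen[i];
    out[pos] = '\0';
  }
  return n;
}
```

For example, a/b/../../../c reduces to ../c, and ../../x is left alone because neither ".." has a real component to cancel.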
*	New function: rel-path.	Kaz Kylheku	2021-05-03	1	-0/+17
	* lisplib.c (copy_file_set_entries): Add rel-path as autoload trigger for copy-file module. * share/txr/stdlib/copy-file.tl (rel-path): New function. * tests/018/rel-path.tl: New file. * txr.1: Documented. * share/txr/stdlib/doc-syms.tl: Updated.
*	bug: join-with segfault on character separators.	Kaz Kylheku	2021-05-02	1	-0/+23
	* lib.c (join_with): Pass the correct onech array down to cat_str_init, rather than a null pointer. * tests/015/split.tl: New tests covering join and join-with.
*	tree: new functions for resetting iterator.	Kaz Kylheku	2021-04-30	1	-0/+15
	* tree.c (tree_reset, tree_reset_at): New functions. (tree_init): tree-reset and tree-reset-at intrinsics registered. * tree.h (tree_reset, tree_reset_at): Declared. * tests/010/tree.tl: New tests. * txr.1: Documented. * share/txr/stdlib/doc-syms.tl: Updated.
*	tree: use rlist in test case.	Kaz Kylheku	2021-04-30	1	-1/+1
	* tests/010/tree.tl: Use rlist to express discontinuous range instead of appending ranges.
*	mapcar*: fix broken.	Kaz Kylheku	2021-04-29	1	-0/+6
	* eval.c (lazy_mapcar_func): We must capture the return value of iter_step, since we refer to it in the next statement, expecting it to have stepped. This bug causes a behavior as if the original list had an extra nil. * tests/012/lazy.tl: Tests. Poor test coverage is why this sort of thing comes up and bites us.
*	tree: new tree-begin-at function.	Kaz Kylheku	2021-04-29	1	-1/+28
	* tree.c (enum tree_iter_state): New iterator state tr_find_low_prepared dedicated to the tree-begin-at traversal. This state indicates that tree-next should visit the starting node that it is given, and then after that, treat anything to the left of it as having been visited. In the other states, tree-next does not visit the node it is given but uses it as the starting point to find the next node. (tn_find_next): Bugfix here: when navigating the right link, the function neglected to add the node to the path. But the logic for backtracking up the path expects this: it checks whether the node from the path is the parent of a right child. Somehow this didn't cause a problem for full traversals with tree-begin; at least the existing test cases don't expose an issue. It caused a problem for tree-begin-at, though. (tn_find_low): New static function. This finds the low-key node in the tree, priming the iterator object with the correct state and path content to continue the traversal from that node on. We need the tr_find_low_prepared state in the iterator in order to visit the low node itself that was found. (tree_begin_at): New function. (tree_init): Register tree-begin-at intrinsic. * tree.h (tree_begin_at): Declared. * tests/010/tree.tl: New test cases for tree-begin-at. * txr.1: Documented. * share/txr/stdlib/doc-syms.tl: Updated.
*	tree: more tests.	Kaz Kylheku	2021-04-29	1	-0/+40
	* tests/010/tree.tl: New tests, broadening coverage. * share/txr/stdlib/doc-syms.tl: Regenerated.
*	tree: incorrect lookup function.	Kaz Kylheku	2021-04-29	1	-0/+31
	* tree.c (tn_lookup): The right case is incorrectly chasing the left pointer. * tests/010/tree.tl: New file.
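A minimal C sketch of a correct binary-search-tree lookup, for contrast with the tn_lookup bug, assuming int keys. struct tnode and bst_lookup are hypothetical names, not tree.c's definitions. The "greater than" case must follow the right link; following the left link there, as the buggy code did, makes any key stored in a right subtree unreachable.

```c
#include <assert.h>
#include <stddef.h>

struct tnode { int key; struct tnode *left, *right; };

/* Iterative lookup: smaller keys live to the left, larger to the right. */
static struct tnode *bst_lookup(struct tnode *n, int key)
{
  while (n != NULL) {
    if (key < n->key)
      n = n->left;
    else if (key > n->key)
      n = n->right;        /* the bug described above chased n->left here */
    else
      return n;
  }
  return NULL;
}
```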
*	tree: debug massive gc problems.	Kaz Kylheku	2021-04-29	1	-0/+0
	The tree module doesn't observe generational GC correctness; it assigns objects into other objects using ordinary assignment. * tree.c (tree_iter): New member, tree. This is initialized to null for iterators on the stack; for a dynamic iterator, we need this to be a back-pointer to the dynamic iterator. (tree_iter_init): Add parameter to initializer to set up the back-pointer. (set_left, set_right, set_key): Use set macro instead of ordinary assignment. (tn_find_next): Use set macro to add node to path. (tn_flatten, tn_build_tree): Use set macro. (tr_rebuild, tr_rebuild_scapegoat, tr_insert, tr_do_delete, tr_delete): Use set macro. Take a tree argument so we can use set macro on tr->root. (tree_insert): Use set macro. Pass 0 to tree_iter_init initializer macro. (tree_delete_node): Pass tree to tr_delete. (tree_equal_op, tree_print_op, tree_hash_op): Pass 0 to tree_iter_init initializer macro. (tree_begin): Rearrange construction for GC correctness: avoid storing pointers into not-yet-reachable structure.
*	match-str: tests with negative pos.	Kaz Kylheku	2021-04-28	1	-1/+29
	* tests/015/match-str.tl: Tests added.
*	match-str: tests and bugfix.	Kaz Kylheku	2021-04-27	1	-0/+41
	* lib.c (do_match_str): Fix wrong return value calculation in LSTR-LSTR case. * tests/015/match-str.tl: New file.
*	matcher: make use of mtest in test suite.	Kaz Kylheku	2021-04-27	1	-111/+122
	* tests/011/patmatch.tl: Use mtest throughout to condense the syntax.
*	matcher: add some test variants.	Kaz Kylheku	2021-04-26	1	-2/+20
	* tests/011/patmatch.tl: Add variants based on existing tests which insert an extra character at the left that is matched by a bound variable. This tests that the remainder of the pattern correctly follows the offset numeric position within the string.
*	matcher: quasi match incorrectly treats nil as bound.	Kaz Kylheku	2021-04-26	1	-1/+5
	* share/txr/stdlib/match.tl (expand-quasi-match): bound-p local function must return nil if the symbol is nil. * tests/011/patmatch.tl: New test cases testing that @nil is treated as an unbound variable in the non-consecutive-variables test. Also, duplicate certain tests that start with a text match and insert @nil into them as the first element, so that the text match is forced to be the second item.
*	matcher: bugfix in `text{rest}` case.	Kaz Kylheku	2021-04-26	1	-0/+3
	* share/txr/stdlib/match.tl (expand-quasi-match): Calculate npos correctly relative to current pos. Use match-str rather than starts-with.
*	matcher: more quasi tests: coverage of all cases.	Kaz Kylheku	2021-04-25	1	-1/+8
	* tests/011/patmatch.tl: More tests. All explicitly coded cases covered, except the fall-through situations we are not yet catching in expand-quasi-match.
*	match: third round of quasi tests and fixes.	Kaz Kylheku	2021-04-25	1	-0/+14
	* share/txr/stdlib/match.tl (expand-quasi-match): Add case for unbound var followed by var, followed by nothing. * tests/011/patmatch.tl: New tests.
*	matcher: second round of quasi tests and fixes.	Kaz Kylheku	2021-04-25	1	-0/+6
	* share/txr/stdlib/match.tl (expand-quasi-match): Use rest variable consistently instead of (cdr args). Two instances of (cdr rest) should just be rest. New case added for variable with no modifiers followed by text being the last item.
*	matcher: first round of quasi tests and bugfix.	Kaz Kylheku	2021-04-25	1	-0/+10
	* share/txr/stdlib/match.tl (expand-quasi-match): The return value of search-str isn't a length but an absolute position. We not only fix a bug, but lose a useless calculation. * tests/011/patmatch.tl: New test cases for quasiliteral patterns, starting with the most rudimentary. Last one broke, due to the above issue.
*	matcher: compile the test cases.	Kaz Kylheku	2021-04-22	1	-6/+13
	* tests/011/patmatch.tl: Wrap one test with compile-only and eval-only so that the compiler ignores it. Add a form at the end of the file, similarly ignored by the compiler, to compile the file. This compiles and executes all the test cases.
*	matcher: defmatch: useful :env parameter.	Kaz Kylheku	2021-04-21	1	-0/+15
	* share/txr/stdlib/match.tl (compile-match): Pattern macro expanders now have an environment parameter. We turn the list of variables that have been bound so far into a fake macro-time lexical environment, the parent of which is the surrounding environment. The pattern macro can query this using the lexical-var-p function to determine whether a given variable already has a binding, either in the pattern, or in the surrounding lexical environment. (defmatch): Generate a two-argument lambda, and use the new mac-env-param-bind to make the environment object available to the user-defined expansion. * tests/011/patmatch.tl: New test cases for this environment mechanism, and also for defmatch itself. * txr.1: Document role of :env under defmatch.
*	compile/eval: new operator, mac-env-param-bind.	Kaz Kylheku	2021-04-21	1	-0/+5
	mac-env-param-bind is like mac-param-bind but also allows the value for the :env parameter to be specified. * eval.c (op_mac_env_param_bind_s): New symbol variable. (op_mac_env_param_bind): New static function. (do_expand): Handle mac_env_param_bind_s. (eval_init): Initialize symbol variable and register macro. * share/txr/stdlib/compiler.tl (compiler compile): Add case for mac-env-param-bind. (compiler comp-mac-env-param-bind): New method. * share/txr/stdlib/doc-syms.tl: Updated with new hashes for tree-bind and mac-param-bind, and inclusion of mac-env-param-bind. * tests/012/binding.tl: New file. * txr.1: Documented.
*	matcher: new pattern operator @(end)	Kaz Kylheku	2021-04-20	1	-0/+7
	* share/txr/stdlib/doc-syms.tl: New entry for end. * share/txr/stdlib/match.tl (check, check-end, check-sym, loosen, pat-len): New functions, taken from the original local functions of the sme macro. (sme): Refactored by hoisting local functions out. Some local variable renaming. (end): New pattern macro. * tests/011/patmatch.tl: New test for end. * txr.1: Documented.
*	tests: disable some UTF-8 tests on 16 bit wchar_t.	Kaz Kylheku	2021-04-20	1	-8/+9
	* tests/012/parse.tl: All the tests in this file blow up on systems that don't have a full-blown character type.
*	openbsd: fix tests.	Kaz Kylheku	2021-04-20	5	-32/+34
	* tests/014/socket-basic.tl (%iters%): Also reduce to 2000 on OpenBSD, to avoid the default limit on UDP datagram size. * tests/017/glob-carray.tl: Use the BSD-style struct glob-t on OpenBSD also. * tests/017/glob-zarray.tl: Likewise. * tests/018/chmod.tl (os): New global variable. (test-sticky): s-isvtx not allowed for non-root user on OpenBSD, so we falsify this variable. * tests/common.tl (os-symbol): Add OpenBSD case, producing :openbsd keyword symbol. (libc): Let's just use (dlopen nil) for any platform that isn't Cygwin or Cygnal.
*	matcher: first pattern macro, sme.	Kaz Kylheku	2021-04-19	1	-0/+37
	* lisplib.c (match_instantiate): Intern sme symbol. * share/txr/stdlib/doc-syms.tl: Update with sme entry. * share/txr/stdlib/match.tl (sme): New defmatch macro. * tests/011/patmatch.tl: New tests for sme. * txr.1: Documented.
*	compile/eval: print compiler error on stderr.	Kaz Kylheku	2021-04-19	2	-1/+9
	* share/txr/stdlib/error.tl (compile-error): Print the error message on stderr, like we do with warnings. This allows the programming environment to pick up the error message and navigate to that line accordingly. The error message is also output by the unhandled exception logic but with a prefix that prevents parsing by the tooling. To avoid sending double error messages to the interactive user, we only issue the stderr message if load-recursive is true. * tests/common.tl (macro-time-let): New macro. This lets us bind special variables around the macro-expansion of the body, which is useful when expansion-time logic reacts to values of special variables. * tests/012/ifa.tl: Use macro-time-let to suppress stderr around the expansion of the erroneous ifa form. We now need this because the error situation spits out a message on stderr, in addition to throwing.
*	tests: use fixed regsub in compile test.	Kaz Kylheku	2021-04-13	1	-1/+1
	* tests/012/compile.tl: Simplify code with regsub.
*	tests: implicitly generate empty .expected files.	Kaz Kylheku	2021-04-12	28	-2/+4
	* Makefile (%.expected): New implicit rule. Whenever a test requires a .expected file, if it is missing, we create an empty one. This file will be treated as an intermediate by GNU Make, which means that it will be deleted when make terminates. * tests/012/compile.tl: Some of the .tl files no longer have an .expected file, so we have to test for that in the catenating logic. * tests/008/call-2.expected, * tests/008/no-stdin-hang.expected, * tests/011/macros-3.expected, * tests/011/patmatch.expected, * tests/012/aseq.expected, * tests/012/ashwin.expected, * tests/012/compile.tl, * tests/012/cont.expected, * tests/012/defset.expected, * tests/012/ifa.expected, * tests/012/oop-seq.expected, * tests/012/parse.expected, * tests/012/quasi.expected, * tests/012/quine.expected, * tests/012/seq.expected, * tests/012/struct.expected, * tests/012/stslot.expected, * tests/014/dgram-stream.expected, * tests/014/in6addr-str.expected, * tests/014/inaddr-str.expected, * tests/014/socket-basic.expected, * tests/015/awk-fconv.expected, * tests/015/split.expected, * tests/015/trim.expected, * tests/016/arith.expected, * tests/016/ud-arith.expected, * tests/017/ffi-misc.expected, * tests/018/chmod.expected: Empty file deleted.
*	compiler: new test case.	Kaz Kylheku	2021-04-11	1	-0/+12
	* tests/012/compile.tl (new-file): Compiles a select set of .tl files in the same directory. The compile.expected file is dynamically created from catenating the .expected files corresponding to those .tl files; the output is expected to be the same from compiling those files as from interpreting them.
*	parser: allow non-UTF-8 bytes in literals and regexes.	Kaz Kylheku	2021-04-08	1	-0/+6
	* parser.l (grammar): Just like we do in SREGEX, allow an arbitrary byte in REGEX, mapping it to the DCxx range. Do the same inside string literals of all types. * lex.yy.c.shipped: Updated. * tests/012/parse.tl: New tests.
*	parser: allow funny UTF-8 in regexes and literals.	Kaz Kylheku	2021-04-08	2	-0/+7
	The main idea in this commit is to change a behavior of the lexer, and take advantage of it in the parser. Currently, the lexer recognizes a {UANYN} pattern in two places. That pattern matches a UTF-8 character. The lexeme is passed to the decoder, which is expected to produce exactly one wide character. If the UTF-8 is bad (for instance, a code in the surrogate pair range U+DCxx) then the decoder will produce multiple characters. In that case, these rules return ERRTOK instead of a LITCHAR or REGCHAR. The idea is: why don't we just return those characters as a TEXT token? Then we can just incorporate that into the literal or regex. * parser.l (grammar): If a UANYN lexeme decodes to multiple characters instead of the expected one, then produce a TEXT token instead of complaining about invalid UTF-8 bytes. * parser.y (regterm): Recognize a TEXT item as a regterm, converting its string value to a compound node in the regex AST, so it will be correctly treated as a fixed pattern. (chrlit): If a hash-backslash is followed by a TEXT token, which can happen now, that is invalid; we diagnose that as invalid UTF-8. (quasi_item): Remove TEXT rule, because the litchars constituent now generates TEXT. (litchars, restlistchar): Recognize TEXT item, similarly to regterm. * tests/012/parse.tl: New file. * tests/012/parse.expected: Likewise.
*	utf8: fix backtracking bugs in buffer decoder.	Kaz Kylheku	2021-04-07	1	-0/+7
	* utf8.c (utf8_from_buffer): Fix incorrect backtracking logic for handling bad UTF-8 bytes. Firstly, we are not backtracking to the correct byte. Because src is incremented at the top of the loop, the backtrack pointer must be set to src - 1 to point to the possibly bad byte. Secondly, when we backtrack, we are neglecting to rewind nbytes! Thus after backtracking, we will not scan the entire input. Let's avoid using nbytes, and guard the loop based on whether we hit the end of the buffer; then we don't have any nbytes state to backtrack. * tests/017/ffi-misc.tl: New test case converting a three-byte UTF-8 encoding of U+DC01: an invalid character in the surrogate range. We test that the buffer decoder turns this into three characters, exactly like the stream decoder. Another test case for invalid bytes following a valid sequence start.
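A minimal C sketch of the corrected backtracking, assuming the DCxx mapping used elsewhere in this log for undecodable bytes (U+DC00 plus the byte value). decode_utf8 is a hypothetical stand-in, not utf8.c's utf8_from_buffer. The backtrack point is recorded at the byte that began the sequence, and the loop is guarded only by the end of the buffer, so rewinding needs no separate byte counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode len bytes of UTF-8 from buf into out (room for len code points).
   A byte that does not begin a valid, minimal, non-surrogate sequence is
   mapped to U+DC00 + byte, and decoding resumes at the byte after it.
   Returns the number of code points written. */
static size_t decode_utf8(const unsigned char *buf, size_t len, uint32_t *out)
{
  size_t i = 0, n = 0;

  assert(buf != NULL || len == 0);

  while (i < len) {
    size_t start = i;            /* backtrack point: the possibly bad byte */
    unsigned char b = buf[i];
    uint32_t cp = 0, min = 0;
    int extra = 0;

    if (b < 0x80) {
      out[n++] = b;
      i++;
      continue;
    } else if ((b & 0xE0) == 0xC0) {
      cp = b & 0x1F; extra = 1; min = 0x80;
    } else if ((b & 0xF0) == 0xE0) {
      cp = b & 0x0F; extra = 2; min = 0x800;
    } else if ((b & 0xF8) == 0xF0) {
      cp = b & 0x07; extra = 3; min = 0x10000;
    } else {
      goto bad;                  /* stray continuation byte or invalid lead */
    }

    i++;
    while (extra > 0) {
      if (i >= len || (buf[i] & 0xC0) != 0x80)
        goto bad;                /* truncated or interrupted sequence */
      cp = (cp << 6) | (buf[i] & 0x3F);
      i++;
      extra--;
    }

    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      goto bad;                  /* overlong, out of range, or surrogate */

    out[n++] = cp;
    continue;

  bad:
    out[n++] = 0xDC00 | buf[start];
    i = start + 1;               /* rewind to just after the bad byte */
  }

  return n;
}
```

Under these assumptions, the bytes ED B0 81 (the encoding of U+DC01) come out as the three characters U+DCED, U+DCB0, U+DC81, and C3 41 comes out as U+DCC3 followed by A, so the byte after a broken sequence start is not swallowed.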
*	MacOS: adjust socket-basic test for dgram size.	Kaz Kylheku	2021-03-24	1	-4/+7
	* tests/014/socket-basic.tl (%iters%): New variable. 2000 on MacOS, 5000 elsewhere. (client, server): Use %iters% instead of hard-coded 5000. (test): Rename to sock-test, since it clashes with the test macro coming from common.tl, which we need for the os-symbol function.