txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
*	read-until-match: fix regression.	Kaz Kylheku	2024-09-14	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 9aa751c8a4f845ef2d2bba091c81ffeded941afd broke things. This fix affects the functions read-until-match, scan-until-match and count-until-match, which share an implementation. * regex.c (scan_until_common): In the REGM_MATCH_DONE and REGM_MATCH cases, we must push the character onto the local stack, before doing the match = stack assignment. Otherwise, it's possible that the stack is empty and so no match is recorded. The REGM_FAIL case will then behave as if no match was found, consuming a character and continuing. * txr.1: Codify an existing behavior: only non-empty matches for the regex are considered by read-until-match. * tests/015/regex.tl: New file. I am amazed to discover that we don't seem to have a test suite for regexes at all. Putting the tests here which confirm this fix and provide coverage for some edge cases in read-until-match.
*	copy-iter: test that the combi iterators copy.	Kaz Kylheku	2024-06-26	1	-0/+12
\| \| \| \|	* tests/015/comb.tl: New tests.
*	combi: fix permi and rpermi; impl combi, rcombi; test.	Kaz Kylheku	2024-06-24	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* combi.c (permi_get, permi_peek): Fix algorithm. (permi_mark): New static function. (permi_ops): Reference permi_mark for mark operation. (permi): Initialize it->ul.next to nao as required by new get/peek algorithm. (rpermi_get, rpermi_peek): Fix algorithm. (rpermi_mark): New static function. (rpermi_ops): Reference permi_mark for mark operation. (rpermi): Initialize it->ul.next to nao as required by new get/peek algorithm. (combi_get, combi_peek, combi_mark, combi_clone): New static functions. (combi_ops): New static structure. (combi): New function. (rcombi_get, rcombi_peek, rcombi_mark, rcombi_clone): New static functions. (rcombi_ops): New static structure. (rcombi): New function. * combi.h (combi, rcombi): Declared. * tests/015/comb.tl: New tests.
*	combi: fix broken k 0 edge cases for sequences.	Kaz Kylheku	2024-06-20	1	-0/+10
\| \| \| \| \| \| \| \| \|	* combi.c (rperm, comb, rcomb): In the default case for generic sequences, check k, like in the other cases and return the special case result. * tests/015/comb.tl: New tests.
*	perm, rperm, comb, rcomb: test generic sequences, bugfixes.	Kaz Kylheku	2023-12-27	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	perm doesn't generate items of the right type. We need to add the original sequence to the state vector and use make_like. The new generic sequence support in rperm is broken, too. * combi.c (perm_while_fun, perm_gen_fun_common): Rename p variable to vec. (perm_init_common): Rename to perm_init. Take one more argument and store in new fourth element of state vector. (perm_vec, perm_list, perm_str): Pass nil to new parameter of perm_init. (perm_seq_gen_fun): Use perm_list_gen_fun to get list permutations, and coerce each one to the same type as the sequence with make_like. (rcomb_seq_gen_fun): Remove redundant call to rcomb_gen_fun_common. The rcomb_list_gen_fun function is called, which does this already, so we lose every other sequence element. * tests/015/comb.tl: New tests.
*	rcomb, perm, rperm: test.	Kaz Kylheku	2023-12-27	1	-0/+317
\| \| \| \|	* tests/015/comb.tl: New tests.
*	comb: bug: missing combinations.	Kaz Kylheku	2023-12-26	1	-0/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The comb function is broken; some combinations of items are missing in the output. This is because the iteration reset step in comb_gen_fun_common handles only one column of the state, neglecting to reset the other columns: what is now done by the for (j = i ... loop. I'm changing the representation of the state from a list of lists to a vector of lists. Moreover, it is not reversed. This allows the loop in comb_gen_fun_common to perform random access. * combi.c (k_conses): Return a vector, that is not reversed. (comb_init): New helper function to slightly abstract the use of k_conses. (comb_while_fun): Termination now occurs if the state vector is nil (degenerate case, like k items chosen from n, when k > n), or if the vector has nil in element zero (special flag situation). (comb_gen_fun_common): Rewritten, with correction. The logic is similar. Since we have random access, we don't need the "prev" variable. When we reset a column iterator, we now also populate all the columns to the right of it. For instance, if a given column resets to (a b c), the one to the right must reset to (b c), and so on. In the broken function, this is what was not done, resulting in missing items due to, say, a column resetting to (a b c) but the one next to it remaining at (c). (comb_list_gen_fun): Drop nreverse. (comb_vec_gen_fun, comb_str_gen_fun, comb_hash_gen_fun): Use the same i iterator for the state and the output object, accessing the vector directly. (comb_list, comb_vec, comb_str, comb_hash): Use comb_init. * tests/015/comb.tl: New file.
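The column reset described above can be sketched in a few lines of C. This is an illustration only, not the code in combi.c: integer positions stand in for the list tails held in the state vector, and the names next_comb, count_combs and MAX_K are made up for the example. When column i advances, every column to its right is repopulated from it, which is the step the broken function skipped.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_K 16   /* arbitrary limit for this sketch */

/* Advance idx[0..k-1], a strictly increasing set of positions into a
   sequence of n items, to the next combination. Returns 0 when there
   are no more combinations, 1 otherwise. */
int next_comb(size_t *idx, size_t k, size_t n)
{
  size_t i = k;

  if (k > n)
    return 0;   /* degenerate case: nothing to choose */

  while (i-- > 0) {
    if (idx[i] < n - k + i) {   /* column i can still advance */
      size_t j;
      idx[i]++;
      /* Reset every column to the right of i, not just one. */
      for (j = i + 1; j < k; j++)
        idx[j] = idx[j - 1] + 1;
      return 1;
    }
  }
  return 0;
}

/* Count the k-item combinations of n items by stepping the state. */
size_t count_combs(size_t n, size_t k)
{
  size_t idx[MAX_K], i, count;

  assert(k <= MAX_K);
  if (k > n)
    return 0;
  for (i = 0; i < k; i++)
    idx[i] = i;   /* first combination: the leading k positions */
  count = 1;
  while (next_comb(idx, k, n))
    count++;
  return count;
}
```

With n = 4 and k = 2, the state (0 3) steps to (1 2): the left column advances and the right one is reset to follow it, rather than staying at 3.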
*	New function: str-esc.	Kaz Kylheku	2023-09-01	1	-0/+39
\| \| \| \| \| \| \| \| \| \|	* lib.[ch] (str_esc): New function. * eval.c (eval_init): str-esc intrinsic registered. * tests/015/esc.tl: New file. * txr.1: Documented.
*	awk: prn returns nil.	Kaz Kylheku	2023-08-26	1	-0/+10
\| \| \| \| \| \| \| \| \|	* stdlib/awk.tl (awk-state prn): Return nil in the no-argument case instead of returning whatever put-string returns. * tests/015/awk-misc.tl: New file. * txr.1: Documented.
*	awk: bug: fix ->> appending redirection operator.	Kaz Kylheku	2023-05-23	1	-0/+42
\| \| \| \| \| \| \|	* stdlib/awk.tl (awk-state ensure-stream): Fix missing handling for the :apf kind symbol used by appending. * tests/015/awk-redir.tl: New file.
*	awk: new feature, res variable.	Kaz Kylheku	2022-12-30	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The res variable captures the specific value of the condition expression, making it available to the action. * autoload.c (awk_set_entries): Intern the res symbol. * stdlib/awk.tl (awk): Instead of generating the condition-action into a simple when, we use whenlet to also bind the res variable. * tests/015/awk-res.tl: New file. * txr.1: Documented. * stdlib/doc-syms.tl: Updated.
*	cat-str/join/join-with: allow nested sequences	Kaz Kylheku	2022-10-25	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The measure/allocate/catenate functions which underlie the cat-str implementation are streamlined, simplifying the code. At the same time, they handle nested sequences of string/character items. * lib.c (struct cat_str): New member, seen_one. This flips from 0 to 1 after the first item has been seen in the cat_str_measure pass or cat_str_append pass. Each item other than the first is preceded by a separator. (cat_str_measure, cat_str_append): The more_p argument is dropped. We account for the separator with the help of the new seen_one flag, which allows us to easily recurse over items that are sequences. (cat_str_alloc): Reset the seen_one flag in preparation for the cat_str_append pass. (cat_str, vscat, scat2, scat3, join_with): Simplified. * tests/015/split.tl: New tests. * txr.1: Redocumented.
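The role of the seen_one flag in the two passes can be sketched as follows. This is a simplified stand-in, not the lib.c code: the item type, the struct layout and the names cat_measure, cat_append and join_sketch are assumptions made for the example, and plain C strings replace TXR objects. Because the separator decision depends only on the flag, the same function can recurse into a nested sequence without a more_p argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical item: a string leaf (str != NULL) or a nested sequence. */
struct item {
  const char *str;
  const struct item *kids;
  size_t nkids;
};

struct cat {
  const char *sep;
  size_t len;     /* total length found by the measure pass */
  char *buf;
  char *ptr;      /* write position during the append pass */
  int seen_one;   /* becomes 1 once the first leaf has been seen */
};

static void cat_measure(struct cat *c, const struct item *it)
{
  size_t i;

  if (it->str != NULL) {
    if (c->seen_one)            /* every leaf but the first gets a separator */
      c->len += strlen(c->sep);
    c->len += strlen(it->str);
    c->seen_one = 1;
    return;
  }
  for (i = 0; i < it->nkids; i++)
    cat_measure(c, &it->kids[i]);
}

static void cat_put(struct cat *c, const char *s)
{
  size_t n = strlen(s);
  memcpy(c->ptr, s, n);
  c->ptr += n;
}

static void cat_append(struct cat *c, const struct item *it)
{
  size_t i;

  if (it->str != NULL) {
    if (c->seen_one)
      cat_put(c, c->sep);
    cat_put(c, it->str);
    c->seen_one = 1;
    return;
  }
  for (i = 0; i < it->nkids; i++)
    cat_append(c, &it->kids[i]);
}

/* Returns a malloc'd string, or NULL on allocation failure. */
char *join_sketch(const struct item *root, const char *sep)
{
  struct cat c = { sep, 0, NULL, NULL, 0 };

  assert(root != NULL && sep != NULL);
  cat_measure(&c, root);
  c.buf = malloc(c.len + 1);
  if (c.buf == NULL)
    return NULL;
  c.ptr = c.buf;
  c.seen_one = 0;   /* reset the flag before the append pass */
  cat_append(&c, root);
  *c.ptr = '\0';
  return c.buf;
}
```

In this sketch, joining the items "a", ("b" "c"), "d" with the separator "-" gives "a-b-c-d"; the nested pair is flattened into the output.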
*	New function: str	Kaz Kylheku	2022-06-12	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The str function is like mkstring but allows a fill pattern to be specified. * eval.c (eval_init): str intrinsic registered. * lib.[ch] (str): New function. * tests/015/str.tl: New file. * txr.1: Documented. * stdlib/doc-syms.tl: Updated.
*	New: spln and tokn functions.	Kaz Kylheku	2022-05-30	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of trying to work the new count parameter into the spl and tok functions, it's better to make new ones. * eval.c (eval_init): spln and tokn intrinsics registered. * lib.[ch] (spln, tokn): New functions. * tests/015/split.tl: New test cases. * txr.1: Documented. * stdlib/doc-syms.tl: Updated.
*	tok-str: takes count argument.	Kaz Kylheku	2022-05-28	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* eval.c (eval_init): Update registration of tok-str. * lib.c (tok_str): New argument, count_opt. Implemented in the compat 155 case; what the heck. (tok): Pass nil to new parameter of tok_str. * lib.h (tok_str): Declaration updated. * tests/015/split.tl: New tests. * txr.1: Documented.
*	split-str: new count parameter.	Kaz Kylheku	2022-05-17	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* eval.c (eval_init): Fix up registration of split-str to account for new parameter. * lib.c (split_str_keep): Implement new optional count argument. (spl): Pass nil value to split_str_keep for new argument. I'd like this function to benefit from this argument also, but the design isn't settled. (split_str): Pass nil argument to split_str_keep. * lib.h (split_str_keep): Declaration updated. * tests/015/split.tl: New tests. * txr.1: Documented.
*	lazy-str-get-trailing-list: spurious empty string issue.	Kaz Kylheku	2022-01-04	1	-0/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* lib.c (lazy_str_get_trailing_list): Remove the spurious empty string caused by splitting on the terminator. Whenever the materialized prefix is non-empty, and there is a non-empty terminator, the prefix necessarily ends in the terminator. If we split on the terminator, the list of pieces ends in an empty string, which is undesirable. This has to be subject to compat, unfortunately; it's a very visible behavior that affects the continuation of line-based matching after the @(freeform) directive. * tests/006/freeform-5.txr: With this fix, we no longer have to match the spurious blank line coming from @(freeform). * tests/015/lazy-str.tl: New file. * txr.1: Updated documentation with compat notes. There was some outright incorrect text describing lazy-str-get-trailing-list. Also, the lazy-str-force-upto and lazy-str-force were under-documented. The return value of the former was not completely described: that it returns t in the other case when not returning nil. It wasn't mentioned that the functions observe the limit-count. Moreover, the exact algorithm for forcing is now documented.
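The spurious empty piece is easy to reproduce outside of TXR. The sketch below is not lazy_str_get_trailing_list; it only counts the pieces produced by splitting a C string on a terminator, with a flag (a made-up parameter, drop_trailing_empty) that selects between the old and new behavior. The name split_count is likewise an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the pieces obtained by splitting s on the non-empty string term.
   If drop_trailing_empty is nonzero, the empty piece that follows a
   final terminator is not counted. */
size_t split_count(const char *s, const char *term, int drop_trailing_empty)
{
  size_t tlen = strlen(term), pieces = 0;
  const char *p = s, *hit;

  assert(tlen > 0);

  while ((hit = strstr(p, term)) != NULL) {
    pieces++;          /* the piece before this terminator */
    p = hit + tlen;
  }

  /* What follows the last terminator is the final piece. It is empty
     exactly when s ends in the terminator. */
  if (*p != '\0' || !(drop_trailing_empty && pieces > 0))
    pieces++;

  return pieces;
}
```

For the prefix "a\nb\n" and terminator "\n", the unfiltered split has three pieces, the last one empty; with the flag set there are two.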
*	awk: :fields specifies conversions.	Kaz Kylheku	2021-10-04	1	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* stdlib/awk.tl (sys:awk-compile-time): Slot field-names renamed to field-name-conv. (sys:awk-expander): Parse the new syntax which allows (sym fn) pairs with optional fn, creating a list of normalized items in the field-name-conv slot of the compile-time structure. (sys:awk-symac-let): Adjust the code to the pair representation in field-name-conv. (sys:awk-field-name-code): New function for generating the field conversion code. (awk): Now that we have two optional pieces of code to wrap around p-actions form, we factor that out of the awk-lambda, to a series of conditional assignments. Here we handle the generation of the field conversions. * conv.tl (sys:conv-expand-sym): New macro, used in sys:awk-field-name-code and sys:conv-let. (sys:conv-let): Simplify with sys:conv-expand-sym. Drop optional argument from i; it connects with no documented feature, and is not usable from fconv. * tests/015/awk-fields.tl: New tests. * txr.1: Updated, including cruft in fconv documentation. Change-Id: Ie42819f58af039fdbcdb1ae365c89dc1add55c93
*	awk: new :fields feature for named fields.	Kaz Kylheku	2021-10-01	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \|	* stdlib/awk.tl (sys:awk-compile-time): New slot, field-names. (sys:awk-expander): Validate and store field-names into compile-time structure. (sys:awk-symac-let): New macro. (awk): Wrap sys:awk-symac-let around code to generate field name macros. * tests/015/awk-fields.tl: New file. * txr.1: Documented.
*	regex-from-trie: correctly handle empty trie.	Kaz Kylheku	2021-06-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	* filter.c (regex_from_trie): An empty trie matches nothing, so we must return the t regex syntax (match nothing), not nil (match empty string). A hash-based trie matches nothing if it is empty; but if it has user data, then it matches the empty string. * tests/015/trie.tl: Test cases added.
*	regex-from-trie: bugs processing compressed trie.	Kaz Kylheku	2021-06-27	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	* filter.c (regex_from_trie): If a hash key maps to a string, do not treat that as a trie; it is the value for that node. A value is only a trie if it is a cons or hash. Also, in this case do not make a compound regex. * tests/015/trie.tl: Add duplicate of regex test case using regex from compressed trie.
*	regex-from-trie: bugfix: incomplete regex.	Kaz Kylheku	2021-06-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	* filter.c (regex_from_trie): The code is neglecting to check whether there is a match of the input at the given hash table, which is true if it has user data. In that case, the empty regex must be added as a parallel branch. * tests/015/trie.tl: The first regex test case works now. The second one is incorrect and is replaced.
*	filter: regex-from-trie produces bad or syntax.	Kaz Kylheku	2021-06-27	1	-0/+49
\| \| \| \| \| \| \| \| \| \|	This is not a complete fix yet; the test case still fails. * filter.c (regex_from_trie): The (or ...) operator in the regex language is strictly binary. Do not produce a variable-argument or expression. * tests/015/trie.tl: New file.
*	bug: join-with segfault on character separators.	Kaz Kylheku	2021-05-02	1	-0/+23
\| \| \| \| \| \| \|	* lib.c (join_with): Pass the correct onech array down to cat_str_init, rather than a null pointer. * tests/015/split.tl: New tests covering join and join-with.
*	match-str: tests with negative pos.	Kaz Kylheku	2021-04-28	1	-1/+29
\| \| \| \|	* tests/015/match-str.tl: Tests added.
*	match-str: tests and bugfix.	Kaz Kylheku	2021-04-27	1	-0/+41
\| \| \| \| \| \|	* lib.c (do_match_str): Fix wrong return value calculation in LSTR-LSTR case. * tests/015/match-str.tl: New file.
*	tests: implicitly generate empty .expected files.	Kaz Kylheku	2021-04-12	3	-0/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Makefile (%.expected): New implicit rule. Whenever a test requires a .expected file, if it is missing, we create an empty one. This file will be treated as an intermediate by GNU Make, which means that it will be deleted when make terminates. * tests/012/compile.tl: Some of the .tl files no longer have an .expected file, so we have to test for that in the catenating logic. * tests/008/call-2.expected, * tests/008/no-stdin-hang.expected, * tests/011/macros-3.expected, * tests/011/patmatch.expected, * tests/012/aseq.expected, * tests/012/ashwin.expected, * tests/012/compile.tl, * tests/012/cont.expected, * tests/012/defset.expected, * tests/012/ifa.expected, * tests/012/oop-seq.expected, * tests/012/parse.expected, * tests/012/quasi.expected, * tests/012/quine.expected, * tests/012/seq.expected, * tests/012/struct.expected, * tests/012/stslot.expected, * tests/014/dgram-stream.expected, * tests/014/in6addr-str.expected, * tests/014/inaddr-str.expected, * tests/014/socket-basic.expected, * tests/015/awk-fconv.expected, * tests/015/split.expected, * tests/015/trim.expected, * tests/016/arith.expected, * tests/016/ud-arith.expected, * tests/017/ffi-misc.expected, * tests/018/chmod.expected: Empty file deleted.
*	awk: tests for fconv.	Kaz Kylheku	2020-12-31	2	-0/+21
\| \| \| \| \|	* tests/015/awk-fconv.tl, * tests/015/awk-fconv.expected: New files.
*	New functions trim-left and trim-right.	Kaz Kylheku	2020-10-05	2	-0/+41
\| \| \| \| \| \| \| \| \|	* regex.c (trim_left, trim_right): New static functions. (regex_init): New intrinsics registered. * tests/015/trim.tl, tests/015/trim.expected: New files. * txr.1: Documented.
*	awk: implement ranges right using functions.	Kaz Kylheku	2017-10-29	1	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* share/txr/stdlib/awk.tl (sys:awk%--rng, sys:awk%--rng-, sys:awk%rng+, sys:awk%-rng+, sys:awk%--rng+): New functions. (sys:awk-mac-let): Rewritten range expander. The four basic ranges rng, rng-, -rng and -rng- are handled with in-line expansion, because by doing that we avoid unnecessarily evaluating the from-expression. The remaining cases expand to function calls to the new functions, which receive the flag vector, the index position in that vector and the values of the from and to expressions. The behavior change is that the -- forms now do the right thing: they hide all leading records that satisfy the from-expression, right to the last record of the range if necessary. * tests/015/awk-rng.expected: Updated. * txr.1: Revise semantic description of the -- range types, plus minor fixes.
*	awk: more range test cases.	Kaz Kylheku	2017-10-27	2	-1/+7
\| \| \| \| \| \|	* tests/015/awk-rng.tl: More rows of data. * tests/015/awk-rng.expected: Updated.
*	awk: five new range operators.	Kaz Kylheku	2017-10-25	2	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \|	* share/txr/stdlib/awk.tl (sys:awk-mac-let): Provide the implementation for the local macros --rng, --rng-, rng+, -rng+ and --rng+. * tests/015/awk-rng.tl: New file. * tests/015/awk-rng.expected: New file. * txr.1: Documented.
*	Fix tok-str semantics once again.	Kaz Kylheku	2016-10-26	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem is that when the regular expression is capable of matching empty strings, tok-str will extract an empty token immediately following a non-empty token. For instance (tok-str "a,b" /[^,]*/) extracts ("a" "" "b") instead of just ("a" "b"). This is a poor behavior and the way to fix it is to impose a rule that an empty token must not be extracted immediately at the ending position of a previous token. Only a non-empty token can be consecutive to a token. * lib.c (tok_str): Rewrite the logic of the loop, using the prev_empty flag to suppress empty tokens which immediately follow non-empty tokens. The addition of 1 to the position when the token is empty to skip a character is done at the bottom of the loop and a new last_end variable keeps track of the end position of the last extracted token for the purposes of extracting the keep-between area if keep_sep is true. The old loop is preserved intact and enabled by compatibility. * tests/015/split.tl: Multiple empty-regex test cases for tok-str updated. * txr.1: Updated tok-str documentation and also added a note about the conditions under which split-str and tok-str, invoked with keep-sep true, produce equivalent output. Added compatibility notes.
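The rule can be tried out with a small C sketch. It is not the tok_str loop from lib.c: a hand-written scanner, run_len, stands in for a regex such as [^,]* that always matches at the current position (possibly with zero length), and the names tok_sketch and struct token are assumptions for the example. The test on last_end is the whole rule: an empty match is dropped when it starts exactly where the previous token ended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct token {
  size_t start, len;   /* token is s[start .. start + len - 1] */
};

/* Stand-in for a regex like [^,]*: length of the run of non-separator
   characters starting at s[pos]. The result may be zero. */
static size_t run_len(const char *s, size_t pos, char sep)
{
  size_t n = 0;
  while (s[pos + n] != '\0' && s[pos + n] != sep)
    n++;
  return n;
}

/* Extract up to max tokens from s into out; returns the number stored. */
size_t tok_sketch(const char *s, char sep, struct token *out, size_t max)
{
  size_t len, pos = 0, count = 0, last_end = 0;
  int have_prev = 0;   /* has any token been extracted yet? */

  assert(s != NULL && out != NULL);
  len = strlen(s);

  while (pos <= len && count < max) {
    size_t n = run_len(s, pos, sep);
    int empty = (n == 0);

    /* An empty token is not extracted right at the end of the
       previous token. */
    if (!(empty && have_prev && pos == last_end)) {
      out[count].start = pos;
      out[count].len = n;
      count++;
      last_end = pos + n;
      have_prev = 1;
    }

    /* An empty match consumes nothing, so skip one character. */
    pos += empty ? 1 : n;
  }
  return count;
}
```

In this sketch "a,b" yields two tokens, "a" and "b", while "a,,b" yields three: the empty match at position 2 does not touch the end of "a" and so is kept as the empty field between the two commas.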
*	Tests for tok-str.	Kaz Kylheku	2016-09-17	2	-0/+73
\| \| \| \|	* tests/015/split.tl: New cases added.
*	Adding tests for split-str.	Kaz Kylheku	2016-09-17	1	-0/+123
	* Makefile (TXR_DBG_OPTS): Disable for tst/tests/015. * tests/common.tl (mtest): New macro. * tests/015/split.tl: New file.