txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
*	find-if: optimized rewrite and hash support.	Kaz Kylheku	2017-11-15	2	-10/+63
	* lib.c (find_if): Function rewritten to use the seq_info sequence classification mechanism, for much better performance on vector-like objects. Also, supports hash tables just like find_max. * txr.1: Documentation updated regarding hash support of find-if.
*	find-max: tiny optimization for vectors.	Kaz Kylheku	2017-11-15	1	-1/+1
	* lib.c (find_max): The vector case must loop from index one, not zero, so as not to wastefully compare the initial max element to itself.
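The idea behind this change can be pictured with a minimal sketch. It is written in Python rather than the C of lib.c, and the function name, parameters and defaults are illustrative assumptions, not the actual find_max interface. Element zero seeds the running maximum, so the scan starts at index one.

```python
def find_max(seq, key=lambda x: x, gt=lambda a, b: a > b):
    """Return the element of seq with the greatest key, or None if seq is empty.

    Hypothetical signature, chosen only to illustrate the loop bounds.
    """
    if len(seq) == 0:
        return None
    best = seq[0]
    best_key = key(best)
    # Start at index 1: index 0 is already the current maximum,
    # so comparing it against itself would be wasted work.
    for i in range(1, len(seq)):
        k = key(seq[i])
        if gt(k, best_key):
            best, best_key = seq[i], k
    return best
```

For example, find_max(["aa", "b", "cccc"], key=len) visits only "b" and "cccc" in the loop.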
*	doc: subtypep unspecified behavior	Kaz Kylheku	2017-11-14	1	-0/+3
	* txr.1: Behavior of subtypep is not specified if either argument isn't a type.
*	awk: replace set-diff uses with diff.	Kaz Kylheku	2017-11-01	1	-4/+4
	* share/txr/stdlib/awk.tl (sys:awk-mac-let): A few occurrences of the deprecated set-diff function are replaced with diff.
*	streams: allow "b" flag on open-command.	Kaz Kylheku	2017-10-30	3	-3/+20
	Currently, using "rb" in open-command reports an error on GNU/Linux, due to popen not liking the "b" mode. On Cygwin, the "b" flag is useful with popen. * stream.c (normalize_mode_no_bin): New function. (open_command): Use normalize_mode_no_bin instead of normalize_mode to strip out the binary flag. This doesn't happen on Cygwin, though. * stream.h (normalize_mode_no_bin): Declared. * share/txr/stdlib/getput.tl (command-get-buf): Since we are getting binary data, pass the "rb" mode to open-command, now that it works. (command-put-buf): Add "b" flag to mode passed to open-command.
*	doc: wording under eq.	Kaz Kylheku	2017-10-30	1	-1/+1
	* txr.1: fix awkward wording which applies the definite article "the" to a Lisp expression.
*	doc: wrong wording under put-buf.	Kaz Kylheku	2017-10-30	1	-1/+1
	* txr.1: Streams support put-byte, not buffers.
*	genvim: % is constituent of identifiers.	Kaz Kylheku	2017-10-30	1	-1/+1
	* genvim.txr (iskeyword): add % character.
*	awk: implement ranges right using functions.	Kaz Kylheku	2017-10-29	3	-74/+133
	* share/txr/stdlib/awk.tl (sys:awk%--rng, sys:awk%--rng-, sys:awk%rng+, sys:awk%-rng+, sys:awk%--rng+): New functions. (sys:awk-mac-let): Rewritten range expander. The four basic ranges rng, rng-, -rng and -rng- are handled with in-line expansion, because by doing that we avoid unnecessarily evaluating the from-expression. The remaining cases expand to function calls to the new functions, which receive the flag vector, the index position in that vector and the values of the from and to expressions. The behavior change is that the -- forms now do the right thing: they hide all leading records that satisfy the from-expression, right to the last record of the range if necessary. * tests/015/awk-rng.expected: Updated. * txr.1: Revise semantic description of the -- range types, plus minor fixes.
*	New convenience I/O functions for buffers.	Kaz Kylheku	2017-10-27	3	-0/+101
	* lisplib.c (getput_set_entries): New autoload entries for file-get-buf, file-put-buf, file-append-buf, command-get-buf and command-put-buf. * share/txr/stdlib/getput.tl (sys:get-buf-common): New function. (file-get-buf, file-put-buf, file-append-buf, command-get-buf, command-put-buf): New functions. * txr.1: Documented.
*	awk: more range test cases.	Kaz Kylheku	2017-10-27	2	-1/+7
	* tests/015/awk-rng.tl: More rows of data. * tests/015/awk-rng.expected: Updated.
*	awk: fix buggy handling of new -- ranges.	Kaz Kylheku	2017-10-27	1	-21/+17
	The problem is that when records appear in the middle of the range which again match from-expr, they get suppressed. * share/txr/stdlib/awk.tl (sys:awk-mac-let): Get rid of the flag-mid variable. It cannot work because middle is a state in its own right that cannot be inferred from the existing states (nil, t, :end) and the value of from-expr. We get rid of the flag and introduce a :mid state value.
*	carray: check type object in several API functions.	Kaz Kylheku	2017-10-26	1	-4/+4
	* ffi.c (carray_blank, carray_buf, carray_cptr, carray_pun): these functions should be using ffi_type_struct_checked, since they are public interfaces to which anything can be passed. Otherwise TXR can easily be crashed by misusing them.
*	carray: bugfix: allow negative indexing in ref operation.	Kaz Kylheku	2017-10-26	1	-0/+3
	* ffi.c (carray_ref): If the index is negative, displace it by the length of the array. (Then if it is still negative, the function will throw.)
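The index handling described here can be sketched as follows. This is a Python illustration under stated assumptions, not the code of carray_ref: the function name ref is hypothetical, and the upper-bound check is an addition of the sketch that the commit message does not mention.

```python
def ref(items, index):
    """Fetch items[index], allowing negative indices that count from the end."""
    n = len(items)
    if index < 0:
        # Displace a negative index by the length of the array.
        index += n
    # If the index is still negative after displacement, it is out of range.
    # (The check against n is an assumption of this sketch.)
    if index < 0 or index >= n:
        raise IndexError("index out of range")
    return items[index]
```

With a three-element array, index -1 selects the last element, while index -4 is still negative after displacement and is rejected.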
*	doc: grammar in description of rr.	Kaz Kylheku	2017-10-26	1	-1/+1
	* txr.1: rr searches for "a match" not "a matches".
*	doc: partition function: syntax formatting	Kaz Kylheku	2017-10-26	1	-1/+1
	* txr.1: Fix bungled formatting of third argument alternatives in the syntax synopsis of the partition function.
*	op/do: nice error if arguments are not provided.	Kaz Kylheku	2017-10-26	1	-0/+2
	* share/txr/stdlib/op.tl (sys:op-expand): Throw error if argument list is empty. We refer to the compile-error function by quote to avoid triggering the auto-load of the module which defines it, due to the circular dependency on op.
*	awk: bugfix: lack of hygiene in range implementation.	Kaz Kylheku	2017-10-26	1	-9/+10
	The code is using a non-hygienic variable called flag as a placelet alias. This binding is visible to range expressions. For instance (rng #/x/ flag) actually references the range expression's internal flag, rather than producing a warning about an unbound variable. * share/txr/stdlib/awk.tl (sys:awk-mac-let): Allocate a gensym for the flag. Then use ,flag throughout the code templates rather than flag to insert the gensym wherever the symbol flag previously appeared.
*	awk: retrieve range flag vector once per iteration.	Kaz Kylheku	2017-10-25	1	-4/+6
	This is an improvement in the code generation related to awk range expressions. Previously, on each iteration, for each range expression, the awk state structure is accessed to retrieve the flag vector, which is then kept in a lexical variable. With this change, the retrieval is done once for all the range expressions, which share the same variable to access it. * share/txr/stdlib/awk.tl (sys:awk-compile-time): New slot, rng-vec-temp. (sys:awk-mac-let): Alias the flag variable to a simplified vecref expression which accesses the vector assumed to have been retrieved and bound to the variable named by the rng-vec-temp gensym. (awk): Add one more variable binding into the scope of the ranges: the binding of the variable named by the rng-vec-temp gensym, to an expression which retrieves the rng-vec from the Awk run-time state structure.
*	awk: five new range operators.	Kaz Kylheku	2017-10-25	4	-59/+301
	* share/txr/stdlib/awk.tl (sys:awk-mac-let): Provide the implementation for the local macros --rng, --rng-, rng+, -rng+ and --rng+. * tests/015/awk-rng.tl: New file. * tests/015/awk-rng.expected: New file. * txr.1: Documented.
*	caseq, caseql, casequal: improvement in expansion.	Kaz Kylheku	2017-10-25	1	-0/+3
	* eval.c (me_case): When a list of case keys is one element long, reduce it to an atom. Then a simple equality is applied whether the item is equal to the key, rather than whether it is a member of a list containing that one key. This helps with the (t) case which is mandatory, since t is ruled out as a key.
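The effect of this reduction can be modeled with a small sketch. It is Python, not the macro expander in eval.c, and case_test is a hypothetical helper that builds the predicate a clause would use for its keys.

```python
def case_test(keys):
    """Build the test a case clause applies to the item (illustrative only)."""
    if isinstance(keys, list) and len(keys) == 1:
        # A one-element key list is reduced to the atom it contains.
        keys = keys[0]
    if isinstance(keys, list):
        # Several keys: the item must be a member of the key list.
        return lambda item: item in keys
    # Single key: a plain equality comparison is enough.
    return lambda item: item == keys
```

Here case_test(["t"]) yields the same equality test as case_test("t"), with no membership search over a one-element list.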
*	Makefile: further improvement of tests.	Kaz Kylheku	2017-10-25	1	-32/+34
	The problem is that if a test is interrupted, it will not be re-run because the .ok stamp file depends only on an .out file, and that has been successfully created. We completely remove .out files from the rule tree. Quite simply, the output of a test is the .ok stamp. If that is out of date or doesn't exist, the test is run. Generation of the .out is just a side effect. * Makefile (TESTS_OK): Calculate this variable directly from the wildcard over .txr and .tl files rather than from TESTS_OUT. (TESTS_OUT): Variable removed. (TXR_OPTS, TXR_ARGS): The target-specific assignments of these variables for specific tests are now done against .ok stamp file targets rather than .out targets. (TST_EXPECTED, TST_OUT): New helper variables for condensing repeated instances of some syntax. (tst/%.out): Both of these rules are turned into rules which target tst/%.ok. The .out files are just a side effect; the goal is to update the stamp. If an .out file is removed, the test won't be re-run; only if an .ok file is removed, or any of the real prerequisites change. (%.ok): This rule disappears, and its body containing the conditional stamp file touch is moved into both tst/%.ok rules.
*	Makefile: fix silliness in "tests" target.	Kaz Kylheku	2017-10-25	1	-10/+5
	Last I addressed this, I didn't get it quite right. The problem is that the .out files are being removed when a test fails, which is annoying. The whole redirection of test results to a temp file which is then renamed is silly. Now, the .out files are preserved. Whether or not a test passed depends on whether or not the .ok stamp file is created or updated, so it doesn't matter whether an .out file exists or not. Also, tests are made dependent on the executable, and on the .expected files. If the executable is newer than the test outputs, all the tests will re-run. Also, if any .expected file is touched, the corresponding test will be re-run. * Makefile (clean): Do not remove $(TESTS_TMP). (TESTS_TMP): Variable removed. (tst/%.out): In both rules that run the test and make .out files simply redirect the output directly to the .out file represented as the $@ target. This is how it was before, once upon a time. (%.ok): Do not remove the .out file represented by $< if the test fails; it is sufficient not to create/touch the .ok stamp file.
*	Makefile: improve command abbreviation.	Kaz Kylheku	2017-10-25	1	-4/+8
	For all build steps other than linking, print only the leftmost prerequisite of the target. * Makefile (ABBREV): The macro references $< rather than $^, and hence no longer needs the $(DEP_$@) filtering. (ABBREVN): New macro, identical to previous ABBREV, modulo a whitespace fix: removal of a stray tab character. (LINK_PROG): For linking, use ABBREVN so that all the object files are shown.
*	hash: optimization in remhash.	Kaz Kylheku	2017-10-23	1	-4/+8
	* hash.c (remhash): Walk chain to splice out to-be-removed entry using an approach similar to what is done in do_weak_tables to splice out lapsed weak entries. This eliminates one extra traversal of the chain as well as consing due to the ldiff call. We use raw pointers obtained using valptr, and direct assignment through pchain because later cells in a chain are strictly older objects than earlier cells and so the pchain = cdr(*pchain) assignment cannot make a generation 1 object point to a generation 0 object.
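The single-pass splice can be illustrated with a minimal sketch. It is a Python analogue, not the hash.c code: Cell and remove_from_chain are hypothetical names, and since Python has no pointer-to-pointer, a trailing prev reference stands in for pchain. The garbage collector considerations in the commit message have no counterpart here.

```python
class Cell:
    """One entry in a hash bucket chain (illustrative stand-in for a hash cons)."""
    def __init__(self, key, value, next=None):
        self.key = key
        self.value = value
        self.next = next

def remove_from_chain(head, key):
    """Splice the cell with the given key out of the chain in one traversal.

    Returns (new_head, removed_cell), where removed_cell is None if not found.
    """
    prev = None
    cell = head
    while cell is not None:
        if cell.key == key:
            if prev is None:
                head = cell.next        # the removed cell was the chain head
            else:
                prev.next = cell.next   # link the predecessor past the removed cell
            return head, cell
        prev, cell = cell, cell.next
    return head, None
```

The chain is walked once and no new cells are allocated, which is the property the rewrite is after.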
*	hash: fix broken copy_hash.	Kaz Kylheku	2017-10-23	1	-1/+15
	Impact assessment: this bug affects the correctness of all programs which rely on copying hash tables. Direct reliance means the use of copy-hash, or using the generic copy function on hash objects. Indirect reliance occurs through hash-diff which uses copy-hash. Nothing in TXR itself calls hash-diff. The listener's Tab completion relies on copy-hash for package-sensitive symbol visibility calculation. Since that is an interactive feature, the impact is low. * hash.c (copy_hash_chain): New static function. (copy_hash): Use copy_hash_chain instead of copy_alist, since the pairs are hash conses and not regular conses: they have a hash value field that must be copied.
*	hash: remove pointless nullify ops.	Kaz Kylheku	2017-10-23	1	-4/+0
	* hash.c (hash_assoc, hash_assql): Remove useless nullify calls. These are copy and paste leftovers, since these functions were based on assoc and assql, which handle sequences other than lists.
*	New variant of op: lop.	Kaz Kylheku	2017-10-19	3	-6/+107
	* lisplib.c (op_set_entries): Add lop to auto-load list. * share/txr/stdlib/op.tl (sys:op-expand): Recognize lop and implement its transformation. (lop): New macro. * txr.1: Documented.
*	find_max: convert to use seq_info.	Kaz Kylheku	2017-10-13	1	-20/+17
	* lib.c (find_max): Sequence classification rewritten to use seq_info. The cases are almost the same, but refer to si.obj rather than seq. Some care is taken in the list case to not hold a reference to the list head.
*	rfind: rewrite to be like find.	Kaz Kylheku	2017-10-13	1	-11/+48
	* lib.c (rfind): Instead of treating the sequence as a list, classify with seq_info just like find. Basically the whole function is replaced with an altered copy of find.
*	find: convert to seq_info classification.	Kaz Kylheku	2017-10-13	1	-44/+36
	* lib.c (find): Convert switch statement to use the seq_info function to classify the sequence. For SEQ_VECLIKE, we still check whether the original object is a literal or regular string to treat it specially.
*	tprint and -t option: handle infinite list.	Kaz Kylheku	2017-10-12	2	-14/+41
	Test case: txr -t '(gun "foo")' must run in constant memory. * eval.c (tprint): Rewritten to iterate over lists using open loop rather than mapdo. Classification of the sequence is done using the new seq_info, as must be for all new sequence functions. * txr.c (txr_main): Implementation of -t, -p and -P captures the result of the expression in a variable whose value is zapped when it is passed to the function. A gc_hint is added so that this isn't optimized away. Thus, this code won't hold on to the original pointer to a lazy, infinite list.
*	Fixes in partition, partition*, split and split*.	Kaz Kylheku	2017-09-29	2	-116/+111
	Bunch of issues here: broken pre-171 compatibility, non-termination on lazy infinite lists of indices, doc issues. * lib.c (partition_func, split_func, split_star_func): Do the check for negative index values here, with the compat handling for 170 or older. (partition_split_common): Remove code that tries to adjust negative indices, and delete zeros or indices that are still negative after adjustment. The code consumes the entire list of prefixes, so chokes on lazy lists. Also in the compat case, there is complete breakage: the loop doesn't execute, and so out is just nil, and it is taken as the index list. (partition_star_func): Similar change like in partition_func. (partition_star): Similarly to partition_split_common, take out the bogus loop. Also take out loop that tries to remove leading negatives: we cannot do that because we haven't normalized them. * txr.1: Revised doc. Condensed by describing index-list argument in detail under partition. For the other functions, we refer to that one. Conditions for safely handling infinite list of indices spelled out.
*	Makefile: clean temporary file used in testing.	Kaz Kylheku	2017-09-28	1	-0/+1
	* Makefile (clean): Remove $(TESTS_TMP) if it exists.
*	Makefile: print failing command in condensed mode.	Kaz Kylheku	2017-09-28	1	-63/+82
	When make output is condensed, showing a summary of each build step in the style "CC txr.c -> opt/txr.o", as is the case by default, the failing build command is now shown. Previously, a failed build had to be re-invoked with make VERBOSE=y to show the failing command. * Makefile (SH): New macro. (COMPILE_C, COMPILE_C_WITH_DEPS, LINK_PROG, WINDRES, INSTALL): These macros now invoke commands via SH rather than directly. (lex.yy.c, y.tab.h, y.tab.c, install-tests, %): Recipes for these targets use SH macro for executing shell commands rather than specifying them directly. (tst/%.out, %.ok, %.expected): These test-related pattern rules also use SH.
*	cleanup: remove unnecessary header includes.	Kaz Kylheku	2017-09-19	4	-7/+0
	* eval.c: doesn't need rand.h. * filter.c: doesn't need gc.h. * parser.l: doesn't need eval.h. * parser.y: doesn't need utf8.h, stream.h, args.h or cadr.h.
*	Version 186. txr-186	Kaz Kylheku	2017-09-16	6	-472/+522
	* RELNOTES: Updated. * configure, txr.1: Bumped version and date. * share/txr/stdlib/ver.tl: Likewise. * txr.vim, tl.vim: Regenerated.
*	places: use Lisp-1 macroexpansion where needed.	Kaz Kylheku	2017-09-15	1	-2/+2
	A test case for this very subtle bug is this: (let ((v (list 1 2 3))) (symacrolet ((x v)) (flet ((x () 42)) (set [x 0] 0)))) Because x is being evaluated in the DWIM brackets which flatten the two namespaces into one, it must be treated as a reference to the flet, and so [x 0] denotes the function call. The assignment is erroneous. The incorrect behavior being fixed is that the places code macro-expands x in the Lisp-2 style under which the symacrolet is not shadowed by the flet. The substitution of v takes place, and the assignment assigns to [v 0]. * share/txr/stdlib/place.tl (sys:l1-setq, sys:l1-val): Use macroexpand-lisp1 rather than macroexpand.
*	doc: issues in qquote example.	Kaz Kylheku	2017-09-14	1	-2/+2
	* txr.1: fix flaw in comment next to ^(qquote (unquote ,x)). Clarify accompanying text.
*	doc: improve example under regsub	Kaz Kylheku	2017-09-14	1	-1/+1
	* txr.1: instead of (op r^ ...) we can use (fr^ ...).
*	doc: grammar under make-zstruct.	Kaz Kylheku	2017-09-14	1	-1/+1
	* txr.1: singularize inappropriate plural.
*	doc: move away from "text processing".	Kaz Kylheku	2017-09-14	1	-7/+6
	* txr.1: Change title and heading to just "programming language". Opening paragraph explains TXR as being a programming language supporting multiple paradigms.
*	doc: mention FFI early.	Kaz Kylheku	2017-09-14	1	-0/+3
	* txr.1: Introductory paragraphs mention FFI.
*	bugfix: fixnum crackdown.	Kaz Kylheku	2017-09-13	5	-26/+54
	The purpose of this commit is to address certain situations in which code is wrongly relying on a cnum value being in the fixnum range (NUM_MIN to NUM_MAX), so that num_fast can safely be used on it. One wrong pattern is that c_num is applied to some Lisp value, and that value (or one derived from it arithmetically) is then passed to num_fast. The problem is that c_num succeeds on integers outside of the fixnum range. Some bignum values convert to a cnum successfully. Thus either num has to be used instead of num_fast, or else the original c_num attempt must be replaced with something that will fail if the original value isn't a fixnum. (In the latter case, any arithmetic on the fixnum cannot produce a value outside of that range). * buf.c (buf_put_bytes): The size argument here is not guaranteed to be in fixnum range: use num. * combi.c (perm_init_common): Throw if the sequence length isn't a fixnum. Thus the num_fast in perm_while_fun is correct, since the ci value is bounded by k, which is bounded by n. * hash.c (hash_grow): Remove dubious assertion which aborts the run-time if the hash table doubling overflows. Simply don't allow the modulus to grow beyond NUM_MAX. If doubling it makes it larger than NUM_MAX, then just don't grow the table. We need the modulus to be in fixnum range, so that uses of num_fast on the modulus value elsewhere are correct. (group_by, group_reduce): Use c_fixnum rather than c_num to extract a value that is later assumed to be a fixnum. * lib.c (c_fixnum): New function. (nreverse, reverse, remove_if, less, window_map_list, sort_vec, unique): Use c_fixnum rather than c_num to extract a value that is later assumed to be a fixnum. (string_extend): Use c_fixnum rather than c_num to extract a value that is later assumed to be a fixnum. Cap the string allocation size to fixnum range rather than INT_PTR_MAX. (cmp_str): The wcscmp function could return values outside of the fixnum range, so we must use num, not num_fast. * lib.h (c_fixnum): Declared.
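The wrong pattern and its remedy can be modeled in a few lines. This is a Python sketch under loud assumptions: the numeric limits below are invented for illustration (a 64-bit cnum and a narrower fixnum range) and are not taken from the TXR sources, and the functions only mimic the range behavior that the commit message attributes to c_num, c_fixnum and num_fast.

```python
# Assumed limits, for illustration only; the real values depend on the build.
CNUM_MIN, CNUM_MAX = -(2**63), 2**63 - 1   # range of the C integer type
NUM_MIN, NUM_MAX = -(2**61), 2**61 - 1     # fixnum range, narrower than cnum

def c_num(value):
    """Convert to a C-range integer; succeeds even beyond the fixnum range."""
    if not CNUM_MIN <= value <= CNUM_MAX:
        raise OverflowError("value does not fit a cnum")
    return value

def c_fixnum(value):
    """Like c_num, but fails unless the value is in the fixnum range."""
    if not NUM_MIN <= value <= NUM_MAX:
        raise OverflowError("value is not a fixnum")
    return value

def num_fast_is_safe(n):
    """num_fast is only correct for values within the fixnum range."""
    return NUM_MIN <= n <= NUM_MAX
```

A value such as NUM_MAX + 1 passes c_num yet is unsafe for num_fast, which is the bug pattern; c_fixnum rejects it up front.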
*	regex: bugfix: squash duplicates in move set.	Kaz Kylheku	2017-09-13	1	-2/+1
	* regex.c (nfa_move_closure): The move set calculation is wrongly assuming that all of the states are new and not testing their visited color. This could result in the same state being added twice. Though harmless, it wastefully inflates the set size.
*	regex: factor out repeated visit-coloring pattern.	Kaz Kylheku	2017-09-13	1	-13/+15
	* regex.c (nfa_test_set_visited): New inline function. (nfa_map_states, nfa_thread_epsilons, nfa_closure, nfa_move_closure): Use function instead of coding pattern which tests the state and sets the visited member.
*	regex: re-introduce nfa_accept states.	Kaz Kylheku	2017-09-13	1	-13/+14
	The nfa_accept state label is re-introduced. This state type has the same representation as nfa_empty; essentially, this replaces the flag. This makes the state type smaller, and we don't have to access the flag to tell if a state is an acceptance state. * regex.c (nfa_kind_t): New enum label, nfa_accept. (struct nfa_state_empty): Member accept removed. (nfa_accept_state_p): Macro tests only for nfa_accept type. (nfa_empty_state_p): New macro. (nfa_state_accept): Set type of new state to nfa_accept; do not set accept flag. (nfa_state_empty): Do not set accept flag. (nfa_state_empty_convert): Do not clear accept flag. (nfa_map_states): Handle nfa_accept in switch, in the same case as nfa_empty. (nfa_thread_epsilons): Don't test for accept state in nfa_empty case; it would be always false now. Add nfa_accept case to switch which only arranges for a traversal of the two transitions. (Though these are expected to be null at the stage of the graph when this function is applied). (nfa_fold_accept): Switch type to nfa_accept rather than setting accept flag. (nfa_closure, nfa_move_closure): Use new macro for testing whether a state is empty.
*	doc: more notes on regex % operator syntax.	Kaz Kylheku	2017-09-12	1	-0/+34
	* txr.1: The dual precedence of % leads to surprises; when parentheses are used around % expressions, they don't behave symmetrically on both sides.
*	regex: retain unoptimized form for printing.	Kaz Kylheku	2017-09-12	1	-5/+1
	* regex.c (regex_compile): Take the source code to be the original code, rather than the version with AST-level optimizations and expansions related to the nongreedy operator.
*	regex: bug printing #/abc(def|ghi)/	Kaz Kylheku	2017-09-12	1	-1/+1
	This was broken by the July 16 commit "regex: don't print superfluous parens around classes", 2411f779f47c441659720ad0ddcabf91df1d2529. * regex.c (print_rec): If an (or ...) appears as a compound element, it must be rendered in parentheses; or_s must be handled here just like and_s. Prior to the faulty commit, this was implicitly true because the logic was inverted and wasn't ruling out or_s.