txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
*	tests: disable some UTF-8 tests on 16 bit wchar_t.	Kaz Kylheku	2021-04-20	1	-8/+9
\| \| \| \| \|	* tests/012/parse.tl: All the tests in this file blow up on systems that don't have a full-blown character type.
*	compile-file: fix bad diagnostic.	Kaz Kylheku	2021-04-20	1	-1/+1
\| \| \| \| \| \| \|	* share/txr/stdlib/compiler.tl (open-compile-streams): When the output file cannot be opened, the diagnostic message wrongly refers to the input stream object rather than the output file path.
*	configure: remove LIT_ALIGN.	Kaz Kylheku	2021-04-20	2	-40/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	LIT_ALIGN was introduced before there was SIZEOF_WCHAR_T. The latter was introduced on suspicion that they might not be the same. Since LIT_ALIGN is tied to SIZEOF_WCHAR_T again there is no need for it to exist. * configure (lit_align): Variable removed. Documentation of lit-align argument removed. Alignment of wide literals test removed. Not generating LIT_ALIGN in config.h any more. * lib.h (LIT_ALIGN): Occurrences replaced with SIZEOF_WCHAR_T.
*	Revert bogus LIT_ALIGN commit from 2015.	Kaz Kylheku	2021-04-20	3	-56/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 0343c6f32c5bd8335e88595cb9d23506625b2586. I don't see evidence that the claim in the commit is true: that wide literals are not four-byte-aligned on Darwin in spite of sizeof(wchar_t) being 4. Not even with the old clang in my old VM where I first thought I discovered this. * configure: do not set up LIT_ALIGN == 2 for Darwin. * lib.h (litptr): Remove LIT_ALIGN < 4 && SIZEOF_WCHAR_T == 4 case. * HACKING: Undocument bogus claim.
*	bug: broken path handling on LIT_ALIGN == 2.	Kaz Kylheku	2021-04-20	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	On platforms where wchar_t literals have two byte alignment, the path functions misbehave, failing to treat / as a path separator. Thus (dir-name "a/b/c") is reported as ".". Lack of test coverage, argh. * stream.c (base_name, dir_name): Do not use wref macro on wli() string literal; the offset is already built-in. * txr.c (sysroot_init): Likewise.
*	lib: missing L prefix in literal.	Kaz Kylheku	2021-04-20	1	-1/+1
\| \| \| \| \| \| \|	* lib.h (wli_noex): The first of three literals being juxtaposed is missing the L prefix, leading to a mixture of wide and regular literals. This is supported by C, but let's avoid it.
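The juxtaposition rule mentioned above can be illustrated with a small C sketch. It is not the wli_noex macro from lib.h; the array and function names are hypothetical. It relies on the C99 rule that adjacent string literals are concatenated as a wide literal if any piece carries the L prefix.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical names for illustration; not taken from lib.h. */

/* Mixed: the first piece lacks the L prefix. C99 and later still treat
   the concatenated result as a wide string literal. */
static const wchar_t mixed[] = "ab" L"cd" L"ef";

/* Uniform: every piece carries the L prefix, which is the style the
   commit moves to. */
static const wchar_t uniform[] = L"ab" L"cd" L"ef";

/* Returns 1 when both spellings produce the same wide string. */
static int literals_agree(void)
{
    assert(sizeof mixed == sizeof uniform);
    return wcscmp(mixed, uniform) == 0;
}
```

Both spellings compile to the same data on a C99 compiler; the uniform form simply avoids depending on that rule.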
*	configure: better way to avoid -no-pie.	Kaz Kylheku	2021-04-20	1	-12/+21
\| \| \| \| \| \| \| \| \| \|	* configure (gcc_version, broken128): Formally declare existing ad hoc variables. (do_nopie): New variable. Compiler version test moved up. We use the gcc version to disable nopie, and check for clang to do the same. Instead of checking for darwin and android to skip the nopie stuff, we check do_nopie.
*	openbsd: fix tests.	Kaz Kylheku	2021-04-20	5	-32/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* tests/014/socket-basic.tl (%iters%): Also reduce to 2000 on OpenBSD, to avoid the default limit on UDP datagram size. * tests/017/glob-carray.tl: Use the BSD-style struct glob-t on OpenBSD also. * tests/017/glob-zarray.tl: Likewise. * tests/018/chmod.tl (os): New global variable. (test-sticky): s-isvtx not allowed for non-root user on OpenBSD, so we falsify this variable. * tests/common.tl (os-symbol): Add OpenBSD case, producing :openbsd keyword symbol. (libc): Let's just use (dlopen nil) for any platform that isn't Cygwin or Cygnal.
*	configure: use $make	Kaz Kylheku	2021-04-20	1	-4/+4
\| \| \| \| \| \|	* configure: in a few tests, we are calling make as "make" rather than via the $make variable. This fails when "make" isn't GNU Make.
*	matcher: first pattern macro, sme.	Kaz Kylheku	2021-04-19	5	-2/+210
\| \| \| \| \| \| \| \| \| \| \| \|	* lisplib.c (match_instantiate): Intern sme symbol. * share/txr/stdlib/doc-syms.tl: Update with sme entry. * share/txr/stdlib/match.tl (sme): New defmatch macro. * tests/011/patmatch.tl: New tests for sme. * txr.1: Documented.
*	doc: reversed maphash parameters.	Kaz Kylheku	2021-04-19	1	-1/+1
\| \| \| \| \|	* txr.1: The function is first, then the hash. Reported by Ray Perry.
*	defmatch: pass form to mac-param-bind.	Kaz Kylheku	2021-04-19	2	-1/+23
\| \| \| \| \| \| \| \|	* share/txr/stdlib/match.tl (defmatch): Pass match-form to mac-param-bind so that the context is available to defmatch macros via the :form parameter. * txr.1: Documented use of :form in defmatch.
*	port: build on OpenBSD	Alexander Shendi	2021-04-18	3	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \|	Tested on OpenBSD amd64. * socket.c: Add <sys/socket.h>. Test for AI_V4MAPPED and AI_ALL being defined. * sysif.c: Add <stdarg.h>. Test for EMULTIHOP, ENODATA, ENOLINK, ENOSR, ENOSTR, EPIPE and ETIME. * termios.c: Test for OFILL, VTDLY, VT0 and VT1.
*	compile/eval: print compiler error on stderr.	Kaz Kylheku	2021-04-19	3	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* share/txr/stdlib/error.tl (compile-error): Print the error message on stderr, like we do with warnings. This allows the programming environment to pick up the error message and navigate to that line accordingly. The error message is also output by the unhandled exception logic but with a prefix that prevents parsing by the tooling. To avoid sending double error messages to the interactive user, we only issue the stderr message if load-recursive is true. * tests/common.tl (macro-time-let): New macro. This lets us bind special variables around the macro-expansion of the body, which is useful when expansion-time logic reacts to values of special variables. * tests/012/ifa.tl: Use macro-time-let to suppress stderr around the expansion of the erroneous ifa form. We now need this because the error situation spits out a message on stderr, in addition to throwing.
*	new: remove superfluous prefix from diagnostic.	Kaz Kylheku	2021-04-19	1	-2/+1
\| \| \| \| \|	* share/txr/stdlib/struct.tl (new-expander): Don't format prefix into error message; compile-error does that.
*	matcher: new @(scan) operator.	Kaz Kylheku	2021-04-18	4	-2/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* share/txr/stdlib/match.tl (compile-scan-match): New function. (compile-match): Hook scan operator into compiler. * lisplib.c (match_set_entries): Ensure scan is interned in usr package. * txr.1: Documented. * share/txr/stdlib/doc-syms.tl: Updated with new entry for scan.
*	matcher: allow user-defined patterns via defmatch	Kaz Kylheku	2021-04-17	5	-21/+133
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* lisplib.c (match_set_entries): Register defmatch and match-symbol to autoload match.tl. * share/txr/stdlib/doc-syms.tl: Updated with entries for defmatch and match-macro. * share/txr/stdlib/match.tl (match-macro): New special variable holding hash. (compile-match): Handle macros via match-macro hash. (defmatch): New macro. * txr.1: Documented. * tags.tl: Recognize defmatch forms.
*	streams: revise stream-max-len over strings.	Kaz Kylheku	2021-04-17	2	-30/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The maximum number of characters printed from a string is too small, if it is directly taken from the stream-max-len value. We are going to multiply it by 8, and clamp the minimum characters at 24. * lib.c (max_str_chars): New inline function. (lazy_str_put, out_lazy_str, out_quasi_str): Use inline function to determine maximum number of characters to print. Also bugfix here: decrement and test max_chr in the loop, not max_len. This bug was copy-pasted across all these functions. (obj_print_impl): Similarly revise the printing of strings. * txr.1: Documentation updated.
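The arithmetic described above (multiply by 8, never go below 24) can be sketched in C as follows. This is not the max_str_chars function from lib.c; the name, the use of long, and the saturation on overflow are assumptions made for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the helper named in the commit. Takes the
   stream's max-len setting and returns the character budget used when
   printing a string. */
static long max_str_chars_sketch(long max_len)
{
    long chars;

    assert(max_len >= 0);

    /* Saturate rather than overflow; an assumption of this sketch. */
    chars = (max_len > LONG_MAX / 8) ? LONG_MAX : max_len * 8;

    /* Clamp the minimum at 24 characters. */
    return chars < 24 ? 24 : chars;
}
```

Under these assumptions a max-len of 2 gives a budget of 24, while a max-len of 10 gives 80.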
*	gc: disable z() macro.	Kaz Kylheku	2021-04-17	1	-0/+5
\| \| \| \| \| \|	* gc.h (z): turn off. This is not achieving its purpose of stopping spurious retention of objects, and adds a fraction of a percent of execution overhead.
*	debugging: disassemble vm code out of debugger.	Kaz Kylheku	2021-04-16	2	-0/+14
\| \| \| \| \| \| \| \|	* lib.c (dis): New function that we can call from gdb to disassemble a VM function, if we know its address. I've done this manually way too many times. * lib.h (dis): Declared.
*	txr: gather: report list of missing required vars.	Kaz Kylheku	2021-04-13	1	-2/+8
\| \| \| \| \|	* match.c (v_gather): Identify all required variables that are missing, and list them all in the diagnostic.
*	doc: implement typesetting of keystrokes.	Kaz Kylheku	2021-04-13	2	-112/+435
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit relies on parallel improvements in man2html, up through commit ac186529b6b5f80906c3215a67c98505db7bb156 "Implement .M2HT request for HTML passthrough." * genman.txr: Add CSS block targeting the kbd element, providing 3D styling for keyboard input. * txr.1: Define two new macros, .key and .keyn. These are defined in three different ways: in man page output, we put square brackets around keystrokes. In typeset groff output, we put a square border around them using a box macro cribbed from groff documentation. In HTML, we use .M2HT to wrap a <kbd> tag around the keystrokes. Documentation is updated to use these macros for all keystrokes. We no longer separate keystroke sequence elements with commas.
*	tests: use fixed regsub in compile test.	Kaz Kylheku	2021-04-13	1	-1/+1
\| \| \| \|	* tests/012/compile.tl: Simplify code with regsub.
*	regex: regsub wrongly destructive.	Kaz Kylheku	2021-04-13	1	-3/+4
\| \| \| \| \|	* regex.c (regsub): When the regex argument is actually a function, we must copy the string, because replace_str is destructive.
*	vim: remove txr_keyword from tl.vim.	Paul A. Patience	2021-04-13	1	-0/+2
\| \| \| \| \| \|	* genvim.txr: the tl.vim file does not require a highlighting association between txr_keyword and Keyword, since it lacks the txr_keyword match group.
*	repl: fix typo in plain mode banner.	Paul A. Patience	2021-04-13	1	-1/+1
\| \| \| \|	* txr.c (banner): wth -> with.
*	tests: implicitly generate empty .expected files.	Kaz Kylheku	2021-04-12	29	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Makefile (%.expected): New implicit rule. Whenever a test requires a .expected file, if it is missing, we create an empty one. This file will be treated as an intermediate by GNU Make, which means that it will be deleted when make terminates. * tests/012/compile.tl: Some of the .tl files no longer have an .expected file, so we have to test for that in the catenating logic. * tests/008/call-2.expected, * tests/008/no-stdin-hang.expected, * tests/011/macros-3.expected, * tests/011/patmatch.expected, * tests/012/aseq.expected, * tests/012/ashwin.expected, * tests/012/compile.tl, * tests/012/cont.expected, * tests/012/defset.expected, * tests/012/ifa.expected, * tests/012/oop-seq.expected, * tests/012/parse.expected, * tests/012/quasi.expected, * tests/012/quine.expected, * tests/012/seq.expected, * tests/012/struct.expected, * tests/012/stslot.expected, * tests/014/dgram-stream.expected, * tests/014/in6addr-str.expected, * tests/014/inaddr-str.expected, * tests/014/socket-basic.expected, * tests/015/awk-fconv.expected, * tests/015/split.expected, * tests/015/trim.expected, * tests/016/arith.expected, * tests/016/ud-arith.expected, * tests/017/ffi-misc.expected, * tests/018/chmod.expected: Empty file deleted.
*	compiler: new test case.	Kaz Kylheku	2021-04-11	1	-0/+12
\| \| \| \| \| \| \| \|	* tests/012/compile.tl (new-file): Compiles a select set of .tl files in the same directory. The compile.expected file is dynamically created from catenating the .expected files corresponding to those .tl files; the output is expected to be the same from compiling those files as from interpreting them.
*	compiler: bugfix: rest parameter in inline lambda	Kaz Kylheku	2021-04-11	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \|	* share/txr/stdlib/compiler.tl (lambda-apply-transform): Do not take all of the fixed arguments and rest expression to be the trailing list. Rather, skip as many elements from these as the function has fixed parameters. E.g. if there are two fixed parameters as in (lambda (a b . c)) and the call specifies four fixed parameters and a trailing x (1 2 3 4 . x) then the rest argument c must be (list* 3 4 . x) and not (list* 1 2 3 4 . x).
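The skipping rule in the example above can be modeled in C with plain arrays. This is only an index-arithmetic sketch, not the lambda-apply-transform code from compiler.tl: integers stand in for Lisp argument forms, the trailing x is not modeled, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the fixed arguments of a call as an array.
   For (lambda (a b . c)) called with (1 2 3 4 . x), nparams is 2
   and nargs is 4. */
struct rest_slice {
    const int *first;   /* first fixed argument that goes into the rest list */
    size_t count;       /* number of fixed arguments that go into the rest list */
};

static struct rest_slice rest_args(const int *args, size_t nargs,
                                   size_t nparams)
{
    struct rest_slice rs;
    /* Skip as many arguments as there are fixed parameters. If there are
       fewer arguments than parameters, nothing is left over here. */
    size_t skip = nparams < nargs ? nparams : nargs;

    assert(args != NULL);
    rs.first = args + skip;
    rs.count = nargs - skip;
    return rs;
}
```

With arguments 1 2 3 4 and two fixed parameters, the slice covers 3 and 4, matching the (list* 3 4 . x) result in the commit message.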
*	gc: code improvement in finalizer calling.	Kaz Kylheku	2021-04-11	1	-1/+1
\| \| \| \| \| \| \| \|	* gc.c (call_finalizers_impl): We don't have to null out the next pointer of the finalization entry in the loop and note that we are not doing this for the nodes that are going back into final_list. Rather, we null-terminate the found list at the end of the loop, just like we do with the final list.
*	gc: sys:gc function must not reset full_gc flag.	Kaz Kylheku	2021-04-11	1	-1/+2
\| \| \| \| \| \| \| \|	* gc.c (gc_wrap): We must not set full_gc according to the argument, but only set it to 1 if the argument requests full GC. full_gc is set to 1 for some reason having to do with correctness; only the garbage collector can reset full_gc back to 0, otherwise incorrect behavior will ensue.
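The one-way nature of the flag can be sketched in C like this. It is a simplified model, not the gc_wrap code from gc.c; the function names and the static flag are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the collector's internal full_gc flag. */
static int full_gc_flag;

/* Request path: may raise the flag but must never lower it, because
   the collector may have set it for correctness reasons. */
static void request_gc(int want_full)
{
    if (want_full)
        full_gc_flag = 1;
}

/* Collector path: the only place where the flag goes back to 0.
   Returns whether this collection was a full one. */
static int collect(void)
{
    int was_full = full_gc_flag;

    assert(was_full == 0 || was_full == 1);
    /* ... marking and sweeping would happen here ... */
    full_gc_flag = 0;
    return was_full;
}
```

A request for a non-full collection made while the flag is already set leaves it set, so the next collection is still a full one.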
*	compiler: bug: symbol not in ffuns in call forms.	Kaz Kylheku	2021-04-10	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This bug causes forms like (call (fun 'foo) ...) not to register foo as a free reference in the function space, leading to inappropriate lambda lifting optimizations. The compiler thinks that a lambda is safe to move because that lambda doesn't reference any surrounding lexical functions, which is incorrect. A failing test case for this is (compile-file "tests/012/man-or-boy.tl") at opt-level 3 or higher. A bogus error occurs similar to "function #:g0144 is not defined", due to that function being referenced from a lifted lambda, and not being in its scope. * share/txr/stdlib/compiler.tl (compiler (comp-fun-form, comp-apply-call)): Pass the function symbol as an extra argument to comp-fun-form so that it's added to ffuns. (compiler comp-call-impl): Take new optional argument: a symbol to be added to the ffuns slot of the returned fragment, indicating that a function symbol is referenced.
*	doc: redocument UTF-8 in source and literals.	Kaz Kylheku	2021-04-10	1	-32/+48
\| \| \| \| \| \| \| \| \| \| \|	* txr.1: Because invalid UTF-8 bytes are allowed in string literals, that documentation needs to be updated. I'm rewriting it substantially to clarify the difference between text streams and parsing source. In the discussion of escape sequences in string literals, the wording is improved. Because the source code is UTF-8, we could plausibly support escapes which specify bytes (that are then decoded), so that's not the correct rationale for not supporting it.
*	doc: remove some hyphenation.	Kaz Kylheku	2021-04-09	1	-10/+10
\| \| \| \| \|	* txr.1: Do not hyphenate two's complement and C language, except in phrases like "C-language-style whatever".
*	doc: lambda: add pointers to alternative notations.	Kaz Kylheku	2021-04-09	1	-0/+75
\| \| \| \| \| \|	* txr.1: Under the lambda operator, point to the op notational family, functional combinators and the OOP-related method and slot indirection.
*	doc: more details in string literals section.	Kaz Kylheku	2021-04-09	1	-0/+14
\| \| \| \| \|	* txr.1: advise user that numeric escapes in string literals are not byte-wise, but specify code points.
*	doc: big patch: hyphenation, wording, formatting.	Paul A. Patience	2021-04-09	1	-473/+573
\| \| \| \| \|	* txr.1: Numerous issues of hyphenation, formatting, and errors in typography and formatting are addressed.
*	parser: allow non-UTF-8 bytes in literals and regexes.	Kaz Kylheku	2021-04-08	3	-2992/+2945
\| \| \| \| \| \| \| \| \| \|	* parser.l (grammar): Just like we do in SREGEX, allow an arbitrary byte in REGEX, mapping it to the DCxx range. Do the same inside string literals of all types. * lex.yy.c.shipped: Updated. * tests/012/parse.tl: New tests.
*	parser: check in .shipped materials.	Kaz Kylheku	2021-04-08	2	-1768/+1835
\| \| \| \| \| \| \| \| \| \| \|	This picks up the changes introduced by the previous three commits. * lex.yy.c.shipped: Updated. * y.tab.c.shipped: Likewise. * y.tab.h.shipped: Likewise.
*	parser: allow funny UTF-8 in regexes and literals.	Kaz Kylheku	2021-04-08	4	-7/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The main idea in this commit is to change a behavior of the lexer, and take advantage of it in the parser. Currently, the lexer recognizes a {UANYN} pattern in two places. That pattern matches a UTF-8 character. The lexeme is passed to the decoder, which is expected to produce exactly one wide character. If the UTF-8 is bad (for instance, a code in the surrogate pair range U+DCxx) then the decoder will produce multiple characters. In that case, these rules return ERRTOK instead of a LITCHAR or REGCHAR. The idea is: why don't we just return those characters as a TEXT token? Then we can just incorporate that into the literal or regex. * parser.l (grammar): If a UANYN lexeme decodes to multiple characters instead of the expected one, then produce a TEXT token instead of complaining about invalid UTF-8 bytes. * parser.y (regterm): Recognize a TEXT item as a regterm, converting its string value to a compound node in the regex AST, so it will be correctly treated as a fixed pattern. (chrlit): If a hash-backslash is followed by a TEXT token, which can happen now, that is invalid; we diagnose that as invalid UTF-8. (quasi_item): Remove TEXT rule, because the litchars constituent now generates TEXT. (litchars, restlistchar): Recognize TEXT item, similarly to regterm. * tests/012/parse.tl: New file. * tests/012/parse.expected: Likewise.
*	parser: fix few memory leaks in error recovery.	Kaz Kylheku	2021-04-08	1	-0/+4
\| \| \| \| \| \| \| \| \|	* parser.y (var, o_var): In a few error productions in which we have a SYMTOK item, we should free the lexeme. This doesn't solve all leaks: any time we have a parser stack containing SYMTOK or TEXT items that belong to rules that have not yet been reduced, and the parse job is aborted due to errors, we leak those.
*	parser: fix poor diagnosis of \x invalid escape.	Kaz Kylheku	2021-04-08	1	-1/+12
\| \| \| \| \| \| \|	* parser.l (grammar): Because the \x pattern requires one or more digits after it, if they are not present, we simply report \x as an unrecognized escape. It's better if we diagnose it properly as a \x that is not followed by digits.
*	build: calm restless yacc.	Kaz Kylheku	2021-04-08	1	-6/+1
\| \| \| \| \| \| \| \| \| \|	* Makefile (%.tab.c %.tab.h): Remove the trick of keeping the old y.tab.h file if it has not changed. This was once a good idea, but now that we have a proper grouped targets pattern rule which knows that y.tab.h depends on and is produced from parser.y, the trick causes y.tab.h to be perpetually out of date due to its old time stamp, and so yacc is run on every build.
*	doc: bad syntax under doc function.	Kaz Kylheku	2021-04-08	1	-1/+1
\| \| \| \|	* txr.1: Fix formatting.
*	Version 256 (txr-256)	Kaz Kylheku	2021-04-07	6	-965/+1023
\| \| \| \| \| \| \| \| \| \|	* RELNOTES: Updated. * configure, txr.1: Bumped version and date. * share/txr/stdlib/ver.tl: Bumped. * txr.vim, tl.vim: Regenerated.
*	doc: support doc function on android.	Kaz Kylheku	2021-04-07	1	-2/+2
\| \| \| \| \|	* share/txr/stdlib/doc-lookup.tl (open-url): Define for android, which has xdg-open in the termux environment.
*	utf8: fix backtracking bugs in buffer decoder.	Kaz Kylheku	2021-04-07	2	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* utf8.c (utf8_from_buffer): Fix incorrect backtracking logic for handling bad UTF-8 bytes. Firstly, we are not backtracking to the correct byte. Because src is incremented at the top of the loop, the backtrack pointer must be set to src - 1 to point to the possibly bad byte. Secondly, when we backtrack, we are neglecting to rewind nbytes! Thus after backtracking, we will not scan the entire input. Let's avoid using nbytes, and guard the loop based on whether we hit the end of the buffer; then we don't have any nbytes state to backtrack. * tests/017/ffi-misc.tl: New test case converting a three-byte UTF-8 encoding of U+DC01: an invalid character in the surrogate range. We test that the buffer decoder turns this into three characters, exactly like the stream decoder. Another test case for invalid bytes following a valid sequence start.
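The backtracking scheme described above can be sketched as a small standalone C decoder. This is not the utf8_from_buffer code from utf8.c: the function name, the uint32_t output type and the exact validity checks are assumptions. What it shares with the description is that the loop is guarded by an end pointer rather than a byte count, that the backtrack target is the byte at the start of the sequence, and that each undecodable byte becomes U+DCxx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes nbytes of UTF-8 from src into dst, which holds up to dstcap
   code points. Returns the number of code points written. */
static size_t decode_utf8_sketch(uint32_t *dst, size_t dstcap,
                                 const unsigned char *src, size_t nbytes)
{
    const unsigned char *end = src + nbytes;   /* loop guard, no counter */
    size_t n = 0;

    assert(dst != NULL && src != NULL);

    while (src < end && n < dstcap) {
        const unsigned char *start = src;      /* backtrack target */
        unsigned char b = *src++;
        uint32_t cp, min;
        int more;

        if (b < 0x80) { dst[n++] = b; continue; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; more = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; more = 2; min = 0x800; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; more = 3; min = 0x10000; }
        else { dst[n++] = 0xDC00u | b; continue; }   /* not a valid lead byte */

        /* Consume continuation bytes while they are present and valid. */
        while (more > 0 && src < end && (*src & 0xC0) == 0x80) {
            cp = (cp << 6) | (uint32_t)(*src++ & 0x3F);
            more--;
        }

        if (more == 0 && cp >= min && cp <= 0x10FFFF &&
            !(cp >= 0xD800 && cp <= 0xDFFF)) {
            dst[n++] = cp;
        } else {
            /* Bad sequence: emit only the lead byte as U+DCxx, then
               resume scanning at the byte right after it. */
            dst[n++] = 0xDC00u | b;
            src = start + 1;
        }
    }
    return n;
}
```

Because the only scanning state is the src pointer, backtracking is a single assignment and there is no separate count to rewind. Fed the bytes ED B0 81 (the encoding of U+DC01), the sketch yields three code points: U+DCED, U+DCB0 and U+DC81.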
*	awk: bugfix: string rs must not compile as regex.	Kaz Kylheku	2021-04-07	1	-5/+5
\| \| \| \| \| \| \| \|	* share/txr/stdlib/awk.tl (awk-state loop): When rs contains a string, do not pass it directly to regex-compile, because that function calls regex-parse when the argument is a string. Wrap it in a (compound ...) tree node to get it to be treated as a sequence of characters to match.
*	gc: fix astonishing bug in weak hash processing.	Kaz Kylheku	2021-04-06	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a flaw that has been in the code since the initial implementation in 2009. Weak hash tables are only partially marked during the initial garbage collection marking phase. They are put into a global list, which is then walked again to do the weak processing: to expire items which are not reachable, and then finish walking the table objects. Problem is, the code assumes that this late processing will not discover more hash tables and put them into that global list. This creates a problem when weak hash tables contain weak hash tables, such as in the important and very common case when a global variable (binding stored in a weak hash table) contains a weak hash table! These hash tables discovered during weak hash table processing are partially marked, and left that way. The result is that their table vectors get prematurely scavenged by the garbage collector, and then fall victim to use-after-free crashing. Note: do_iters doesn't have this bug. Though the reachable_iters list resembles reachable_weak_hashes, the key difference is that do_iters does not do any marking, and so will not discover any more reachable objects. All it does is update some counts in the hashes to which the still-reachable iterators point. * hash.c (do_weak_tables): Clear the reachable_weak_hashes list on entry into the function, taking a local copy of its head. After walking the list, check the global variable again; if it has become non-null, it means more weak tables were discovered and added to the list. In that case, make a recursive call (susceptible to tail call treatment) to process the list again.
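The list-draining pattern described for do_weak_tables can be sketched in C as follows. This is a toy model, not the code from hash.c: the struct, its fields and the function names are hypothetical, and "processing" a table is reduced to setting a flag and discovering at most one nested table.

```c
#include <assert.h>
#include <stddef.h>

struct table {
    struct table *next;    /* link in the pending list */
    struct table *inner;   /* table reachable only through this one, or NULL */
    int queued;            /* already placed on the pending list */
    int finished;          /* late processing has completed */
};

/* Hypothetical stand-in for the global reachable_weak_hashes list. */
static struct table *pending;

/* Called when marking reaches a table: defer it to late processing. */
static void discover(struct table *t)
{
    if (t == NULL || t->queued)
        return;
    t->queued = 1;
    t->next = pending;
    pending = t;
}

static void process_pending(void)
{
    struct table *list = pending;   /* local copy of the head */

    pending = NULL;                 /* clear the global on entry */

    while (list != NULL) {
        struct table *t = list;

        list = t->next;
        assert(t->queued);
        t->finished = 1;
        discover(t->inner);         /* may push onto the global list */
    }

    /* Tables discovered during the walk are on the global list now. */
    if (pending != NULL)
        process_pending();
}
```

Without the final check, a table discovered during the walk would stay on the global list half-processed, which is the situation the commit describes. The recursion ends once a walk discovers nothing new.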
*	qref: bugfix: handle a.(b).?c correctly.	Kaz Kylheku	2021-04-05	1	-1/+1
\| \| \| \| \|	* share/txr/stdlib/struct.tl (qref): Do not assume that (b) is the name of a slot to be looked up. Use qref to handle it.