txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
*	gc: code improvement in finalizer calling.	Kaz Kylheku	2021-04-11	1	-1/+1
	* gc.c (call_finalizers_impl): We don't have to null out the next pointer of the finalization entry in the loop; note that we are not doing this for the nodes that are going back into final_list. Rather, we null-terminate the found list at the end of the loop, just like we do with the final list.
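	The list handling described above can be sketched as follows. This is a minimal illustration, not the code in gc.c: struct entry, its ready flag, and split_list are hypothetical names. One list is split into two through tail pointers, and both results are null-terminated once, after the loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for a finalization entry. */
struct entry {
  struct entry *next;
  int ready;                /* nonzero: entry moves to the found list */
};

/* Split list into a kept list and a found list, preserving order.
   No next pointer is cleared inside the loop; each result list is
   terminated exactly once, after the loop. */
static void split_list(struct entry *list,
                       struct entry **keep_out,
                       struct entry **found_out)
{
  struct entry *keep = NULL, **keep_tail = &keep;
  struct entry *found = NULL, **found_tail = &found;

  while (list != NULL) {
    struct entry *next = list->next;

    if (list->ready) {
      *found_tail = list;
      found_tail = &list->next;
    } else {
      *keep_tail = list;
      keep_tail = &list->next;
    }

    list = next;
  }

  *keep_tail = NULL;        /* terminate both lists at the end */
  *found_tail = NULL;

  *keep_out = keep;
  *found_out = found;
}
```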
*	gc: sys:gc function must not reset full_gc flag.	Kaz Kylheku	2021-04-11	1	-1/+2
	* gc.c (gc_wrap): We must not set full_gc according to the argument, but only set it to 1 if the argument requests full GC. When full_gc is 1, it was set for a reason having to do with correctness; only the garbage collector can reset full_gc back to 0, otherwise incorrect behavior will ensue.
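	The difference between overwriting the flag and only raising it can be sketched like this. All names below are hypothetical stand-ins, not identifiers from gc.c, and the real collector does far more than toggle a flag.

```c
/* Hypothetical analogue of the full_gc flag. */
static int full_gc_pending;

/* Broken variant: a request for a non-full collection overwrites
   a flag that the collector may have set for its own reasons. */
static void request_gc_overwrite(int want_full)
{
  full_gc_pending = want_full;
}

/* Fixed variant: a request may raise the flag but never lower it. */
static void request_gc_sticky(int want_full)
{
  if (want_full)
    full_gc_pending = 1;
}

/* Only the collector clears the flag, after a full pass has run. */
static void collector_finished_full_pass(void)
{
  full_gc_pending = 0;
}
```

	With the sticky variant, a caller that asks for a partial collection while a full one is pending leaves the pending request intact.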
*	compiler: bug: symbol not in ffuns in call forms.	Kaz Kylheku	2021-04-10	1	-6/+9
	This bug causes forms like (call (fun 'foo) ...) not to register foo as a free reference in the function space, leading to inappropriate lambda lifting optimizations. The compiler thinks that a lambda is safe to move because that lambda doesn't reference any surrounding lexical functions, which is incorrect. A failing test case for this is (compile-file "tests/012/man-or-boy.tl") at opt-level 3 or higher. A bogus error occurs similar to "function #:g0144 is not defined", due to that function being referenced from a lifted lambda, and not being in its scope.
	* share/txr/stdlib/compiler.tl (compiler (comp-fun-form, comp-apply-call)): Pass the function symbol as an extra argument to comp-fun-form so that it's added to ffuns.
	(compiler comp-call-impl): Take new optional argument: a symbol to be added to the ffuns slot of the returned fragment, indicating that a function symbol is referenced.
*	doc: redocument UTF-8 in source and literals.	Kaz Kylheku	2021-04-10	1	-32/+48
	* txr.1: Because invalid UTF-8 bytes are allowed in string literals, that documentation needs to be updated. I'm rewriting it substantially to clarify the difference between text streams and parsing source. In the discussion of escape sequences in string literals, the wording is improved. Because the source code is UTF-8, we could plausibly support escapes which specify bytes (that are then decoded), so that's not the correct rationale for not supporting it.
*	doc: remove some hyphenation.	Kaz Kylheku	2021-04-09	1	-10/+10
	* txr.1: Do not hyphenate two's complement and C language, except in phrases like "C-language-style whatever".
*	doc: lambda: add pointers to alternative notations.	Kaz Kylheku	2021-04-09	1	-0/+75
	* txr.1: Under the lambda operator, point to the op notational family, functional combinators and the OOP-related method and slot indirection.
*	doc: more details in string literals section.	Kaz Kylheku	2021-04-09	1	-0/+14
	* txr.1: Advise user that numeric escapes in string literals are not byte-wise, but specify code points.
*	doc: big patch: hyphenation, wording, formatting.	Paul A. Patience	2021-04-09	1	-473/+573
	* txr.1: Numerous issues of hyphenation, formatting, and errors in typography and formatting are addressed.
*	parser: allow non-UTF-8 bytes in literals and regexes.	Kaz Kylheku	2021-04-08	3	-2992/+2945
	* parser.l (grammar): Just like we do in SREGEX, allow an arbitrary byte in REGEX, mapping it to the DCxx range. Do the same inside string literals of all types.
	* lex.yy.c.shipped: Updated.
	* tests/012/parse.tl: New tests.
*	parser: check in .shipped materials.	Kaz Kylheku	2021-04-08	2	-1768/+1835
	This picks up the changes introduced by the previous three commits.
	* lex.yy.c.shipped: Updated.
	* y.tab.c.shipped: Likewise.
	* y.tab.h.shipped: Likewise.
*	parser: allow funny UTF-8 in regexes and literals.	Kaz Kylheku	2021-04-08	4	-7/+20
	The main idea in this commit is to change a behavior of the lexer, and take advantage of it in the parser. Currently, the lexer recognizes a {UANYN} pattern in two places. That pattern matches a UTF-8 character. The lexeme is passed to the decoder, which is expected to produce exactly one wide character. If the UTF-8 is bad (for instance, a code in the surrogate pair range U+DCxx) then the decoder will produce multiple characters. In that case, these rules return ERRTOK instead of a LITCHAR or REGCHAR. The idea is: why don't we just return those characters as a TEXT token? Then we can just incorporate that into the literal or regex.
	* parser.l (grammar): If a UANYN lexeme decodes to multiple characters instead of the expected one, then produce a TEXT token instead of complaining about invalid UTF-8 bytes.
	* parser.y (regterm): Recognize a TEXT item as a regterm, converting its string value to a compound node in the regex AST, so it will be correctly treated as a fixed pattern.
	(chrlit): If a hash-backslash is followed by a TEXT token, which can happen now, that is invalid; we diagnose that as invalid UTF-8.
	(quasi_item): Remove TEXT rule, because the litchars constituent now generates TEXT.
	(litchars, restlistchar): Recognize TEXT item, similarly to regterm.
	* tests/012/parse.tl: New file.
	* tests/012/parse.expected: Likewise.
*	parser: fix few memory leaks in error recovery.	Kaz Kylheku	2021-04-08	1	-0/+4
	* parser.y (var, o_var): In a few error productions in which we have a SYMTOK item, we should free the lexeme. This doesn't solve all leaks: any time we have a parser stack containing SYMTOK or TEXT items that belong to rules that have not yet been reduced, and the parse job is aborted due to errors, we leak those.
*	parser: fix poor diagnosis of \x invalid escape.	Kaz Kylheku	2021-04-08	1	-1/+12
	* parser.l (grammar): Because the \x pattern requires one or more digits after it, if they are not present, we simply report \x as an unrecognized escape. It's better if we diagnose it properly as a \x that is not followed by digits.
*	build: calm restless yacc.	Kaz Kylheku	2021-04-08	1	-6/+1
	* Makefile (%.tab.c %.tab.h): Remove the trick of keeping the old y.tab.h file if it has not changed. This was once a good idea, but now that we have a proper grouped targets pattern rule which knows that y.tab.h depends on and is produced from parser.y, the trick causes y.tab.h to be perpetually out of date due to its old time stamp, and so yacc is run on every build.
*	doc: bad syntax under doc function.	Kaz Kylheku	2021-04-08	1	-1/+1
	* txr.1: Fix formatting.
*	Version 256 (txr-256)	Kaz Kylheku	2021-04-07	6	-965/+1023
	* RELNOTES: Updated.
	* configure, txr.1: Bumped version and date.
	* share/txr/stdlib/ver.tl: Bumped.
	* txr.vim, tl.vim: Regenerated.
*	doc: support doc function on android.	Kaz Kylheku	2021-04-07	1	-2/+2
	* share/txr/stdlib/doc-lookup.tl (open-url): Define for android, which has xdg-open in the termux environment.
*	utf8: fix backtracking bugs in buffer decoder.	Kaz Kylheku	2021-04-07	2	-3/+12
	* utf8.c (utf8_from_buffer): Fix incorrect backtracking logic for handling bad UTF-8 bytes. Firstly, we are not backtracking to the correct byte. Because src is incremented at the top of the loop, the backtrack pointer must be set to src - 1 to point to the possibly bad byte. Secondly, when we backtrack, we are neglecting to rewind nbytes! Thus after backtracking, we will not scan the entire input. Let's avoid using nbytes, and guard the loop based on whether we hit the end of the buffer; then we don't have any nbytes state to backtrack.
	* tests/017/ffi-misc.tl: New test case converting a three-byte UTF-8 encoding of U+DC01: an invalid character in the surrogate range. We test that the buffer decoder turns this into three characters, exactly like the stream decoder. Another test case for invalid bytes following a valid sequence start.
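	A simplified sketch of the backtracking scheme follows. It is not the code in utf8.c: decode_utf8 and its signature are assumptions made for illustration, and its treatment of overlong forms may differ from TXR's. It shows the two points from the message: the backtrack pointer is the first byte of the sequence, and the loop is bounded by an end pointer, so there is no byte counter to rewind.

```c
#include <stddef.h>

/* Decode n bytes at src into code points stored in out, which must
   have room for n entries. A byte that cannot start or complete a
   valid sequence is emitted as 0xDC00 | byte, and decoding resumes
   at the byte right after it. Returns the number of code points. */
static size_t decode_utf8(const unsigned char *src, size_t n,
                          unsigned long *out)
{
  const unsigned char *end = src + n;
  size_t count = 0;

  while (src < end) {
    const unsigned char *start = src;  /* backtrack point */
    unsigned long ch = *src++;
    unsigned long min = 0;
    int need = 0, ok;

    if (ch < 0x80) {
      out[count++] = ch;
      continue;
    } else if ((ch & 0xE0) == 0xC0) {
      need = 1; ch &= 0x1F; min = 0x80;
    } else if ((ch & 0xF0) == 0xE0) {
      need = 2; ch &= 0x0F; min = 0x800;
    } else if ((ch & 0xF8) == 0xF0) {
      need = 3; ch &= 0x07; min = 0x10000;
    }

    ok = (need > 0);                   /* stray continuation or bad lead */

    while (ok && need > 0) {
      if (src == end || (*src & 0xC0) != 0x80) {
        ok = 0;                        /* truncated or bad continuation */
      } else {
        ch = (ch << 6) | (*src++ & 0x3F);
        need--;
      }
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (ok && (ch < min || ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF)))
      ok = 0;

    if (ok) {
      out[count++] = ch;
    } else {
      out[count++] = 0xDC00 | *start;
      src = start + 1;                 /* rewind to just past the bad byte */
    }
  }

  return count;
}
```

	Given the three bytes ED B0 81 (an encoding of U+DC01), this sketch yields three code points, one per byte, which is the behavior the new test case checks for in the real decoder.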
*	awk: bugfix: string rs must not compile as regex.	Kaz Kylheku	2021-04-07	1	-5/+5
	* share/txr/stdlib/awk.tl (awk-state loop): When rs contains a string, do not pass it directly to regex-compile, because that function calls regex-parse when the argument is a string. Wrap it in a (compound ...) tree node to get it to be treated as a sequence of characters to match.
*	gc: fix astonishing bug in weak hash processing.	Kaz Kylheku	2021-04-06	1	-5/+8
	This is a flaw that has been in the code since the initial implementation in 2009. Weak hash tables are only partially marked during the initial garbage collection marking phase. They are put into a global list, which is then walked again to do the weak processing: to expire items which are not reachable, and then finish walking the table objects. Problem is, the code assumes that this late processing will not discover more hash tables and put them into that global list. This creates a problem when weak hash tables contain weak hash tables, such as in the important and very common case when a global variable (binding stored in a weak hash table) contains a weak hash table! These hash tables discovered during weak hash table processing are partially marked, and left that way. The result is that their table vectors get prematurely scavenged by the garbage collector, and then fall victim to use-after-free crashing.
	Note: do_iters doesn't have this bug. Though the reachable_iters list resembles reachable_weak_hashes, the key difference is that do_iters does not do any marking, and so will not discover any more reachable objects. All it does is update some counts in the hashes to which the still-reachable iterators point.
	* hash.c (do_weak_tables): Clear the reachable_weak_hashes list on entry into the function, taking a local copy of its head. After walking the list, check the global variable again; if it has become non-null, it means more weak tables were discovered and added to the list. In that case, make a recursive call (susceptible to tail call treatment) to process the list again.
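	The fix pattern can be sketched in isolation as below. This is an illustration under assumptions, not the code in hash.c: struct table, pending, discover and finish are hypothetical names, and a loop stands in for the tail-recursive call mentioned in the message. The point is that the global list is detached before it is walked and checked again afterward, so tables discovered during late processing are not left half-done.

```c
#include <stddef.h>

/* Hypothetical stand-in for a weak hash table object. */
struct table {
  struct table *next;    /* link in the pending list */
  struct table *inner;   /* table reachable only through this one */
  int queued;            /* already placed on the pending list */
  int finished;          /* late processing completed */
};

/* Hypothetical analogue of the reachable_weak_hashes list. */
static struct table *pending;

/* Called when marking reaches a table: defer it to the list. */
static void discover(struct table *t)
{
  if (t != NULL && !t->queued) {
    t->queued = 1;
    t->next = pending;
    pending = t;
  }
}

/* Late processing of one table; this can reach further tables. */
static void finish(struct table *t)
{
  t->finished = 1;
  discover(t->inner);
}

/* Detach the list, walk the local copy, and repeat for as long as
   the walk itself has put new tables on the global list. */
static void process_pending(void)
{
  while (pending != NULL) {
    struct table *list = pending;
    pending = NULL;

    while (list != NULL) {
      struct table *next = list->next;
      finish(list);
      list = next;
    }
  }
}
```

	A single walk of the original list would finish only the outermost table here; the repeated check is what reaches the nested ones.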
*	qref: bugfix: handle a.(b).?c correctly.	Kaz Kylheku	2021-04-05	1	-1/+1
	* share/txr/stdlib/struct.tl (qref): Do not assume that (b) is the name of a slot to be looked up. Use qref to handle it.
*	struct: fix lack of hygiene in null-safe qref.	Kaz Kylheku	2021-04-05	1	-1/+3
	The expression a.?b is not being treated hygienically; a is evaluated twice. This is only if the null-safe object is the leftmost; a.b.?c is hygienic.
	* share/txr/stdlib/struct.tl (qref): Add the necessary gensym use to fix the broken case.
*	doc: document null-safe method call.	Kaz Kylheku	2021-04-05	1	-4/+26
	* txr.1: The notation obj.?(fun ...) exists, but is not documented. Let's fix that.
*	compiler: remove optional param from lookup-var.	Kaz Kylheku	2021-04-05	1	-5/+3
	* share/txr/stdlib/compiler.tl (struct env): The mark-used optional parameter of lookup-var is not used anywhere, and so always nil. Let's remove it.
*	INSTALL: revise outdated text, add cross-compiling advice.	Kaz Kylheku	2021-04-04	1	-5/+40
	* INSTALL: Mention the parallel debug and optimized build capability of txr: no need to have two separate directories for that. New section on handling the .tl files in cross-compilation, when the txr executable isn't native.
*	doc: remove superfluous words.	Kaz Kylheku	2021-04-04	1	-1/+1
	* txr.1: Under "File-Wide Insertion of Gensyms", remove superfluous verb phrase from sentence.
*	doc: vice versa formatting.	Kaz Kylheku	2021-04-04	1	-1/+1
	* txr.1: Under "Treatment of Literals", fix lack of close double quote in italicization of vice versa.
*	doc: clarify definition of top-level form.	Kaz Kylheku	2021-04-04	1	-3/+6
	* txr.1: In the definition of what is a top-level form to the compiler, replace poor wording about macro-expansion in rule 6, and add a rule which makes it clear that the rules are recursive.
*	doc: note about environment handling in compile.	Kaz Kylheku	2021-04-04	1	-1/+11
	* txr.1: Add notes about environment handling when an interpreted function is compiled, and how hlet/hlet* can be used to obtain sharing.
*	doc: fix missing item periods.	Kaz Kylheku	2021-04-04	1	-20/+20
	* txr.1: All missing item number periods added.
*	doc: double word in awk intro.	Kaz Kylheku	2021-04-04	1	-1/+1
	* txr.1: Fix "implement implement".
*	awk: relax restriction on :name.	Kaz Kylheku	2021-04-04	2	-10/+9
	* share/txr/stdlib/awk.tl (sys:awk-expander): Do not impose stricter restrictions on :name than the block mechanism itself.
	* txr.1: Documentation updated.
*	doc: block names need not be symbols.	Kaz Kylheku	2021-04-04	1	-1/+7
	* txr.1: The block implementation doesn't care whether block names are symbols; anything comparable with eq may be used.
*	func-optparam-count: bugfix.	Kaz Kylheku	2021-04-03	1	-1/+1
	* lib.c (get_param_counts): If there are no optional parameters, then the oa variable stays negative; we must turn that into a zero, otherwise we return the bogus value -1 as the number of optional arguments.
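	The kind of sentinel bug described above can be sketched as follows. This is a hypothetical illustration, not the logic of get_param_counts in lib.c: optional_count and the array-of-strings parameter list are assumptions. A counter starts negative to mean "no separator seen yet" and has to be clamped before it is reported as a count.

```c
#include <string.h>

/* Count the parameters that follow a ":" separator in a hypothetical
   parameter list given as an array of n strings. */
static int optional_count(const char *const *params, int n)
{
  int oa = -1;               /* negative: no ":" encountered so far */
  int i;

  for (i = 0; i < n; i++) {
    if (oa >= 0)
      oa++;                  /* every parameter after ":" is optional */
    else if (strcmp(params[i], ":") == 0)
      oa = 0;                /* start counting from the separator */
  }

  /* Without this clamp, a list with no ":" would report -1. */
  return oa < 0 ? 0 : oa;
}
```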
*	lib: new function for documentation lookup.	Kaz Kylheku	2021-04-03	5	-1/+2142
	* genman.txr: Dump contents of symhash into a doc-syms.tl library file, as a defvarl form.
	* lisplib.c (doc_instantiate, doc_set_entries): New static functions.
	(lisplib_init): Register autoload for doc-lookup module to symbols doc and doc-url.
	* share/txr/stdlib/doc-lookup.tl: New file.
	* share/txr/stdlib/doc-syms.tl: Likewise.
	* txr.1: Documented.
*	doc: dialect note capitalization.	Kaz Kylheku	2021-03-31	1	-8/+8
	* txr.1: Consistently capitalize Dialect Note.
*	doc: PP fixes.	Kaz Kylheku	2021-03-31	1	-4/+0
	* txr.1: Remove two unnecessary .PP directives and a blank line before one.
*	doc: formatting of notes under circle, erase notation.	Kaz Kylheku	2021-03-31	1	-3/+7
	* txr.1: Don't use TP* for notes and dialect notes because it doesn't fit these paragraphs that don't have an indented margin.
*	doc: bad indentation under if directive.	Kaz Kylheku	2021-03-31	1	-1/+1
	* txr.1: Add .PP to deindent after example.
*	doc: fix wording under --lisp	Kaz Kylheku	2021-03-31	1	-2/+2
	* txr.1: Fix grammar problem and wording for --lisp and --compiled.
*	doc: split up -l or --lisp-bindings	Kaz Kylheku	2021-03-31	1	-1/+2
	* txr.1: Give the two -l and --lisp-bindings synonyms in the same way as other synonyms, as two separate .IP items.
*	doc: style items better, without grid style.	Kaz Kylheku	2021-03-31	1	-6/+11
	* genman.txr: Use an alternative solution for dl.items elements which places short items to the left of their defining text, while allowing long items to overhang.
*	doc: blank lines after IP sections.	Kaz Kylheku	2021-03-30	2	-21/+10
	* checkman.txr (check-ip): New pattern function for checking for IP, coIP and meIP macros followed by blank line. This causes a formatting issue in HTML.
	* txr.1: Fix numerous instances of problem caught by check-ip.
*	doc: missing RS/RE.	Kaz Kylheku	2021-03-30	1	-0/+2
	* txr.1: Add .RS/.RE pair in Quote and Quasiquote.
*	doc: add grid styling to itemized lists.	Kaz Kylheku	2021-03-30	1	-1/+14
	* genman.txr: Add CSS rules targeting <dl class="items">, which are now supported in man2html.
*	doc: incorrect synopsis of push.	Kaz Kylheku	2021-03-30	1	-4/+5
	* txr.1: Under the summary of place-mutating operations, rewrite the description of push which falsely claims that the pushed item is returned.
*	compiler: incorrect self-check in spy framework.	Kaz Kylheku	2021-03-30	1	-2/+2
	* share/txr/stdlib/compiler.tl (compiler (pop-closure-spy, pop-access-spy)): The stack underflow check must be done by checking top, not the incoming spy argument.
*	doc: copy and paste of :wrap under window-map	Kaz Kylheku	2021-03-30	1	-1/+1
	* txr.1: Fix text about :reflect wrongly referring to :wrap.
*	doc: fix under stream indentation	Kaz Kylheku	2021-03-30	1	-1/+1
	* txr.1: indent-foff misspelled as intent-foff.
*	doc: numerous grammar fixes.	Paul A. Patience	2021-03-28	1	-21/+25
	* txr.1: Fix grammar, punctuation, formatting, and cases of misspellings landing on dictionary words.