txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
*	Copyright year bump 2018.	Kaz Kylheku	2018-02-15	1	-1/+1
	* LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, win/cleansvg.txr: Extended Copyright line to 2018.
*	cleanup: remove unnecessary header includes.	Kaz Kylheku	2017-09-19	1	-1/+0
	* eval.c: doesn't need rand.h. * filter.c: doesn't need gc.h. * parser.l: doesn't need eval.h. * parser.y: doesn't need utf8.h, stream.h, args.h or cadr.h.
*	parser: fix precedence of DOTDOT.	Kaz Kylheku	2017-09-07	1	-1/+1
	The problem is that a.b .. c.d parses as (qref a b..c d), which is useless and counterintuitive. Let's fix it, but with a backward compatibility switch to give more leeway to any hapless people out there whose code happens to depend on this unfortunate situation. We basically use two token numbers for the .. token: OLD_DOTDOT, and DOTDOT. Both are wired into the grammar. In backward compatibility mode, the lexer pumps out OLD_DOTDOT. Otherwise DOTDOT. * parser.l (grammar): When .. is scanned, return OLD_DOTDOT when in compatibility with 185 or earlier. Otherwise DOTDOT. * parser.y (OLD_DOTDOT): New terminal symbol; introduced at the same high precedence previously occupied by DOTDOT. (DOTDOT): Changes precedence to lower than '.' and UREFDOT. (n_expr): Two productions added involving OLD_DOTDOT. These are copy and paste of the existing productions involving DOTDOT; the only difference is that OLD_DOTDOT replaces DOTDOT. (yybadtoken): Handle OLD_DOTDOT. * txr.1: Compat notes added.
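The lexer-side switch described above can be sketched in C as a single decision function. This is a minimal illustration, not the code in parser.l: the numeric token values and the `compat_version` variable are hypothetical stand-ins for the Yacc-generated token codes and TXR's compatibility setting (0 meaning no compatibility requested).

```c
/* Hypothetical token codes; in the real parser they come from y.tab.h. */
enum { DOTDOT = 300, OLD_DOTDOT = 301 };

/* Hypothetical compatibility setting: 0 means "current behavior",
   otherwise the TXR version whose behavior is requested. */
static int compat_version = 0;

/* Choose the terminal handed to the parser when ".." is scanned. */
static int dotdot_token(void)
{
  /* Old, high-precedence token: a.b .. c.d groups as (qref a b..c d). */
  if (compat_version != 0 && compat_version <= 185)
    return OLD_DOTDOT;
  /* New token, below '.' in precedence: a.b and c.d are formed first,
     and the range is built between them. */
  return DOTDOT;
}
```

Because both terminals appear in the grammar with their own precedence, only the lexer needs to consult the compatibility setting; the grammar rules themselves carry no version logic.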
*	parser: bugfix: set line number on <lineno> tokens.	Kaz Kylheku	2017-05-18	1	-0/+7
	This issue was revealed as a garbage line number in an unbound variable warning diagnostic, where the variable occurs in a quasi word list literal. A small test case is (list #`@var`) where var is unbound. The fix is, in the lexer, to set the yylval->lineno for all tokens which are declared as <lineno> in the grammar file, for which doing so has been neglected. We do this even for those tokens whose line number values are never accessed in any rule; it could arise in the future. * parser.l (grammar): Set the yylval->lineno for the tokens HASH_BACKSLASH, HASH_B_QUOTE, HASH_SLASH, WORDS, WSPLICE, QWORDS and QWSPLICE.
*	Continuing implementation of buffers.	Kaz Kylheku	2017-04-21	1	-1/+36
	* Makefile (OBJS): New objects itypes.o and buf.o. * buf.c, buf.h: New files. * itypes.c, itypes.h: New files. * lib.c (obj_print_impl): Handle BUF via buf_print and buf_pprint. (init): Call itypes_init and buf_init. * parser.h (end_of_buflit): Declared. * parser.l (BUFLIT): New exclusive state. (grammar): New rules for recognizing start of buffer literal and its interior. (end_of_buflit): New function. * parser.y (HASH_B_QUOTE): New token. (buflit, buflit_items, buflit_item): New nonterminals and corresponding grammar rules. (i_expr, n_expr): These symbols now generate a buflit; a buffer literal is a kind of expression. (yybadtoken): Handle HASH_B_QUOTE case.
*	parser: C++ regression.	Kaz Kylheku	2017-04-04	1	-30/+30
	* parser.l (grammar): Pass yyg to directive_tok rather than yyscanner. It has the yyguts_t * type, whereas yyscanner is a void * version of the same pointer.
*	parser: bugfix: don't scan @NUM in QSPECIAL state.	Kaz Kylheku	2017-04-04	1	-3/+6
	The problem is syntax like `@@12a` being scanned as if it were `@{@12}a` rather than `@{@12a}`. When the scanner is in the middle of a quasiliteral, in the QSLIT state, and sees a @, it transitions to the QSPECIAL state. In the QSPECIAL state, the METANUM token syntax is recognized consisting of @ followed by a decimal, octal or hex number. In the same QSPECIAL state, however, a meta-variable like @abc is not recognized as a unit; rather, a @ is recognized by itself, and abc by itself. Thus when @12a is seen in the QSPECIAL state, the @12 is the longest match. The fix is to treat METANUM tokens the same way as meta-variables in the QSPECIAL state: just recognize a number without the @ prefix, and report it as a METANUM. * parser.l (grammar): Split the pattern in all four METANUM rules so that in the NESTED, BRACED, QSLIT and QWLIT states, the number is recognized together with the @ prefix. But in the QSPECIAL state, indicating that one or more @ characters have been seen, just recognize a number without the prefix as a METANUM.
*	parser: do not reject 0.1..0.2 range.	Kaz Kylheku	2017-04-02	1	-2/+1
	* parser.l: Remove the pattern match which causes 0.1..0 to be rejected.
*	parser: diagnose syntax like 0.1.2 and .1.1.	Kaz Kylheku	2017-04-02	1	-3/+3
	Currently (list .1.1) yields (0.1 0.1). This is evading the rule for catching cramped floating-point literals. * parser.l (grammar): Carefully weaken the pattern match in the relevant rule for catching cramped floating-point literals, so it matches these cases.
*	Bugfix: .1 treated as dot if preceded by space.	Kaz Kylheku	2017-04-02	1	-4/+4
	Some recent work in supporting .slot syntax (uref dot) broke the treatment of floating point literals. This is because part of the trick is that a uref dot is recognized with leading whitespace as part of the token. But that of course means it steals the match for some floating-point tokens; oops! * parser.l (grammar): All rules for floating-point tokens which can match a leading decimal point now munch optional whitespace first.
*	Package prefix handling on directive symbols.	Kaz Kylheku	2017-03-27	1	-30/+61
	The directives which are involved in special phrase structure syntax like @(collect), @(end), @(and) and many others have always been a hack, recognized specially in the lexical analyzer and handled in the parser. The identifiers were not treated via the normal Lisp interning mechanism. In this patch, we try to make the illusion more complete and functional. Going forward, these symbols are understood as being interned in the usr package. As a special relaxation, keyword symbols may be used in their place, so that @(:end) is the same as @(end) and @(:collect) is the same as @(collect). Suppose that @(collect) is scanned, but the collect symbol interned in the current package isn't usr:collect, or keyword:collect. Then this is an error. Further, package prefixes may be used. The syntax @(abc:collect) is still valid and is still recognized as the head of the @(collect) phrase structure syntax. However, if abc:collect isn't the same symbol as either usr:collect or :collect, then an error is triggered. * parser.l (grammar): Recognize optional package prefixes on directive phrase structure identifiers. (directive_tok): Extract package prefix and symbol from lexeme. Implement the above described checks for all the cases. * txr.1: Added description of this under the Packages and Symbols section.
*	Lexer refactoring: special syntax tokens.	Kaz Kylheku	2017-03-27	1	-90/+43
	* parser.l (directive_tok): New static function. (grammar): Replace repeated code with calls to directive_tok.
*	uref: the a.b.c syntax extended to .a.b.c	Kaz Kylheku	2017-03-06	1	-1/+6
	Now it is possible to use a leading dot on the referencing dot syntax. This is the "unbound reference dot". It expands to the uref macro, which denotes an unbound-reference: it produces a function which takes an object as the argument, and curries the reference implied by the remaining arguments. * eval.c (uref_s): New global symbol variable. (eval_init): Intern uref symbol and init uref_s. * eval.h (uref_s): Declared. * lib.c (simple_qref_args_p): A qref expression is now also not simple if it contains an embedded uref, meaning that it cannot be rendered into the dot notation without ambiguity. (obj_print_impl): Support printing (uref a b c) as .a.b.c. * lisplib.c (struct_set_entries): Add uref to the list of autoload triggers for struct.tl. * parser.l (DOTDOT): Consume any leading whitespace as part of recognizing the DOTDOT token. Otherwise the new rule for UREFDOT, which matches (mandatory) leading space, will take precedence, causing " .." to be scanned wrong. (UREFDOT): Rule for new kind of dot token, which is preceded by mandatory whitespace, and isn't consing dot (which has mandatory trailing whitespace too, matched by an earlier rule). * parser.y (UREFDOT): New token type. (i_dot_expr, n_dot_expr): New grammar rules. (list): Handle a leading dot on the first element of a list as a special case. Things are done this way because trying to work a UREFDOT into the grammar otherwise causes intractable conflicts. (i_expr): The ^, ' and , punctuators are now followed by an i_dot_expr, so that the expression can be an unbound dot. (n_expr): Same change as in i_expr, but using n_dot_expr. Plus new UREFDOT n_expr production. * share/txr/stdlib/struct.tl (uref): New macro. * txr.1: Documented.
*	parser: diagnose run-on symbols.	Kaz Kylheku	2017-02-01	1	-0/+14
	* parser.l (grammar): Add rules which capture two symbols glued together, and diagnose as bad token. Of course a legitimate symbol token can be divided into two that are glued together. This rule is placed after the legitimate symbol matching rule, so that if a token can be interpreted as a single symbol token or as two, the first interpretation is taken.
*	parser: diagnose more kinds of junk after float.	Kaz Kylheku	2017-02-01	1	-1/+2
	* parser.l (grammar): Add a rule that if a floating-point literal (of the type that ends in decimal digits with an optional exponent) is immediately followed by a period which is not followed by another period (range syntax), it is trailing junk. For instance 1.0.3 or .2.$, or 1.0. followed by no other input.
*	Bump copyright year to 2017.	Kaz Kylheku	2017-01-23	1	-1/+1
	* LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl: Add 2017 to all copyright headers and strings.
*	Fix some C style casts to use casting macros.	Kaz Kylheku	2016-12-07	1	-2/+2
	This is uncovered by compiling with g++ using -Wold-style-cast. * mpi/mpi.c (mp_get_intptr): Use convert macro. Also in one of the rules producing REGCHAR. * parser.l (num_esc): Likewise. * struct.c (static_slot_set, static_slot_ens_rec, get_equal_method): Use coerce macro for int to pointer conversion. * sysif.c (setgroups_wrap): Use convert macro. * termios.c (termios_unpack, termios_pack): Likewise. * txr.c (sysroot_init): Likewise.
*	Removes stray debug printf from lexer.	Kaz Kylheku	2016-12-04	1	-1/+0
	* parser.l: A stray printf was committed in November 2015. The spurious output only occurs when certain invalid floating-point syntax is encountered.
*	Harden processing of character escapes.	Kaz Kylheku	2016-12-02	1	-4/+8
	Weakness uncovered by fuzzing with AFL (fast) 2.30b. The failing test case is regex syntax like [\1111111...111abc], where the bad character escape allows an invalid, negatively valued character object to escape out of the parser into the system, leading to an out-of-bounds array access in the char set code in the regex compiler. * parser.l (num_esc): Make sure that an out-of-range character is mapped to zero. Set up a default value of zero for the return variable. If the character token has too many digits, don't pass them through strtol at all, since that would produce a garbage value. Then in the final range check, actually replace the value with zero if it is out of range: issuing a diagnostic is not enough.
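The hardened conversion can be sketched as a standalone C function. This is not the num_esc function from parser.l: the name `num_esc_sketch`, its interface (bare digits plus a base), and the seven-digit cap `MAX_ESC_DIGITS` are assumptions chosen to keep the example self-contained. The range test also folds in the wchar_t and U+10FFFF limits from the earlier 2016-04-21 commit.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical digit cap: small enough that strtol cannot overflow
   even where long is only 32 bits. */
#define MAX_ESC_DIGITS 7

/* Convert the digits of a numeric character escape (prefix already
   stripped) in the given base. Anything that cannot be a valid
   character yields 0, so no negative or oversized value escapes. */
static long num_esc_sketch(const char *digits, int base)
{
  long val = 0;                          /* default result: zero */

  if (strlen(digits) > MAX_ESC_DIGITS)   /* too many digits: skip strtol */
    return val;

  val = strtol(digits, NULL, base);

  /* Out of range for Unicode or for wchar_t: replace the value with
     zero rather than merely diagnosing it. */
  if (val < 0 || val > 0x10FFFF || (long) (wchar_t) val != val)
    val = 0;

  return val;
}
```

In the real lexer a diagnostic would accompany each zero result; the sketch only shows that the returned value is always safe to use as a character.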
*	Support #: reading for uninterned symbols.	Kaz Kylheku	2016-11-07	1	-4/+4
	* parser.l (BTKEY, NTKEY): Renamed to BTKWUN and NTKWUN ("keyword and uninterned") respectively. Include an optional match for the # character. (BTOK, NTOK): Refer to BTKWUN and NTKWUN respectively. * parser.y (sym_helper): Implement uninterned symbols by detecting when the package name string is "#" and handling specially. * txr.1: Documented package prefixes and uninterned symbols.
*	New #; syntax for erasing following object.	Kaz Kylheku	2016-11-07	1	-0/+5
	* parser.c (parser_circ_ref): Don't generate the circular reference if circular suppression is in effect. * parser.h (struct parser): New member, circ_suppress. We use this for suppressing the generation of circular #n# references in erased objects. * parser.l (grammar): Scan #; producing HASH_SEMI token. * parser.y (HASH_SEMI): New token. (hash_semis_n_expr, hash_semis_i_expr, ignored_i_exprs, ignored_n_exprs): New nonterminals, needed for supporting the use of #; in front of top-level forms. (spec): Use hash_semis_n_expr and hash_semis_i_expr instead of n_expr and i_expr. (r_expr): Support object erasure within nested syntax. (yybadtoken): Handle HASH_SEMI token. (parse): Initialize new circ_suppress member of parser struct to zero. * txr.1: Documented. * genvim.txr (txr_ign_par, txr_ign_bkt, txr_ign_par_interior, txr_ign_bkt_interior): New regions for colorizing erased objects (partial support). (txr_list, txr_bracket, txr_mlist, txr_mbrackets): Include erased objects by including regions txr_ign_par and txr_ign_bkt. * txr.vim, tl.vim: Regenerated.
*	Adding notation for cycles and shared structure.	Kaz Kylheku	2016-10-18	1	-0/+12
	This commit implements the parse-side support for handling a notation that exists in ANSI Common Lisp for specifying objects with cycles and shared substructure. * parser.h (struct parser): New members, circ_ref_hash and circ_count. (circref_s, parser_resolve_circ, parser_circ_def, parser_circ_ref): Declared. * parser.c (circref_s): New symbol variable. (parser_mark): Visit the new circ_ref_hash member of the parser structure. (parser_common_init): Initialize new members circ_ref_hash and circ_count of parser structure. (patch_ref, circ_backpatch): New static functions. (parser_resolve_circ, parser_circ_def, parser_circ_ref): New functions. (circref): New static function. (parse_init): Initialize circref_s as sys:circref symbol. Register sys:circref function. * parser.l (grammar): Scan #<num>= and #<num># notation as tokens, extracting their numeric value. * parser.y (HASH_N_EQUALS, HASH_N_HASH): New token types. (i_expr, n_expr): Adding phrases for hash-equalsign and hash-hash syntax. (yybadtoken): Handle new token types in switch. (parse_once): Call parser_resolve_circ after parsing to rewrite any remaining #<num># references in the structure to the objects they denote. (parse): Reset new struct parser members to initial state. Call parser_resolve_circ after parsing to rewrite any remaining #<num># references.
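The two-phase idea, recording each #n= object under its label while parsing and then rewriting leftover #n# references into the objects they denote, can be sketched on a toy object model. The object layout, the fixed-size `labels` table, the `visited` flag, and the name `backpatch` are all assumptions for illustration; the commit above keeps its label state in circ_ref_hash, and its patch_ref/circ_backpatch code is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical minimal object model: atoms, conses, and #n# placeholders. */
enum kind { ATOM, CONS, CIRCREF };

struct obj {
  enum kind kind;
  int num;                /* ATOM: value; CIRCREF: label number n */
  struct obj *car, *cdr;  /* used only by CONS */
  int visited;            /* marks conses already walked */
};

#define MAX_LABELS 16
static struct obj *labels[MAX_LABELS];  /* label n -> object from #n= */

/* Replace every #n# placeholder reachable from *slot with the object
   that #n= defined. A replacement is not descended into (the labelled
   object is walked at its own definition site), and visited conses are
   skipped, so the walk terminates even after it has created cycles. */
static void backpatch(struct obj **slot)
{
  struct obj *o = *slot;

  if (o == NULL)
    return;
  if (o->kind == CIRCREF) {
    *slot = labels[o->num];
    return;
  }
  if (o->kind == CONS && !o->visited) {
    o->visited = 1;
    backpatch(&o->car);
    backpatch(&o->cdr);
  }
}
```

For example, #1=(1 . #1#) is built as a cons whose cdr is a placeholder for label 1; after backpatching, the cdr points back at the cons itself.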
*	Synchronize license comments with LICENSE.	Kaz Kylheku	2016-10-01	1	-16/+17
	* Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Revert to verbatim 2-Clause BSD.
*	Allow whitespace between @ and ; in comments.	Kaz Kylheku	2016-05-23	1	-2/+2
	* parser.l (grammar): Recognize {WS}* between @ and ; (or the legacy #) in comments. * txr.1: Documentation updated.
*	Handle non-UTF-8 byte in regex scanned from string.	Kaz Kylheku	2016-04-21	1	-0/+6
	The current behavior is that there is no lex rule for this, so such a byte gets echoed. * parser.l (grammar): Add fallback rule to match one byte in SREGEX state and turn it into a 0xDCxx character.
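The 0xDCxx convention can be illustrated with three small C helpers. The function names are hypothetical and not part of TXR. The reasoning behind the range: U+DC00..U+DCFF are low-surrogate code points, which a well-formed UTF-8 decoder never produces, so a stray byte can be parked there without colliding with a real character and recovered later.

```c
#include <stddef.h>

/* Map a byte that is not valid UTF-8 into U+DC00..U+DCFF so it can be
   carried through the scanner as an ordinary character. */
static wchar_t byte_to_dc_char(unsigned char b)
{
  return (wchar_t) (0xDC00 | b);
}

/* Test whether a character is in the range used for such bytes. */
static int is_dc_char(wchar_t c)
{
  return c >= 0xDC00 && c <= 0xDCFF;
}

/* Recover the original byte, e.g. when encoding back to bytes. */
static unsigned char dc_char_to_byte(wchar_t c)
{
  return (unsigned char) (c & 0xFF);
}
```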
*	Better job of diagnosing out-of-range char escapes.	Kaz Kylheku	2016-04-21	1	-2/+9
	* parser.l (num_esc): Check for converted value being out of the range of wchar_t or beyond 0x10FFFF, whichever is less.
*	Bugfix: allow newline in regex parsing from string.	Kaz Kylheku	2016-04-18	1	-1/+7
	* parser.l (grammar): The newline character is incorrectly handled by the same rule under the SREGEX and REGEX states. In the SREGEX state, just return it as a REGCHAR, not forgetting to increment the line number.
*	Trailing whitespace.	Kaz Kylheku	2016-04-18	1	-1/+1
	* parser.l: Remove trailing whitespace.
*	Revamp bad character messages in lexer.	Kaz Kylheku	2016-04-01	1	-4/+15
	* parser.l (grammar): Drop colon from unrecognized escape message. "bad character in directive" handles various cases to avoid printing junk to the terminal. Basic message harmonizes with the one in the yybadtoken function in the parser. Non-UTF-8 byte printed as TXR hex integer literal.
*	gc bug: prepared_msg field of struct parser.	Kaz Kylheku	2016-03-07	1	-1/+2
	* parser.l (yyerrprepf): Replace wrong bare assignment to parser->prepared_msg with proper set macro which handles the mutation of a mature generation object such that it points to a baby object.
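Why a bare assignment is a bug in a generational collector can be shown with a toy write barrier in C. Everything here (the `mature` flag, the fixed-size `remembered` array, the `need_full_gc` fallback, the name `set_ref`) is a hypothetical illustration of the general technique, not TXR's set macro or its GC internals.

```c
#include <stddef.h>

/* Hypothetical two-generation object with one pointer field. */
struct gobj {
  int mature;           /* nonzero once the object survived a collection */
  struct gobj *ref;
};

#define MAX_REMEMBERED 64
static struct gobj *remembered[MAX_REMEMBERED];  /* extra roots for minor GC */
static size_t nremembered;
static int need_full_gc;  /* set if the remembered set overflowed */

/* Store val into owner->ref. If a mature object now points at a young
   one, remember the owner: a minor collection scans only young objects
   and roots, so without this entry val could be freed while still
   referenced. A bare "owner->ref = val" skips exactly this step. */
static void set_ref(struct gobj *owner, struct gobj *val)
{
  owner->ref = val;
  if (owner->mature && val != NULL && !val->mature) {
    if (nremembered < MAX_REMEMBERED)
      remembered[nremembered++] = owner;
    else
      need_full_gc = 1;  /* fall back to collecting everything */
  }
}
```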
*	New :mandatory keyword in until/last clauses.	Kaz Kylheku	2016-01-15	1	-5/+4
	* match.c (mandatory_k): New keyword variable. (h_coll, v_gather, v_collect): Implement :mandatory logic. (syms_init): Initialize mandatory_k. * parser.l (grammar): The UNTIL and LAST tokens must be matched similarly to collect, without consuming the closing parenthesis, allowing a list of items to be parsed between the symbol and the closure, in the NESTED state. * parser.y (gather_clause, collect_clause, elem, repeat_parts_opt, rep_parts_opt): Adjust to new until/last syntax. In the matching productions, the abstract syntax changes to incorporate the options. In the output productions, we throw an error if options are present. * txr.1: Documented :mandatory for collect, coll and gather.
*	Copyright year bump.	Kaz Kylheku	2015-12-31	1	-1/+1
	* LICENSE, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/cadr.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Add 2016 copyright. * linenoise/LICENSE, linenoise/linenoise.c, linenoise/linenoise.h: Bump one principal author's copyright from 2014 to 2015. The code is based on a snapshot of 2015 upstream work.
*	Implementing print-base and ~d format directive.	Kaz Kylheku	2015-11-14	1	-3/+3
	* debug.c (show_bindings): Use ~d for level, so as not to be influenced by print-base. (debug): Use ~d for line numbers. * lib.c (gensym): Use ~d conversion specifier for formatting gensym counter into symbol name. * match.c (LOG_MISMATCH, LOG_MATCH): Use ~d for line number references. (h_skip, h_coll, h_fun, h_chr, match_line_completely, v_skip, v_fuzz, v_gather, v_collect, v_output, v_filter, v_fun, v_assert, v_load, v_line, h_assert, open_data_source): Use ~d for line refs, number of iterations, errno values. * parser.c (repl): Use ~d for prompt line numbers, numbered variables and the expr-<n> string in error messages. * parser.l (yyerrorf, source_loc_str): Use ~d for line numbers. * stream.c (print_base_s): New symbol variable. (formatv): Implement print-base. (stdio_maybe_read_error, stdio_maybe_error, stdio_close, pipe_close, open_directory, open_file, open_fileno, open_tail, open_process, run, remove_path): Use ~d for errno values. (stream_init): Initialize print_base_s and register print-base special variable. * sysif.c (mkdir_wrap, ensure_dir, getcwd_wrap, mknod_wrap, chmod_wrap, symlink_wrap, link_wrap, readlink_wrap, exec_wrap, stat_impl, pipe_wrap, poll_wrap, getgroups_wrap, setuid_wrap, seteuid_wrap, setgid_wrap): Use ~d for errno values and system function results. * txr.1: Documented print-base and ~d conversion specifier.
*	New iread function.	Kaz Kylheku	2015-11-07	1	-0/+11
	The read function no longer works like it used to on an interactive terminal because of the support for .. and . syntax on a top-level expression. The iread function is provided which uses a modified syntax that doesn't support these operators on a top-level expression. The parser thus doesn't look one token ahead, and so iread can return immediately. * eval.c (eval_init): Register iread intrinsic function. * parser.c (prime_parser): Only push back the recently seen token when priming for a regular Lisp read. Handle the prime_interactive method by preparing a SECRET_ESCAPE_I token. (lisp_parse_impl): New static function, formed from previous lisp_parse. Takes a boolean argument indicating interactive mode. (prime_parser_post): New function. (lisp_parse): Now a wrapper for lisp_parse_impl which passes a nil to indicate noninteractive read. (iread): New function. * parser.h (enum prime_parser): New member, prime_interactive. (scrub_scanner, iread, prime_parser_post): Declared. * parser.l (prime_scanner): Handle the prime_interactive case the same way as prime_lisp. (scrub_scanner): New function. * parser.y (SECRET_ESCAPE_I): New token type. (i_expr): New nonterminal symbol. Like n_expr, but doesn't support dot or dotdot operators, except in nested subexpressions. (spec): Handle SECRET_ESCAPE_I by way of i_expr. (sym_helper): Before freeing the token lexeme, call scrub_scanner. If the token is registered as the scanner's most recently seen token, the scanner must forget that registration, because it is no longer valid. (parse): Call prime_parser_post. * txr.1: Documented iread.
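The priming logic above, together with the token pushback it builds on (introduced in the 2015-08-12 commit "Crafting a better parser-priming hack"), can be sketched in C as a wrapper around a raw scanner. All names, the `struct tok` representation, and the small fixed-size pushback stack are hypothetical. The sketch assumes that a Lisp-mode parse always leaves exactly one discarded lookahead token behind, while an interactive (iread-style) parse leaves none.

```c
#include <stddef.h>

/* Hypothetical token representation: a type code and a value. */
struct tok {
  int type;     /* 0 means end of input, as with yylex */
  long val;
};

#define MAX_PUSHBACK 4

struct lexer {
  struct tok pushback[MAX_PUSHBACK];  /* LIFO stack of tokens to replay */
  int npushback;
  struct tok recent;                  /* last token given to the parser */
  struct tok (*raw_lex)(void);        /* the underlying scanner */
};

static void push_token(struct lexer *lx, struct tok t)
{
  /* Priming pushes at most two tokens, so this guard never trips here. */
  if (lx->npushback < MAX_PUSHBACK)
    lx->pushback[lx->npushback++] = t;
}

/* The yylex replacement: replay pushed-back tokens first, then scan.
   Every token handed out is remembered as the most recent one. */
static struct tok next_token(struct lexer *lx)
{
  struct tok t = lx->npushback > 0 ? lx->pushback[--lx->npushback]
                                   : lx->raw_lex();
  lx->recent = t;
  return t;
}

/* Prepare for another parse. In Lisp mode the previous parse read one
   token of lookahead and threw it away, so it is replayed right after
   the secret token; in interactive mode there is nothing to replay. */
static void prime(struct lexer *lx, int secret_type, int interactive)
{
  struct tok secret = { secret_type, 0 };

  if (!interactive && lx->recent.type != 0)
    push_token(lx, lx->recent);       /* replayed second */
  push_token(lx, secret);             /* replayed first */
}
```

Because the stack is LIFO, the secret token is pushed last so the parser sees it first, followed by the recovered lookahead.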
*	New range type, distinct from cons cell.	Kaz Kylheku	2015-11-01	1	-0/+5
	* eval.c (eval_init): Register intrinsic functions rcons, rangep, from and to. (eval_init): Register rangep intrinsic. * gc.c (mark_obj): Traverse RNG objects. (finalize): Handle RNG in switch. * hash.c (equal_hash, eql_hash): Hashing for RNG objects. * lib.c (range_s, rcons_s): New symbol variables. (code2type): Handle RNG type. (eql, equal): Equality for ranges. (less_tab_init): Table extended to cover RNG. (less): Semantics defined for ranges. (rcons, rangep, from, to): New functions. (obj_init): range_s and rcons_s variables initialized. (obj_print_impl): Produce #R notation for ranges. (generic_funcall, dwim_set): Recognize range objects for indexing. * lib.h (enum type): New enum member, RNG. MAXTYPE redefined to RNG value. (TYPE_SHIFT): Increased to 5 since there are now 16 type codes. (struct range): New struct type. (union obj): New member rn, of type struct range. (range_s, rcons_s, rcons, rangep, from, to): Declared. (range_bind): New macro. * parser.l (grammar): New rule for recognizing the #R sequence as HASH_R token. * parser.y (HASH_R): New terminal symbol. (range): New nonterminal symbol. (n_expr): Derives the new range symbol. The n_expr DOTDOT n_expr rule produces rcons expression rather than cons. * match.c (format_field): Recognize rcons syntax in fields which is now what ranges translate to. Also recognize range object. * tests/013/maze.tl (neigh): Fix code which destructures range as a cons. That can't be done any more. * txr.1: Document ranges.
*	Better diagnostic for cramped floating literals.	Kaz Kylheku	2015-10-07	1	-2/+7
	* parser.l: Different text needed for ).1 and a.1 cases, because the insertion of a zero cannot fix the latter. Might as well make the messages more detailed.
*	syntax: be tolerant of carriage returns.	Kaz Kylheku	2015-09-16	1	-15/+16
	This is needed for multi-line mode with CR line breaks. It also makes TXR tolerant when code is ported among systems with different line endings. * parser.l (NL): New lex named pattern, matching three possible line terminators: CR, NL or CR-NL. (grammar): In places where \n was previously matched, use {NL}. In a few places where \n is in a character class, add \r. In one place (comment matching), the pattern . which implicitly doesn't match newlines had to be replaced with [^\r\n].
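The effect of the NL pattern on line counting can be mimicked in plain C: each of CR, LF and CR-LF ends exactly one line, and a CR immediately followed by LF counts once, just as lex's longest-match rule would normally prefer \r\n over a lone \r. The function name is hypothetical; the scanner does this with the {NL} pattern rather than hand-written code.

```c
#include <stddef.h>

/* Count line terminators in s, treating CR, LF and CR-LF each as one,
   mirroring a lex named pattern such as NL (\r|\n|\r\n). */
static size_t count_line_breaks(const char *s)
{
  size_t n = 0;

  while (*s != '\0') {
    if (*s == '\r') {
      n++;
      s++;
      if (*s == '\n')   /* CR-LF: the LF belongs to the same break */
        s++;
    } else if (*s == '\n') {
      n++;
      s++;
    } else {
      s++;
    }
  }
  return n;
}
```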
*	Parse errors lose program prefix and parens.	Kaz Kylheku	2015-09-06	1	-2/+7
	* parser.l (yyerrorf): Don't print the program prefix and parentheses, except if compatibility to 114 or older is requested. The main motivation for this is the repl, where the program prefix is not informative. The new format is also a de facto standard which is compatible with other parsers. Vim understands it directly. * txr.1: Documented.
*	One-liner to allow @{obj.slot} in quasiliterals.	Kaz Kylheku	2015-09-02	1	-1/+1
	* parser.l (grammar): Recognize '.' token in BRACED state also. * genvim.txr: @{obj.slot ...} syntax highlighting support. Include txr_dot and txr_dotdot in txr_bracevar region.
*	Introducing structs.	Kaz Kylheku	2015-09-02	1	-0/+5
	* args.c (args_cat_zap): New function. * args.h (args_cat_zap): Declared. * eval.c (struct_lit_s): New symbol variable. (eval_init): Initialize struct_lit_s. * eval.h (struct_lit_s): Declared. * gc.c (finalize): If a symbol has a struct slot hash attached to it, we must free it when the symbol is reclaimed. * lib.c (make_sym): Initialize symbol's slot_cache pointer to null. (copy): Copy structure objects. (init): Call struct_init to initialize struct module. * lib.h (SLOT_CACHE_SIZE): New preprocessor symbol. (slot_cache_line_t, slot_cache_t): New typedefs. (struct sym): New member, slot_cache. * lisplib.c (struct_set_entries, struct_instantiate): New static functions. (lisplib_init): Register new functions in dl_table. * parser.y (HASH_S): New terminal symbol. (struct): New grammar rule. (n_expr): Derive struct. (yybadtoken): Map HASH_S to #S string. * parser.l (grammar): Recognize #S and return HASH_S token. * share/txr/stdlib/place.tl (slot): New defplace. * share/txr/stdlib/struct.tl: New file. * struct.c: New file. * struct.h: New file. * Makefile (OBJS): Adding struct.o.
*	Allow slashes in regex passed to regex-parse.	Kaz Kylheku	2015-08-15	1	-16/+15
	* parser.l (SREGEX): New start state, for stand-alone regex parsing. (grammar): All REGEX state rules are active in the SREGEX state also. The rule for the / character returns a REGCHAR if in the SREGEX state, so it is treated as an ordinary character. * txr.1: Updated regex-parse documentation about the treatment of the slash. Also added notes about double escaping when a string literal is passed to regex-parse.
*	Floating-point constant tightening.	Kaz Kylheku	2015-08-12	1	-8/+8
	* parser.l (grammar): Change order of rule which recognizes FLODOT with a one-character trailing context other than a dot, and the rule which diagnoses trailing junk. The issue is that the previous order gave the wrong interpretation to 123.E, treating it as 123. followed by E rather than trailing junk, like in the case of 123.0E or 123.B. * txr.1: Adding the valid example 1.E5. Removing references to dot as consing dot. Fixed documentation which says that 1.E is 1 followed by a consing dot and E. The wrong behavior in fact produced 1.0 followed by E. No consing dot semantics.
*	Use new pushback token priming for single regex parse.	Kaz Kylheku	2015-08-12	1	-7/+10
	* parser.h (enum prime_parser): New enum. (prime_parser, prime_scanner, parse): Declarations updated with new argument. * parser.c (prime_parser): New argument of enum prime_parser type. Select appropriate secret token for the regex and Lisp cases. Pass prime selector down to prime_scanner. (regex_parse): Do not prepend secret escape to string. Do not use parse_once function; instead do the parser init and cleanup here and use the parse function. (lisp_parse): Pass new argument to parse, configuring the parser to be primed for Lisp parsing. * parser.l (grammar): Rule producing SECRET_ESCAPE_R removed. (prime_scanner): New argument. Pop the scanner state down to INITIAL. Then unconditionally switch to the appropriate state based on priming configuration. * parser.y (parse): New argument for priming selection, passed down to prime_parser.
*	Crafting a better parser-priming hack.	Kaz Kylheku	2015-08-12	1	-19/+31
	The method of inserting a character sequence which generates a SECRET_TOKEN_E token is being replaced with a purely token based method. Because we don't manipulate the input stream, the lexer is not involved. We don't have to flush its state and deal with the carry-over of the yy_hold_char. This comes about because recent changes expose a weakness in the old scheme. Now that a top-level expression can have the form expr.expr, it means that the Yacc parser reads one token ahead, to see whether there is a dot or something else. This lookahead token is discarded. We must re-create it when we call yyparse again. This re-creation is done by creating a custom yylex function, which can maintain pushback tokens. We can prime this array of pushback tokens to generate the SECRET_TOKEN_E, as well as to re-inject the lookahead symbol that was thrown away by the previous yyparse. To know which lookahead symbol to re-inject is simple: the scanner just keeps a copy of the most recent token that it returns to the parser. When the parser returns, that token must be the lookahead one. The tokens we keep now in the parser structure are subject to garbage collection, and so we must mark them. Since the YYSTYPE union has no type field, a new API is opened up into the garbage collector to help implement a conservative GC technique. * gc.c (gc_is_heap_obj): New function. * gc.h (gc_is_heap_obj): Declared. * match.c: Include y.tab.h. This is now needed by any module that needs to instantiate a parser_t structure, because members of type YYSTYPE occur in the structure. (parser.h can still be included without y.tab.h, but only an incomplete declaration for the parser structure is then given, and a few functions are not declared.) * parser.c (yy_tok_mark): New static function. 
(parser_mark): Mark the recent token and the pushback tokens. (parser_common_init): Initialize the recent token, the pushback tokens, and the pushback stack index. (pushback_token): New static function. (prime_parser): hold_byte argument removed. Body considerably simplified. The catenated stream trick is no longer required. All we do here is set up two pushback tokens and prime the scanner, if necessary, so it is in the right start state for Lisp. * parser.l (YY_DECL): Take over definition of scanning function, renaming to yylex_impl, so we can implement yylex. (grammar): Rule which produces SECRET_ESCAPE_E token removed. (reset_scanner): Function removed. (yylex): New function. * parser.h (struct parser): Now only forward-declared unless y.tab.h has been included. New members, recent_tok, tok_pushback and tok_idx. (yyset_hold_char): Declared. (reset_scanner): Declaration removed. (yylex): Declared (if y.tab.h included). (prime_parser): Declaration updated. (prime_scanner): Declared. * Makefile: express new dependency on existence of y.tab.h of txr.o, match.o and parser.o.
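The pushback scheme described in this commit can be sketched roughly as follows. This is a simplified, hypothetical model: tokens are plain int pairs rather than YYSTYPE values, and `toy_parser`, `toy_lex`, `toy_pushback`, `toy_prime` and the array-based scanner are illustrative names, not the functions in parser.c or parser.l. The wrapper serves pushed-back tokens before calling the underlying scanner and records every token it returns, so the lookahead discarded by the previous parse can be re-injected underneath the secret token.

```c
#include <stddef.h>

/* Simplified token: a code (0 = end of input) and a value. The real
   parser works with Yacc token numbers and YYSTYPE semantic values. */
typedef struct { int code; int val; } toy_token;

#define TOY_PUSHBACK_MAX 4

typedef struct {
    toy_token recent_tok;                     /* last token returned to the parser */
    toy_token tok_pushback[TOY_PUSHBACK_MAX]; /* stack of tokens to replay */
    int tok_idx;                              /* number of pushed-back tokens */
    toy_token (*scan)(void *ctx);             /* underlying scanner */
    void *scan_ctx;
} toy_parser;

static void toy_pushback(toy_parser *p, toy_token tok)
{
    if (p->tok_idx < TOY_PUSHBACK_MAX)
        p->tok_pushback[p->tok_idx++] = tok;
}

/* The lexer the parser calls: replay pushed-back tokens first (last in,
   first out), otherwise scan; either way remember what is returned. */
static toy_token toy_lex(toy_parser *p)
{
    toy_token tok = p->tok_idx > 0 ? p->tok_pushback[--p->tok_idx]
                                   : p->scan(p->scan_ctx);
    p->recent_tok = tok;
    return tok;
}

/* Prime for the next parse: the most recent token is the lookahead the
   previous parse discarded (skipped here before the first parse or at end
   of input), so push it first, then push the secret token on top of it. */
static void toy_prime(toy_parser *p, toy_token secret)
{
    if (p->recent_tok.code != 0)
        toy_pushback(p, p->recent_tok);
    toy_pushback(p, secret);
}

/* Toy scanner over a fixed array, for demonstration only. */
typedef struct { const toy_token *toks; size_t n, pos; } toy_array_scanner;

static toy_token toy_array_scan(void *ctx)
{
    toy_array_scanner *s = ctx;
    toy_token eof = { 0, 0 };
    return s->pos < s->n ? s->toks[s->pos++] : eof;
}
```

Because the pushback array behaves as a stack, the lookahead is pushed first and the secret token second, so the secret token reaches the parser first and the recovered lookahead follows it.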
*	Diagnose ambiguous floats like (a b).4 and x.y.5	Kaz Kylheku	2015-08-10	1	-0/+30
	These look like integers involved in qref dot syntax. * parser.l (DOTFLO): New pattern definition. (grammar): New rules for detecting cramped floating literals.
*	Dot with no whitespace generates qref syntax.	Kaz Kylheku	2015-08-10	1	-0/+11
	a.b.(expr ...).c -> (qref a b (expr ...) c) Consing dot requires whitespace. * eval.c (qref_s): New symbol global variable. (eval_init): Initialize qref_s. * eval.h (qref_s): Declared. * parser.l (REQWS): New pattern definition, required whitespace. (grammar): New rules to scan CONSDOT (space required on both sides) and LAMBDOT (space required after). * parser.y (CONSDOT, LAMBDOT): New token types. (list): (. n_expr) rule replaced with LAMBDOT and CONSDOT. (r_exprs): r_exprs . n_expr consing dot rule replaced with CONSDOT. (n_expr): New n_expr . n_expr rule introduced here for producing qref expressions. (yybadtoken): Handle CONSDOT and LAMBDOT. * txr.1: Documented qref dot.
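A rough sketch of these whitespace rules as a hypothetical C classifier for the dot at a given index; `classify_dot` and `enum dot_kind` are illustrative names, not taken from parser.l. It assumes a lambda-list dot is one that directly follows an opening parenthesis, and it leaves a dot with whitespace only before it unclassified.

```c
#include <ctype.h>
#include <stddef.h>

enum dot_kind { CONS_DOT, LAMBDA_DOT, QREF_DOT, OTHER_DOT };

/* Classify the '.' at index i of s:
   - whitespace on both sides          -> consing dot
   - right after '(' with space after  -> lambda-list dot
   - no whitespace on either side      -> qref dot
   - anything else                     -> not classified here */
enum dot_kind classify_dot(const char *s, size_t i)
{
    int ws_before = i > 0 && isspace((unsigned char) s[i - 1]);
    int ws_after = s[i + 1] != '\0' && isspace((unsigned char) s[i + 1]);
    int after_paren = i > 0 && s[i - 1] == '(';

    if (ws_before && ws_after)
        return CONS_DOT;
    if (after_paren && ws_after)
        return LAMBDA_DOT;
    if (!ws_before && !ws_after && i > 0 && !after_paren)
        return QREF_DOT;
    return OTHER_DOT;
}
```

Under this sketch, both dots in `a.b.(x).c` that touch non-space characters on each side classify as qref dots, while the dot in `(a . b)` is a consing dot.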
*	Handle abc: token syntax.	Kaz Kylheku	2015-08-10	1	-2/+2
	* parser.l (BTREG, NTREG): Allow an empty string symbol name with a nonempty package name. Without this, abc: parses as abc :.
*	* eval.c (force): Default the new second argument of source_loc_str.	Kaz Kylheku	2015-08-04	1	-4/+4
	(eval_error): Derive location of error from the last_form_evaled, if form doesn't have it. (eval_init): Re-register source-loc-str as binary with an optional arg. * match.c (debuglf, sem_error, file_err, typed_error): Default new argument of source_loc_str. * parser.h (source_loc_str): Declaration updated. * parser.l (source_loc_str): Take second argument which specifies alternative value if the source loc info is not found. * unwind.c (uw_throw): Simplify code thanks to source_loc_str default argument. * txr.1: Document new argument of source-loc-str.
*	* parser.l (grammar): Do not allow unescaped newline in	Kaz Kylheku	2015-07-23	1	-1/+15
	word list literals and word list quasiliterals, except in <= 109 compatibility mode. An escaped newline in these literals, together with surrounding whitespace, now produces a single space, except in <= 109 compatibility mode. * txr.1: Documented new rules for WLL's and QLL's, and added compatibility notes.
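A hypothetical C sketch of the new word-list rules; `wll_fold` is an illustrative name and escape handling is simplified. An unescaped newline is an error, and a backslash-newline together with the spaces and tabs around it becomes one space. The <= 109 compatibility mode is not modeled.

```c
#include <stddef.h>

/* Copy the body of a word-list literal from in to out. out needs room
   for strlen(in) + 1 bytes, since the output never grows.
   Returns -1 if an unescaped newline is found, otherwise 0. */
int wll_fold(const char *in, char *out)
{
    size_t o = 0;

    for (size_t i = 0; in[i] != '\0'; i++) {
        if (in[i] == '\n')
            return -1;                  /* unescaped newline: rejected */
        if (in[i] == '\\' && in[i + 1] == '\n') {
            /* drop spaces/tabs already copied just before the backslash */
            while (o > 0 && (out[o - 1] == ' ' || out[o - 1] == '\t'))
                o--;
            /* skip the backslash-newline and the spaces/tabs after it */
            i += 2;
            while (in[i] == ' ' || in[i] == '\t')
                i++;
            out[o++] = ' ';             /* the whole run becomes one space */
            i--;                        /* the loop's i++ lands on the next char */
            continue;
        }
        if (in[i] == '\\' && in[i + 1] != '\0') {
            out[o++] = in[i++];         /* other escapes are copied unchanged */
            out[o++] = in[i];
            continue;
        }
        out[o++] = in[i];
    }
    out[o] = '\0';
    return 0;
}
```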
*	Bugfix: lexer loses unmatched "hold char" between top-level forms.	Kaz Kylheku	2015-07-10	1	-1/+5
	Test case: file containing 4(prinl 3). Scanner consumes 4 and (. The ( is lost when the scanner is reset for the next call to yyparse, resulting in just prinl being read and interpreted as a variable. * parser.c (prime_parser): If present, append hold byte to priming string. Takes parser_t * instead of parser, and returns void now. * parser.l (reset_scanner): Now returns an int: the value of the scanner's yy_hold_char variable, which is nonzero when the scanner is hanging on to an unmatched byte of input. * parser.h (reset_scanner, prime_parser): Declarations updated. * parser.y (parse): Pass hold byte returned by reset_scanner to prime_parser.