path: root/parser.y
* Copyright year bump 2022.  (Kaz Kylheku, 2022-01-11; 1 file, -1/+1)
  * LICENSE, LICENSE-CYG, METALICENSE, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lex.yy.c.shipped, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, psquare.h, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, socket.c, socket.h, stdlib/arith-each.tl, stdlib/asm.tl, stdlib/awk.tl, stdlib/build.tl, stdlib/cadr.tl, stdlib/compiler.tl, stdlib/constfun.tl, stdlib/conv.tl, stdlib/copy-file.tl, stdlib/debugger.tl, stdlib/defset.tl, stdlib/doloop.tl, stdlib/each-prod.tl, stdlib/error.tl, stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl, stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl, stdlib/keyparams.tl, stdlib/match.tl, stdlib/op.tl, stdlib/optimize.tl, stdlib/package.tl, stdlib/param.tl, stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl, stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl, stdlib/socket.tl, stdlib/stream-wrap.tl, stdlib/struct.tl, stdlib/tagbody.tl, stdlib/termios.tl, stdlib/trace.tl, stdlib/txr-case.tl, stdlib/type.tl, stdlib/vm-param.tl, stdlib/with-resources.tl, stdlib/with-stream.tl, stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr, y.tab.c.shipped: Copyright year bumped to 2022.
* tree: support for duplicate keys.  (Kaz Kylheku, 2021-12-17; 1 file, -1/+1)
  * tree.c (tr_insert): New argument for allowing duplicates. If it is true, it suppresses the case of replacing a node, causing the logic to fall through to traversing right, so the duplicate key effectively looks like it is greater than the existing duplicates, and gets inserted as the rightmost duplicate. (tr_do_delete_specific, tr_delete_specific): New static functions. (tree_insert_node): New parameter, passed to tr_insert. (tree_insert): New parameter, passed to tree_insert_node. (tree_delete_specific_node): New function. (tree): New parameter to allow duplicate keys in the elements sequence. (tree_construct): Pass t to tree to allow duplicate elements. (tree_init): Update registrations of tree, tree-insert and tree-insert-node. Register the tree-delete-specific-node function.
  * tree.h (tree, tree_insert_node, tree_insert): Declarations updated. (tree_delete_specific_node): Declared.
  * lib.c (seq): Pass t argument to tree_insert, allowing duplicates.
  * parser.c (circ_backpatch): Likewise.
  * parser.y (tree): Pass t to the new argument of tree, so duplicates are preserved in the element list of the #T literal.
  * y.tab.c.shipped: Updated.
  * tests/010/tree.tl: Test cases for duplicate keys.
  * txr.1: Documented.
  * stdlib/doc-syms.tl: Updated.
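  For illustration (not part of the original commit message), a minimal sketch assuming the #T literal behavior described above:

    #T(() 1 2 2 3)   ;; the duplicate key 2 is kept as a distinct element
                     ;; rather than replacing the existing node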
* parser: bugfix: #; at front of list.  (Kaz Kylheku, 2021-10-23; 1 file, -3/+5)
  The parser wrongly reads #(#; abc) as (nil) instead of (), and related cases derived from this one are all likewise wrong. A number of tests added in the previous commit target this and fail. They are hereby fixed.
  * parser.y (listacc): In the productions that begin with HASH_SEMI, do not produce a (nil . nil) leading cons, but a (nao . nil) leading cons; so the fact that the first item is commented out is represented by a nao in the car field of the leading cons. (n_exprs): If the first element of the list produced by the listacc grammar symbol is nao, then pop it off. Thereby, we lose the spurious nil that we previously had there left by the commented-out item.
  * y.tab.c.shipped: Updated.
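  An illustrative sketch (not from the commit), assuming the usual #; datum-comment semantics:

    #(#; abc)     ;; now reads as the empty vector #(), not a one-element vector
    (#; a b c)    ;; reads as (b c); the commented-out first item leaves no residue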
* string-extend: third optional argument.  (Kaz Kylheku, 2021-09-07; 1 file, -4/+4)
  A Boolean optional argument to string-extend indicates whether this is likely the last call to string-extend, so memory can be trimmed accordingly.
  * eval.c (eval_init): Update string-extend registration.
  * filter.c (trie_filter_string): Pass nil for new argument of string_extend.
  * lib.c (str_seq, replace_str, lazy_str_force, lazy_str_force_upto): Pass nil for new argument of string_extend. (rem_impl, remove_if, separate): Pass t for new argument of string_extend on last iteration, nil otherwise. (string_extend): Implement new third argument, defaulted to nil. Switch from chk_grow_vec to the more specific chk_wrealloc, which simplifies the code.
  * lib.h (string_extend): Declaration updated.
  * parser.y (litchars): Pass t as last argument of string_extend since we know syntactically that these reductions finalize the string. (restlitchar): Pass nil as the last argument of string_extend, since we know syntactically that it isn't the last.
  * regex.c (scan_until_common): Pass nil for new argument of string_extend.
  * txr.1: Documented.
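  A usage sketch (not from the commit), assuming the Lisp-level string-extend exposes the flag as an optional third argument as described:

    (let ((s (copy-str "abc")))
      (string-extend s "def")     ;; more extensions may follow
      (string-extend s "ghi" t)   ;; t hints this is the final extension, so spare memory can be trimmed
      s)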
* license: reformat to fit 80 columns.  (Kaz Kylheku, 2021-08-16; 1 file, -12/+13)
  * Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, socket.c, socket.h, stdlib/asm.tl, stdlib/awk.tl, stdlib/build.tl, stdlib/compiler.tl, stdlib/constfun.tl, stdlib/conv.tl, stdlib/copy-file.tl, stdlib/debugger.tl, stdlib/defset.tl, stdlib/doloop.tl, stdlib/each-prod.tl, stdlib/error.tl, stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl, stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl, stdlib/keyparams.tl, stdlib/match.tl, stdlib/op.tl, stdlib/optimize.tl, stdlib/package.tl, stdlib/param.tl, stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl, stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl, stdlib/socket.tl, stdlib/stream-wrap.tl, stdlib/struct.tl, stdlib/tagbody.tl, stdlib/termios.tl, stdlib/trace.tl, stdlib/txr-case.tl, stdlib/type.tl, stdlib/vm-param.tl, stdlib/with-resources.tl, stdlib/with-stream.tl, stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h: License reformatted.
  * lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
* parser: @(mdo) must not be subject to expand-meta.  (Kaz Kylheku, 2021-08-11; 1 file, -4/+3)
  * parser.y (elem): When the elem is a list, if it starts with mdo, then we evaluate it immediately and substitute (do) as the semantic value. We no longer allow mdo to precipitate into match_expand_elem, where expand_meta will be unleashed on it. Doing so causes the bug that expressions like @1, denoting the form (sys:var 1), are rewritten to (sys:expr 1), which is op syntax. So two bugs are fixed: the incorrect treatment of meta-syntax, and the neglect to perform mdo in nested contexts. (check_parse_time_action): We need not handle mdo here any more; it will never make it into this function. Only actions done in the main clause list belong here, not parse-time actions done at any nesting level.
  * tests/008/mdo.txr: New file.
  * y.tab.c.shipped: Updated.
* parser: allow trailing commas in json, via opt-in flag.  (Kaz Kylheku, 2021-07-29; 1 file, -2/+9)
  * parser.c (read_bad_json_s): New symbol variable. (parser_common_init): Propagate value of *read-bad-json* into read_bad_json flag in parser structure. (parser_init): Initialize read_bad_json_s and register the *read-bad-json* dynamic variable.
  * parser.h (struct parser): New member, read_bad_json. (read_bad_json_s): Declared.
  * parser.y (json_val): Support an opt_comma symbol just before the closing bracket or brace. (opt_comma): New nonterminal symbol. Recognizes ',' or nothing. Error is flagged if ',' is recognized, and *read-bad-json* is nil.
  * y.tab.c.shipped: Updated.
  * tests/010/json.tl: New tests.
  * txr.1: Documented.
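  A sketch (not from the commit) of the opt-in behavior, assuming get-json accepts a string source the way the other read functions do:

    (let ((*read-bad-json* t))
      (get-json "[1, 2, 3,]"))   ;; trailing comma tolerated; with *read-bad-json* nil this is a syntax error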
* hash: change make_hash interface.  (Kaz Kylheku, 2021-07-22; 1 file, -2/+2)
  The make_hash function now takes the hash_weak_opt_t enumeration instead of a pair of flags.
  * hash.c (do_make_hash): Take enum argument instead of pair of flags. Just store the option; nothing to calculate. (weak_opt_from_flags): New static function. (tweak_hash): Function removed. (make_seeded_hash): Adjust to new do_make_hash interface with help from weak_opt_from_flags. (make_hash, make_eq_hash): Take enum argument instead of pair of flags. (hashv): Calculate hash_weak_opt_t enum from the extracted flags, pass down to make_eq_hash or make_hash.
  * hash.h (tweak_hash): Declaration removed. (make_hash, make_eq_hash): Declarations updated.
  * eval.c (me_case, expand_switch): Update make_hash calls to new style. (eval_init): Update make_hash calls and get rid of tweak_hash calls. This renders the tweak_hash function unused.
  * ffi.c (make_ffi_type_enum, ffi_init): Update make_hash calls to new style.
  * filter.c (make_trie, trie_add, filter_init): Likewise.
  * lib.c (make_package_common, obj_init, obj_print): Likewise.
  * lisplib.c (lisplib_init): Likewise.
  * match.c (dir_tables_init): Likewise.
  * parser.c (parser_circ_def, repl, parse_init): Likewise.
  * parser.l (parser_l_init): Likewise.
  * struct.c (struct_init, get_slot_syms): Likewise.
  * sysif.c (get_env_hash): Likewise.
  * lex.yy.c.shipped, y.tab.c.shipped: Updated.
* c_str now takes a self argument.  (Kaz Kylheku, 2021-06-23; 1 file, -1/+1)
  Adding a self parameter to c_str so that when a non-string occurs, the error is reported against a function.
  Legend:
    A - Pass existing self to c_str.
    B - Define self and pass to c_str and possibly other functions.
    C - Take new self parameter and pass to c_str and possibly other functions.
    D - Pass existing self to c_str and/or other functions.
    E - Define self and pass to other functions, not c_str.
    X - Pass nil to c_str.
  * buf.c (buf_strm_put_string, buf_str): B.
  * chksum.c (sha256_str, md5_str): C. (sha256_hash, md5_hash): D.
  * eval.c (load): D.
  * ffi.c (ffi_varray_dynsize, ffi_str_put, ffi_wstr_put, ffi_bstr_put): A. (ffi_char_array_put, ffi_wchar_array_put): C. (ffi_bchar_array_put): A. (ffi_array_put, ffi_array_out, ffi_varray_put): D.
  * ftw.c (ftw_wrap): A.
  * glob.c (glob_wrap): A.
  * lib.c (copy_str, length_str, coded_length, split_str_set, list_str, cmp_str, num_str, out_json_str, out_json_rec, display_width): B. (upcase_str, downcase_str, string_extend, search_str, do_match_str, do_rmatch_str, sub_str, replace_str, cat_str_append, split_str_keep, trim_str, int_str, chr_str, span_str, compl_span_str, break_str, length_str_gt, length_str_ge, length_str_lt, length_str_le, find, rfind, pos, rpos, mismatch, rmismatch): A. (c_str): Add self parameter and use in type mismatch diagnostic. If the parameter is nil, use "internal error". (flo_str): B, and correction to "flot-str" typo. (out_lazy_str, out_quasi_str, obj_print_impl): D.
  * lib.h (c_str): Declaration updated.
  * match.c (dump_var): X. (v_load): D.
  * parser.c (open_txr_file): C. (load_rcfile): E. (find_matching_syms, provide_atom): X. (hist_save, repl): B.
  * parser.h (open_txr_file): Declaration updated.
  * parser.y (chrlit): X.
  * regex.c (search_regex): A.
  * socket.c (getaddrinfo_wrap, sockaddr_pack): A. (dgram_put_string): B. (open_sockfd): D. (sock_connect): E.
  * stream.c (stdio_put_string, tail_strategy, vformat_str, open_directory, open_file, open_tail, remove_path, rename_path, tmpfile_wrap, mkdtemp_wrap, mkstemp_wrap): B. (do_parse_mode, parse_mode, make_string_byte_input_stream): B. (normalize_mode, normalize_mode_no_bin): E. (string_out_put_string, formatv, put_string, open_fileno, open_subprocess, open_command, base_name, dir_name, short_suffix, long_suffix): A. (run): D. (win_escape_cmd, win_escape_arg): X.
  * stream.h (parse_mode, normalize_mode, normalize_mode_no_bin): Declarations updated.
  * sysif.c (mkdir_wrap, do_utimes, dlopen_wrap, dlsym_wrap, dlvsym_wrap): A. (do_stat, do_lstat): C. (mkdir_nothrow_exists, ensure_dir): E. (chdir_wrap, rmdir_wrap, mkfifo_wrap, chmod_wrap, symlink_wrap, link_wrap, readlink_wrap, exec_wrap, getenv_wrap, setenv_wrap, unsetenv_wrap, getpwnam_wrap, getgrnam_wrap, crypt_wrap, fnmatch_wrap, realpath_wrap, opendir_wrap): B. (stat_impl): statfn pointer-to-function argument now takes self parameter. When calling it, we pass name.
  * syslog.c (openlog_wrap, syslog_wrapv): A.
  * time.c (time_string_local, time_string_utc, time_string_meth, time_parse_meth): A. (strptime_wrap): B.
  * txr.c (txr_main): D.
  * y.tab.c.shipped: Updated.
* parser: bugfix: #; and #S don't play together.  (Kaz Kylheku, 2021-06-09; 1 file, -6/+15)
  When #; is used to comment out syntax containing #S, an attempt is made to instantiate a structure using nil as the type, and an exception occurs.
  * parser.y (hash, struct, tree): Check for parser->ignore and produce nil instead of making any of these objects.
* parser: new *read-unknown-structs* variable.  (Kaz Kylheku, 2021-06-08; 1 file, -3/+4)
  * parser.c (read_unknown_structs_s): New symbol variable. (parser_common_init): Initialize the read_unknown_structs flag member of the parser structure from the new special variable. (parse_init): Initialize the read_unknown_structs_s variable. Register the *read-unknown-structs* dynamic variable.
  * parser.h (struct parser): New member, read_unknown_structs. (read_unknown_structs_s): Declared.
  * parser.y (struct): Generate the struct literal syntax not only for quasiquoted structures, but for structures with an unknown type name, if the read_unknown_structs flag is set.
  * txr.1: Documented.
  * share/txr/stdlib/doc-syms.tl: Regenerated.
  * y.tab.c.shipped: Regenerated.
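  A sketch (not from the commit) of the opt-in behavior, assuming read accepts a string source and that the struct type frobnicator is not defined:

    (let ((*read-unknown-structs* t))
      (read "#S(frobnicator a 1)"))  ;; reads without error, yielding the struct-literal form
                                     ;; rather than trying to construct an instance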
* json: pattern matching test cases and bugfix.  (Kaz Kylheku, 2021-06-03; 1 file, -1/+2)
  * parser.y (json_val): We must nreverse the json_pairs which were pushed in right-to-left order. This didn't matter for constructing hashes, so it was left out, but under quasiquoting the order matters: it determines the order of evaluation and of pattern matching.
  * tests/011/patmatch.tl: New quasiquoting pattern matching cases, including JSON.
  * y.tab.c.shipped: Regenerated.
* json: get-json function.  (Kaz Kylheku, 2021-05-28; 1 file, -1/+11)
  * eval.c (eval_init): get-json intrinsic registered.
  * parser.c (prime_parser): Handle prime_json. (lisp_parse_impl): Take enum prime_parser argument directly instead of the interactive flag. (lisp_parse, nread, iread): Pass appropriate prime_parser value instead of the original flag. (get_json): New function. Like nread, but passes prime_json.
  * parser.h (enum prime_parser): New constant, prime_json. (get_json): Declared.
  * parser.l (prime_scanner): Handle prime_json.
  * parser.y (SECRET_ESCAPE_J): New terminal symbol. (spec): New productions around SECRET_ESCAPE_J for parsing JSON.
  * lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
  * txr.1: Documented.
  * share/txr/stdlib/doc-syms.tl: Updated.
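  A usage sketch (not from the commit), assuming get-json mirrors the nread interface as described above; data.json is a hypothetical file name:

    (with-stream (s (open-file "data.json"))
      (get-json s))   ;; reads and returns one JSON value from the stream as a Lisp object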
* json: two parser bugfixes.  (Kaz Kylheku, 2021-05-28; 1 file, -1/+2)
  * parser.y (json): Add forgotten call to end_of_json in the '^' production. (json_val): Pass zero to vector, not 0, which is nil.
  * y.tab.c.shipped: Updated.
* json: hash issues with quasiquoting.  (Kaz Kylheku, 2021-05-28; 1 file, -21/+25)
  There are two problems. One is that in #J{~foo:"bar"} the foo expression is parsed while the scanner is in Lisp mode, and the : token is grabbed by the parser as a lookahead token in the same mode. Thus it comes back as a SYMTOK (the colon symbol). We fix this by recognizing the colon symbol as a synonym of the colon character, but only if it is spelled ":" in the syntax: we look at the lexeme. The second problem is that even if we fix the above, ~foo produces a sys:unquote form which is rejected because it is not a string. We fix this in the most straightforward way: by deleting the code which restricts keys to strings, an extension I already thought about making anyway.
  * parser.y (json_col): New non-terminal. This recognizes either a SYMTOK or a ':' token, with the semantic restriction that the SYMTOK's lexeme must be ":". (json_vals): Use yybadtok instead of a generic error. (json_pairs): Do not require keys to be strings. Use json_col instead of ':', and also yybadtok instead of a generic error.
  * y.tab.c.shipped: Updated.
* json: omission in quasiquoted array.  (Kaz Kylheku, 2021-05-27; 1 file, -1/+5)
  * parser.y (json_val): Handle json_vals being a list, indicating quasiquoting. We must nreverse it and turn it into a sys:vector-lit form.
  * y.tab.c.shipped: Updated.
* json: implement distinguished json quasiquote.  (Kaz Kylheku, 2021-05-27; 1 file, -2/+4)
  Because #J<json> produces the (json ...) form that translates into quote, ^#J<json> yields a quasiquote around a quote. This has some disadvantages, because it requires an explicit eval in some situations to do what the programmer wants. Here, we introduce an alternative: the syntax #J^<json> will produce a quasiquote instead of a quote. The new translation scheme is

    #J X  -> (json quote <X>)
    #J^ X -> (json sys:qquote <X>)

  where <X> denotes the Lisp object translation of JSON syntax X.
  * parser.c (me_json): The quote symbol is now already in the json form, so all that is left to do here is to take the cdr to pop off the json symbol.
  * parser.l (JPUNC, NJPUNC): Allow ^ to be a punctuator in JSON mode.
  * parser.y (json): For regular #J, generate the new (json quote ...) syntax. Implement #J^, which sets up the nonzero quasi_level around the processing of the JSON syntax, so that everything is in a quasiquote, finally producing the (json sys:qquote ...) syntax.
  * lex.yy.c.shipped, y.tab.c.shipped: Updated.
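  An illustrative sketch (not from the commit) contrasting the two forms, assuming the translation scheme above and the usual JSON-to-Lisp mapping (arrays become vectors, numbers become floats):

    #J["a", "b"]                    ;; quoted: yields the literal vector #("a" "b")
    #J^["a", ~(upcase-str "b")]     ;; quasiquoted: the unquoted Lisp expression is evaluated, giving #("a" "B")
    #J^[1, ~*(list 2.0 3.0)]        ;; ~* splices a list into the array, giving #(1.0 2.0 3.0)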
* json: support quasiquoting.  (Kaz Kylheku, 2021-05-27; 1 file, -10/+47)
  * parser.h (end_of_json_unquote): Declared.
  * parser.l (JPUNC, NJPUNC): Add ~ and * characters to the set of JSON punctuators. (grammar): Allow the closing brace character in the NESTED, SPECIAL and QSPECIAL states to be a token. This is because it occurs as a lookahead character in this situation: #J{"foo":~expr}. The lexer switches from the JSON to the NESTED start state when it scans the ~ token, so that expr is treated as Lisp. But then } is consumed as a lookahead token by the parser in that same mode; when we pop back to JSON mode, the } token has already been scanned in NESTED mode. We add two new rules in JSON mode to the lexer to recognize the ~ unquote and ~* splicing unquote. Both have to push the NESTED start condition. (end_of_json_unquote): New function.
  * parser.y (JSPLICE): New token. (json_val): Logic for unquoting. The array and hash rules must now be prepared to deal with json_vals and json_pairs producing a list object instead of a hash or vector. That is the signal that the data contains active quasiquotes and must be translated to the special literal syntax for quasiquoted vectors and hashes. Here we also add the rules for ~ and ~* unquoting syntax, including managing the lexer's transition back to the JSON start condition. (json_vals, json_pairs): We add the logic here to recognize unquotes in the quasiquoting state. This is more clever than the way it is done in the Lisp areas of the grammar. If no quasiquotes occur, we construct a vector or hash, respectively, and add to it. If unquotes occur and we are nested in a quasiquote, we switch the object to a list, and continue it that way. (yybadtoken): Handle JSPLICE.
  * lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
* json: extension: allow circle notation.  (Kaz Kylheku, 2021-05-26; 1 file, -0/+4)
  * parser.l (HASH_N_EQUALS, HASH_N_HASH): Recognize these tokens in the JSON start state also.
  * parser.y (json_val): Add the circular syntax, exactly like it is done for n_expr and i_expr. And it works!
  * lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
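  A small sketch (not from the commit), assuming the usual #n= / #n# circle notation now works inside #J as it does in Lisp syntax:

    #J[#1="shared", #1#]   ;; both array elements refer to the same string object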
* New #J syntax for JSON objects in TXR Lisp.  (Kaz Kylheku, 2021-05-26; 1 file, -3/+52)
  (Needs buffer literal error message cleanup.)
  * parser.c (json_s): New symbol variable. (is_balanced_line): Follow braces out of the initial state. This concession allows the listener to accept input like #J{"a":"b"}. (me_json): New static function (macro expander). The #J X syntax produces a (json Y) form, with the JSON syntax X translated to a Lisp object Y. If that is evaluated, this macro translates it to (quote Y). (parse_init): Initialize the json_s variable with an interned symbol, and register the json macro.
  * parser.h (json_s): Declared. (end_of_json): Declared.
  * parser.l (num_esc): Treat u escape sequences in the same way as x. This function can then be used for handling the \u escapes in JSON string literals. (DIG19, JNUM, JPUNC, NJPUNC): New lex named patterns. (JSON, JLIT): New lex start conditions. (grammar): Recognize the #J syntax, mapping to the HASH_J token, which transitions into the JSON start state. In the JSON start state, handle all the elements: numbers, keywords, arrays and objects. Transition into the JLIT state. In the JLIT start state, handle all the elements of JSON string literals, including surrogate pair escapes. JSON literals share the {UANY} fallback pattern with other literals. (end_of_json): New function.
  * parser.y (HASH_J, JSKW): New token symbols. (json, json_val, json_vals, json_pairs): New nonterminal symbols, and rules. (i_expr, n_expr): Generate the json nonterminal, to hook the stuff into the grammar. (yybadtoken): Handle JSKW and HASH_J tokens.
  * lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
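  A brief sketch (not from the commit) of what a #J literal reads as, assuming the JSON-to-Lisp mapping in which objects become hashes, arrays become vectors and numbers become floats:

    #J{"name": "txr", "tags": ["lisp", "parser"]}
    ;; reads as a hash mapping "name" to "txr" and "tags" to the vector #("lisp" "parser")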
* parser: improve diagnostic for unterminated exprs.  (Kaz Kylheku, 2021-05-24; 1 file, -2/+5)
  * parser.y (parse): When issuing the diagnostic indicating the likely starting line of the unterminated expression, instead of mentioning that line in the diagnostic text, let's just issue the diagnostic against that line. The programmer's text editor can then jump to that line.
  * y.tab.c.shipped: Updated.
* parser: #; tests and bugfixes.  (Kaz Kylheku, 2021-05-06; 1 file, -0/+7)
  This is motivated by the recent crash regression in the #; comment-out mechanism. The parser doesn't have adequate coverage in the test suite.
  * tests/012/syntax.tl: New file, for testing syntax. A problem was found: #;.expr did not work inside a list, only at top level, and it required a space before the dot.
  * parser.y (listacc): A couple of productions to handle hash-semicolon immediately followed by a dot without any whitespace, and then by an expression.
  * y.tab.c.shipped: Regenerated.
* parser: fix regression in #; syntax.  (Kaz Kylheku, 2021-05-04; 1 file, -1/+4)
  A crash showed up when processing commented-out objects. This is due to the most recent gc fix to the parser.
  * parser.y (set_syntax_tree): Avoid the set macro when tree has the special value nao (not an object).
  * y.tab.c.shipped: Regenerated.
  * y.tab.h.shipped: Likewise.
* parser: gc bug.  (Kaz Kylheku, 2021-05-01; 1 file, -4/+11)
  * parser.y (set_syntax_tree): New static function. Uses GC-correct assignment via the set macro. (spec): Call set_syntax_tree instead of raw assignments to parser->syntax_tree, which violate GC rules.
  * y.tab.c.shipped: Regenerated.
* parser: allow funny UTF-8 in regexes and literals.  (Kaz Kylheku, 2021-04-08; 1 file, -1/+9)
  The main idea in this commit is to change a behavior of the lexer, and take advantage of it in the parser. Currently, the lexer recognizes a {UANYN} pattern in two places. That pattern matches a UTF-8 character. The lexeme is passed to the decoder, which is expected to produce exactly one wide character. If the UTF-8 is bad (for instance, a code in the surrogate pair range U+DCxx) then the decoder will produce multiple characters. In that case, these rules return ERRTOK instead of a LITCHAR or REGCHAR. The idea is: why don't we just return those characters as a TEXT token? Then we can just incorporate that into the literal or regex.
  * parser.l (grammar): If a UANYN lexeme decodes to multiple characters instead of the expected one, then produce a TEXT token instead of complaining about invalid UTF-8 bytes.
  * parser.y (regterm): Recognize a TEXT item as a regterm, converting its string value to a compound node in the regex AST, so it will be correctly treated as a fixed pattern. (chrlit): If a hash-backslash is followed by a TEXT token, which can happen now, that is invalid; we diagnose that as invalid UTF-8. (quasi_item): Remove the TEXT rule, because the litchars constituent now generates TEXT. (litchars, restlitchar): Recognize the TEXT item, similarly to regterm.
  * tests/012/parse.tl: New file.
  * tests/012/parse.expected: Likewise.
* parser: fix a few memory leaks in error recovery.  (Kaz Kylheku, 2021-04-08; 1 file, -0/+4)
  * parser.y (var, o_var): In a few error productions in which we have a SYMTOK item, we should free the lexeme. This doesn't solve all leaks: any time we have a parser stack containing SYMTOK or TEXT items that belong to rules that have not yet been reduced, and the parse job is aborted due to errors, we leak those.
* parser: fix bad precedence of @ token.  (Kaz Kylheku, 2021-01-24; 1 file, -3/+13)
  Whereas @a..@b parses and transforms to (rcons @a @a), @(a)..@(a) goes to @(rcons a @(a)).
  * parser.l (grammar): Under 248 compatibility or lower, the @ character now produces the OLD_AT token. Otherwise it produces the '@' character, as before.
  * parser.y (OLD_AT): New token replaces the '@' at the old low precedence position. '@' is now at the highest precedence, together with OLD_DOTDOT. (We don't care about interactions between '@' and OLD_DOTDOT, because OLD_DOTDOT only exists in 185 compatibility, in which '@' is OLD_AT). (meta): The two rules have to be unfortunately duplicated for OLD_AT, since there is no BNF OR operator in Yacc.
  * txr.1: Compat note added.
  * lex.yy.c.shipped: Updated.
  * y.tab.c.shipped, y.tab.h.shipped: Likewise.
* Copyright year bump 2021.  (Kaz Kylheku, 2021-01-14; 1 file, -1/+1)
  * METALICENSE: 2020 copyrights bumped to 2021. Added note about SHA-256 routines from Colin Percival.
  * LICENSE, LICENSE-CYG, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lex.yy.c.shipped, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/asm.tl, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/compiler.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/copy-file.tl, share/txr/stdlib/debugger.tl, share/txr/stdlib/defset.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/each-prod.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/param.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/quips.tl, share/txr/stdlib/save-exe.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/trace.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/vm-param.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr, y.tab.c.shipped: Copyright year bumped to 2021.
* txr: bugfix: repeat not finding Lisp vars in brace.  (Kaz Kylheku, 2020-10-01; 1 file, -5/+9)
  The following does not work and is fixed here:

    @(output)
    @(repeat)
    @{lisp-expr ...}
    @(end)

  When a Lisp expression occurs in a braced expansion syntax, it is not traversed to find variables.
  * parser.y (extract_vars): Treat the second element of a sys:var as a Lisp expression, and find variables. Don't do this in 128 or older compatibility mode, because then that is a TXR expression.
  * y.tab.c.shipped: Regenerated.
* txr: repeat ferrets out Lisp-embedded vars.  (Kaz Kylheku, 2020-08-17; 1 file, -1/+5)
  Gone is the need for :vars to inform @(repeat)/@(rep) about variable references buried in Lisp.
  * eval.c (expand_with_free_refs): Change to external linkage.
  * eval.h (expand_with_free_refs): Declared.
  * parser.y (extract_vars): Handle sys:expr forms, which are embedded Lisp, via expand_with_free_refs, to uncover their free variables.
  * txr.1: Redocumented this area.
* txr: identify output repeat vars at parse time.  (Kaz Kylheku, 2020-08-17; 1 file, -1/+57)
  Up to now the @(repeat) and @(rep) directives have scanned their interior to find output variables. We now hoist that into the parser; the variables are found up-front and integrated into the abstract syntax. This work anticipates doing a more proper job of identifying free variables in Lisp expressions.
  * match.c (extract_vars): Delete this static function here. It moves into parser.y. (extract_bindings): Don't call extract_vars to obtain the variables occurring in the repeat body. Get it as a new argument, occur_vars. (do_output_line, do_repeat): Extract the new occur_vars element of the abstract syntax node and pass it to extract_bindings.
  * parser.y (extract_vars): New static function, moved here from match.c. (repeat_rep_helper): Scan over each of the repeat subclauses with extract_vars. Catenate the resulting lists together and pass through uniq to squash repeated symbols. Then add the resulting vars as a new element of the returned syntax node.
* parser: add missing cases in yybadtoken.  (Kaz Kylheku, 2020-07-08; 1 file, -0/+5)
  * parser.y (yybadtoken): Add all unhandled token types once and for all: BLOCK, GATHER, MOD, MODLAST, SPLICE.
* txr: support @(if)/@(elif)/@(else) in @(output).  (Kaz Kylheku, 2020-07-07; 1 file, -3/+44)
  This turns out to be way easier than I thought.
  * match.c (do_output_if): New static function. (do_output): Handle if via do_output_if.
  * parser.y (out_if_clause, out_elif_clauses_opt, out_else_clause_opt): New nonterminal symbols and grammar rules. (out_clause): Now produces out_if_clause. (not_a_clause): Remove ELIF and ELSE; these entries here cause conflicts now. Here, continue to recognize the Lisp if, which is distinguished by having at least two arguments. out_if_clause matches only a one-argument if, and a no-argument one that is diagnosed as erroneous.
  * txr.1: Documented.
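  An illustrative sketch (not from the commit) of the new directives inside an output block, assuming a bound variable n:

    @(output)
    @(if (> n 100))
    big: @n
    @(elif (> n 10))
    medium: @n
    @(else)
    small: @n
    @(end)
    @(end)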
* Remove unnecessary #include directives.  (Kaz Kylheku, 2020-04-22; 1 file, -4/+0)
  Time for some spring cleaning.
  * args.c, arith.c, buf.c, cadr.c, chksum.c, debug.c, ftw.c, gc.c, gencadr.txr, glob.c, hash.c, lisplib.c, match.c, parser.c, parser.l, parser.y, rand.c, signal.c, stream.c, strudel.c, syslog.c, tree.c, unwind.c, utf8.c, vm.c: Numerous unnecessary #include directives removed.
* parser: eliminate struct list_accum.  (Kaz Kylheku, 2020-04-10; 1 file, -42/+22)
  As a final round of this recent work, we observe that since the accumulator structure has been reduced to two members, we can eliminate one of them by using a cons cell as the accumulator, and threading the second value through the cdr. That is to say, the listacc grammar rule's semantic value will now be the tail cons of the list being constructed. The cdr of this cons will, temporarily, be a back pointer to the head (making the list temporarily circular). The n_exprs reduction will fix this up; it will put the correct terminating atom in place of the head (either nil, or the dotted item if there is one), and yield the head as the semantic value.
  * lib.h (struct list_accum): Removed. (Thus, finding a better home for this would, after all, have been a waste of time.)
  * parser.y (lacc, splacc): Static functions removed. (union YYSTYPE): lacc member removed. (n_exprs): Adjust to new semantic value coming from listacc. (listacc): Now of type val again. Yields pointer to tail cons as semantic value, whose cdr points to the head of the list.
* parser: move cons dot handling to higher rule.  (Kaz Kylheku, 2020-04-10; 1 file, -18/+3)
  * lib.h (struct list_accum): dot member removed.
  * parser.y (misplaced_consing_dot_check): Function removed. The misplaced consing dot diagnostic is still provided as before by yybadtoken. (n_exprs): Production for CONSDOT is moved here out of listacc. There is no $1.dot member to deal with; the dot is very simply handled here. (listacc): Remove calls to misplaced_consing_dot_check. CONSDOT production moved to n_exprs. (lacc, splacc): Remove initialization of dot member.
* parser: streamline core list building.  (Kaz Kylheku, 2020-04-09; 1 file, -47/+57)
  The r_exprs grammar symbol is replaced with listacc, which has a different type: a new Yacc node type that has three fields for building a list from left to right without nreverse, plus the dotted pair item.
  * lib.h (struct list_accum): New struct type. Not a great place for it, but we don't have a parser-specific header that is included before y.tab.h: parser.h is included after y.tab.h.
  * parser.y (misplaced_consing_dot_check): The val argument is just the existing dotted item; if it is other than nao, the error is generated. (lacc, splacc): New static functions. (YYSTYPE): New union member, lacc, of type struct list_accum. (r_exprs): Grammar symbol removed. (listacc): New grammar symbol of type listacc. The rules are those of r_exprs, upgraded to construct the list in one pass.
* warning cleanup: add casts for unused parameters.  (Kaz Kylheku, 2020-04-05; 1 file, -0/+2)
  This is the first round of an effort to enable GCC's -Wextra option. All function parameters that are unused and that we cannot eliminate are treated with a cast to void in the function body.
  * args.c (args_key_check_store): Cast unused param to void.
  * combi.c (perm_list_gen_fill): Likewise.
  * eval.c (op_error, op_meta_error, op_quote, op_qquote_error, op_unquote_error, op_load_time_lit, me_each, me_for, me_quasilist, me_flet_labels, hash_min_max, me_ignerr, me_whilet, me_iflet_whenlet, me_dotimes, me_mlet, me_load_time, me_load_for): Likewise.
  * ffi.c (ffi_void_put, ffi_fixed_dynsize, ffi_fixed_alloc, ffi_noop_free, ffi_void_get, ffi_simple_release, ffi_i8_put, ffi_i8_get, ffi_u8_put, ffi_u8_get, ffi_i16_put, ffi_i16_get, ffi_u16_put, ffi_u16_get, ffi_i32_put, ffi_i32_get, ffi_u32_put, ffi_u32_get, ffi_i64_put, ffi_i64_get, ffi_u64_put, ffi_u64_get, ffi_char_put, ffi_char_get, ffi_uchar_put, ffi_uchar_get, ffi_bchar_get, ffi_short_put, ffi_short_get, ffi_ushort_put, ffi_ushort_get, ffi_int_put, ffi_int_get, ffi_uint_put, ffi_uint_get, ffi_long_put, ffi_long_get, ffi_ulong_put, ffi_ulong_get, ffi_float_put, ffi_float_get, ffi_double_put, ffi_double_get, ffi_val_put, ffi_val_get, ffi_be_i16_put, ffi_be_i16_get, ffi_be_u16_put, ffi_be_u16_get, ffi_le_i16_put, ffi_le_i16_get, ffi_le_u16_put, ffi_le_u16_get, ffi_be_i32_put, ffi_be_i32_get, ffi_be_u32_put, ffi_be_u32_get, ffi_le_i32_put, ffi_le_i32_get, ffi_le_u32_put, ffi_le_u32_get, ffi_be_i64_put, ffi_be_i64_get, ffi_be_u64_put, ffi_be_u64_get, ffi_le_i64_put, ffi_le_i64_get, ffi_le_u64_put, ffi_le_u64_get, ffi_wchar_put, ffi_wchar_get, ffi_sbit_get, ffi_ubit_get, ffi_cptr_get, ffi_str_in, ffi_str_put, ffi_str_get, ffi_str_d_get, ffi_wstr_in, ffi_wstr_get, ffi_wstr_put, ffi_wstr_d_get, ffi_bstr_in, ffi_bstr_put, ffi_bstr_get, ffi_bstr_d_get, ffi_buf_in, ffi_buf_put, ffi_buf_get, ffi_buf_d_in, ffi_buf_d_put, ffi_buf_d_get, ffi_closure_put, ffi_ptr_in_in, ffi_ptr_in_d_in, ffi_ptr_in_out, ffi_ptr_out_in, ffi_ptr_out_out, ffi_ptr_out_null_put, ffi_ptr_out_s_in, ffi_flex_struct_in, ffi_carray_get, ffi_union_get, make_ffi_type_builtin, make_ffi_type_array, ffi_closure_dispatch, ffi_closure_dispatch_safe): Likewise.
  * gc.c (cobj_destroy_stub_op, cobj_destroy_free_op, cobj_mark_op): Likewise.
  * lib.c (seq_iter_get_nil, seq_iter_peek_nil): Likewise.
  * linenoise/linenoise.c (sigwinch_handler): Likewise.
  * parser.c (repl_intr, read_eval_ret_last, repl_warning, is_balanced_line): Likewise.
  * parser.y (yydebug_onoff): Likewise.
  * socket.c (dgram_close): Likewise.
  * stream.c (unimpl_put_string, unimpl_put_char, unimpl_put_byte, unimpl_unget_char, unimpl_unget_byte, unimpl_put_buf, unimpl_fill_buf, unimpl_seek, unimpl_truncate, unimpl_set_sock_peer, null_put_string, null_put_char, null_put_byte, null_get_line, null_get_char, null_get_byte, null_close, null_flush, null_seek, null_set_prop, null_get_error, null_get_error_str, null_clear_error, null_get_fd, dir_close): Likewise.
  * struct.c (struct_type_print): Likewise.
  * unwind.c (me_defex): Likewise.
* internals: rename misnamed curry_* functions.  (Kaz Kyheku, 2020-03-17; 1 file, -2/+2)
  The various curry_xx_yy functions perform partial application, not currying. The curry prefix is being renamed to pa (partially apply).
  * lib.c (remq_lazy, remql_lazy, remqual_lazy, tree_find): Updated. (do_curry_12_1, do_curry_12_1_v, do_curry_12_2, do_curry_123_1, do_curry_123_23, do_curry_123_2, do_curry_123_3, do_curry_1234_1, do_curry_1234_34): Renamed to do_pa_12_1, do_pa_12_1_v, do_pa_12_2, do_pa_123_1, do_pa_123_23, do_pa_123_2, do_pa_123_3, do_pa_1234_1, do_pa_1234_34. (curry_12_1, curry_12_1_v, curry_12_2, curry_123_1, curry_123_23, curry_123_2, curry_123_3, curry_1234_1, curry_1234_34): Renamed to pa_12_1, pa_12_1_v, pa_12_2, pa_123_1, pa_123_23, pa_123_2, pa_123_3, pa_1234_1, pa_1234_34. (transposev, do_juxt): Updated.
  * lib.h: Declarations renamed.
  * eval.c (subst_vars, qquote_init, expand_catch, weavev): Updated.
  * filter.c (get_filter, build_filter_from_list, filter_string_tree, filter_init): Updated.
  * match.c (tx_subst_vars, do_txeval, v_freeform, v_bind, v_throw, v_deffilter, v_assert, h_assert): Updated.
  * parser.y (gather_clause): Updated.
  * regex.c (regex_range_full_fun, regex_range_left_fun, regex_range_right_fun, regex_range_search_fun): Updated.
  * stream.c (open_files, open_files_star): Updated.
  * txr.c (txr_main): Updated.
  * unwind.c (me_defex): Updated.
* Copyright year bump 2020.  (Kaz Kylheku, 2019-12-31; 1 file, -1/+1)
  * LICENSE, LICENSE-CYG, METALICENSE, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/asm.tl, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/compiler.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/debugger.tl, share/txr/stdlib/defset.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/param.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/save-exe.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/trace.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/vm-param.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr: Extended copyright notices to 2020.
* parser: forgotten top-level .? cases.  (Kaz Kylheku, 2019-11-19; 1 file, -0/+8)
  * parser.y (hash_semi_or_n_expr, hash_semi_or_i_expr): We need to handle OREFDOT here so that .?sym can parse as a top-level expression. Issue reported by vapnik spaknik.
* parser: missing cases in yybadtoken.  (Kaz Kylheku, 2019-11-19; 1 file, -0/+3)
  * parser.y (yybadtoken): Add missing cases for UREFDOT, OREFDOT and UOREFDOT, so these don't fall through to being reported as a junk character.
* parser: merge cases in yybadtoken.  (Kaz Kylheku, 2019-11-19; 1 file, -1/+1)
  * parser.y (yybadtoken): Merge CONSDOT and LAMBDOT cases since they have identical code.
* syntax: new .? operator for null-safe object access.  (Kaz Kylheku, 2019-11-05; 1 file, -12/+32)
  * lib.c (obj_print_impl): Render the new syntactic conventions introduced in qref/uref back into the .? syntax. The printers for qref and uref are united into a single implementation to reduce code proliferation.
  * parser.l (grammar): Produce new tokens OREFDOT and UOREFDOT.
  * parser.y (OREFDOT, UREFDOT): New terminal symbols. (n_expr): Handle .? syntax via the new OREFDOT and UOREFDOT tokens via qref_helper and uoref_helper. Logic for the existing referencing dot is moved into the new qref_helper function. (n_dot_expr): Handle .? syntax via uoref_helper. (uoref_helper, qref_helper): New static functions.
  * share/txr/stdlib/struct.tl (qref): Handle the new case when the expression which gives the object is (t expr). Handle the new case when the first argument after the object has this form, and is followed by more arguments. Both these cases emit the right conditional code. (uref): Handle the leading .? syntax indicated by a leading t by generating a lambda which checks its argument for nil. Transformations to qref handle the other cases.
  * txr.1: Documentation updated in several places.
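  A usage sketch (not from the commit) of the null-safe access, assuming a hypothetical struct type point with slots x and y:

    (defstruct point () x y)
    (let ((p (new point x 1 y 2))
          (q nil))
      (list p.?x q.?x))   ;; => (1 nil); q.?x yields nil instead of throwing on the nil object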
* parser: use faster, unsafe nreverse.  (Kaz Kylheku, 2019-10-25; 1 file, -5/+5)
  * lib.c (us_nreverse): New function.
  * lib.h (us_nreverse): Declared.
  * parser.y (clauses_opt, n_exprs, r_exprs): Use us_nreverse instead of nreverse to reorder lists built in reverse into final shape.
* tree: allow quasiquoting into #T syntax.  (Kaz Kylheku, 2019-09-28; 1 file, -10/+13)
  * eval.c (tree_lit_s, tree_construct_s): New symbol variables. (expand_qquote_rec): Handle the sys:tree-lit syntax generated by quasiquoted #T notation by expanding and converting to a sys:tree-construct call. (eval_init): Initialize tree_lit_s and tree_construct_s.
  * eval.h (tree_lit_s, tree_construct_s): Declared.
  * parser.y (tree): Produce the sys:tree-lit syntax when #T is quasiquoted and unquotes occur inside it.
  * tree.c (tree_construct_fname, tree_construct): New static functions. (tree_init): Register the sys:tree-construct intrinsic function.
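  A small sketch (not from the commit), assuming the quasiquote behavior described above:

    (let ((x 20))
      ^#T(() 10 ,x 30))   ;; builds a tree whose keys are 10, 20 and 30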
* New data structure: binary search trees.  (Kaz Kylheku, 2019-09-25; 1 file, -2/+42)
  Adding binary search trees based on the new tnode cell. The scapegoat algorithm is used, which requires no additional storage in a cell. In the future we may go to something else, like red-black trees, and carve out a bit in the tag field of the cell for the red/black color.
  Tree cells store only single key objects, not key/value pairs. However, which part of the key object is compared is determined by a custom key function stored in the tree container. For instance, tree nodes can be cons cells, and car can be used as the key function; the cdr then stores an associated value.
  Trees have a printed notation #T(<props> <key>*) where <props> is a list of up to three items:

    <props> ::= ([<key-fn> [<less-fn> [<equal-fn>]]])

  key-fn, less-fn and equal-fn are function names. If they are missing or nil, they default, respectively, to identity, less and equal. For security, the printed notation is machine-readable only if these options are symbols, not lambda expressions. Furthermore, the symbols must be listed in the special variable *tree-fun-whitelist*.
  * eval.c (less_s): New symbol variable. (eval_init): Initialize less_s.
  * eval.h (less_s): Declared.
  * parser.l (grammar): New #T token recognized, mapped to HASH_T.
  * parser.y (HASH_T): New terminal symbol. (tree): New non-terminal symbol. (i_expr, n_expr): Add tree to productions. (fname_helper): New static function. (yybadtoken): Map HASH_T to "#T".
  * protsym.c: Tweaked accidentally; remove.
  * tree.c (TREE_DEPTH_MAX): New macro. (struct tree): New struct type. (enum tree_iter_state): New enumeration. (struct tree_iter): New struct type. (tree_iter_init): New macro. (tree_s, tree_fun_whitelist_s): New symbol variables. (tn_size, tn_size_one_child, tn_lookup, tn_find_next, tn_flatten, tn_build_tree, tr_rebuild, tr_find_rebuild_scapegoat, tr_insert, tr_lookup, tr_do_delete, tr_delete, tree_insert_node, tree_insert, tree_lookup_node, tree_lookup, tree_delete, tree_root, tree_equal_op, tree_print_op, tree_mark, tree_hash_op): New static functions. (tree_ops): New static struct. (tree): New function. (tree_init): Initialize tree_s and tree_fun_whitelist_s symbol variables. Register intrinsic functions tree, tree-insert-node, tree-insert, tree-lookup-node, tree-lookup, tree-delete, tree-root. Register special variable *tree-fun-whitelist*.
  * tree.h (tree_s, tree_fun_whitelist_s, tree): Declared. (tree_fun_whitelist): New macro.
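  A brief sketch (not from the commit) of the literal notation and the cons-cell-keyed usage described above:

    #T(() 1 2 3)                          ;; empty props: keys compared with the default less/equal
    #T((car) (1 . "one") (2 . "two"))     ;; cons cells as elements, keyed by car; the cdr carries a value
    (tree-lookup #T((car) (1 . "one")) 1) ;; looks up the element whose car matches 1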
* New data type: tnode.  (Kaz Kylheku, 2019-09-22; 1 file, -2/+15)
  Binary search tree nodes are being added as a basic heap data type. The C type tag is TNOD, and the Lisp type is tnode. Binary search tree nodes have three elements: a key, a left child and a right child. The printed notation is #N(key left right). Quasiquoting is supported: ^#N(,foo ,bar) but not splicing.
  Because tnodes have three elements, they fit into TXR's four-word heap cell, not requiring any additional memory allocation. These nodes are going to be the basis for a binary search tree container, which will use the scapegoat tree algorithm for maintaining balance.
  * tree.c, tree.h: New files.
  * Makefile (OBJS): Adding tree.o.
  * eval.c (expand_qquote_rec): Recurse through tnode cells, so unquotes work inside #N syntax.
  * gc.c (finalize): Add TNOD to no-op case in switch; tnodes don't require finalization. (mark_obj): Traverse tnode cell.
  * hash.c (equal_hash): Add TNOD case.
  * lib.c (tnode_s): New symbol variable. (seq_kind_tab): New entry for TNOD, mapping to SEQ_NOTSEQ. (code2type, equal): Handle TNOD. (obj_init): Initialize tnode_s variable. (obj_print_impl, populate_obj_hash): Handle TNOD. (init): Call tree_init function in tree.c.
  * lib.h (enum type, type_t): New enumeration TNOD. (struct tnod): New struct type. (union obj, obj_t): New union member tn of type struct tnod. (tnode_s): Declared.
  * parser.c (circ_backpatch): Handle TNOD, so circular notation works through tnode cells.
  * parser.l (grammar): Recognize #N prefix, mapping to HASH_N token.
  * parser.y (HASH_N): New grammar terminal symbol. (tnode): New nonterminal symbol. (i_expr, n_expr): Add tnode cases to productions. (yybadtoken): Map HASH_N to "#N" string.
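  A small sketch (not from the commit) of the #N notation described above:

    #N(2 #N(1 nil nil) #N(3 nil nil))   ;; a node keyed 2, with leaf children keyed 1 and 3
    ^#N(,(* 5 2) nil nil)               ;; quasiquoted: the key position evaluates to 10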
* parser: #; notation shouldn't intern symbols.  (Kaz Kylheku, 2019-08-18; 1 file, -2/+3)
  When #; is used to ignore an object, parsing that object shouldn't cause symbols to be interned, nor errors about unknown packages.
  * parser.y (ifnign): New macro. (i_expr, n_expr): For SYMTOK, don't call symhlpr when parsing under ignore flag. Just yield nil as the semantic value.
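  An illustrative sketch (not from the commit), assuming a package named nonexistent is not defined:

    #;(nonexistent:foo bar) 42   ;; reads as 42; neither symbol is interned and no unknown-package error is thrown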
* parser: rename circ_suppress flag.  (Kaz Kylheku, 2019-08-18; 1 file, -13/+13)
  * parser.h (struct parser): eof flag changed to unsigned char. circ_suppress flag renamed to ignore, changed to unsigned char and relocated next to eof to compact together.
  * parser.c (parser_circ_ref): Follow rename.
  * parser.y (hash_semi_or_n_expr, hash_semi_or_i_expr, n_exprs, parse): Likewise.