2009-12-03  Kaz Kylheku

	Version 027.

	Code cleanup. gc-related bugfix. Improved file copying semantics of make install, and adherence to the DESTDIR convention.

	* txr.c (version): Bumped to 027.

	* txr.1: Bumped version to 027.

	* configure: Bumped txr_ver to 027.

2009-12-03  Kaz Kylheku

	* Makefile (CFLAGS): Better test for g++ when removing warning options not appropriate for g++. Sometimes g++ may be called something that doesn't end in g++, like g++4.

2009-12-03  Kaz Kylheku

	* parser.l (YY_NO_UNPUT): Removed superfluous #define. It is not needed because suppressing generation of unput is requested via the %option. In scanners generated by the legacy version of flex, 2.5.4, which is still widely in use, this redundancy leads to a duplicate #define YY_NO_UNPUT and a compiler warning.

2009-12-03  Kaz Kylheku

	Fix for failing test suite on MIPS machine, due to gc failing to mark a local variable in txr_main.

	* txr.c (txr_main): Changed from internal linkage to external. This prevents gcc -O2 from inlining txr_main into main. We need separate stack frames for main and txr_main, in order to be sure that when walking to the bottom-of-stack pointer, we visit all locals in main. This is the whole reason why there is a separate txr_main.

2009-12-02  Kaz Kylheku

	* Makefile (tests): Don't depend on the executable. Otherwise, during make install-tests, if the executable doesn't exist in the install directory, a gcc compile command gets deposited into the generated run.sh script.
	(install-tests): Fixes to make this work when using a separate build directory. Split the cpio -p job into a cpio -o piping into cpio -i.

2009-12-02  Kaz Kylheku

	* Makefile (install-tests): New target. Provides a way to make the test cases part of the installation, along with a generated script to run the commands on the installation host.

2009-12-02  Kaz Kylheku

	Fix annoyances with dependency generation, such as picking up local files that are not in the project.

	* Makefile (depend): Rule passes object file names as arguments to the depend.txr script.

	* depend.txr: Changed to take names of object files from the command line, rather than scanning the directory for all .c files. Switched to new-style next directives, using quasiliterals.

	* dep.mk: Regenerated.

2009-11-28  Kaz Kylheku

	* Makefile (CFLAGS): If the compiler matches the pattern %g++, then remove some C-front-end-specific warnings from CFLAGS, which the g++ front end would complain about.

2009-11-28  Kaz Kylheku

	* Makefile (CFLAGS): Add -Dlint to CFLAGS when compiling y.tab.o. This suppresses some warnings from a byacc-generated parser, and gets rid of a useless static sccsid array. May help with a Bison-generated parser also.

2009-11-28  Kaz Kylheku

	* parser.l: Use flex options to suppress generation of the unused functions yyunput and yyinput, thus getting rid of some compiler diagnostics.

2009-11-28  Kaz Kylheku

	Code cleanup. All private functions static. Private stuff in regex module not exposed in header. Etc.

	* configure (diag_flags): Add -Wmissing-prototypes and -Wstrict-prototypes.

	* gc.c (more): Turn into prototyped definition with (void).

	* gc.h (unmark): Declared.

	* hash.c (hash_equal, hash_destroy, hash_mark, hash_grow): Private functions defined static.

	* lib.c (flatten_helper, do_bind2, do_bind2other): Likewise.

	* lib.h (make_package, merge, d): Declared.

	* match.c (dump_shell_string, dump_byte_string, dump_var, dump_bindings, depth, weird_merge, bindable, dest_bind, match_line, format_field, subst_vars, eval_form, complex_open, complex_snarf, complex_stream, robust_length, bind_car, bind_cdr, extract_vars, extract_bindings, do_output_line, do_output, match_files): Private functions defined static.
	(map_leaf_lists, complex_close): Unused functions removed.

	* parser.h (yyerror): Declared.

	* regex.c (bitcell_t, BITCELL_ALL1, CHAR_SET_SIZE, chset_type_t, cset_L0_t, cset_L1_t, cset_L2_t, cset_L3_t, struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set, union char_set, nfa_kind_t, struct nfa_state_accept, struct nfa_state_empty, struct nfa_state_single, struct nfa_state_set, struct nfa_state, struct nfa_machine): Definitions moved here from the regex.h file.
	(L0_fill_range, L0_contains, L1_full, L1_fill_range, L1_contains, L1_free, L2_full, L2_fill_range, L2_contains, L2_free, L3_fill_range, L3_contains, L3_free, char_set_create, char_set_destroy, char_set_compl, char_set_add, char_set_add_range, char_set_contains, nfa_state_accept, nfa_state_empty, nfa_state_single, nfa_state_wild, nfa_state_free, nfa_state_shallow_free, nfa_state_set, nfa_state_empty_convert, nfa_state_merge, nfa_make, nfa_combine, nfa_compile_set, nfa_all_states, nfa_closure, nfa_move): Private functions defined static.

	* regex.h (bitcell_t, BITCELL_ALL1, CHAR_SET_SIZE, chset_type_t, cset_L0_t, cset_L1_t, cset_L2_t, cset_L3_t, struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set, union char_set, nfa_kind_t, struct nfa_state_accept, struct nfa_state_empty, struct nfa_state_single, struct nfa_state_set, struct nfa_state, struct nfa_machine): Definitions removed.
	(char_set_create, char_set_destroy, char_set_compl, char_set_add, char_set_add_range, char_set_contains, nfa_state_accept, nfa_state_empty, nfa_state_single, nfa_state_wild, nfa_state_set, nfa_state_free, nfa_state_shallow_free, nfa_state_merge): Extern declarations removed.

	* stream.c (stdio_stream_print, stdio_stream_destroy, stdio_stream_mark, stdio_get_char, stdio_get_byte, string_in_stream_mark, vformat_str): Private functions defined static.

	* txr.c (oom_realloc_handler, help, hint, remove_hash_bang_line): Likewise.

	* unwind.c (uw_unwind_to_exit_point): Likewise.

2009-11-28  Kaz Kylheku

	* configure: Workaround in banner code for coreutils printf %.*s bug.

2009-11-27  Kaz Kylheku

	Switching to DESTDIR convention for install. The make install step does some things more correctly now, without relying on the install program.

	* configure: Help text doesn't refer to ``Makefile variables'' but to ``make variables'', or ``variables in config.make''. The install_prefix variable becomes DESTDIR now in config.make.

	* Makefile (INSTALL): New rule body macro.
	(install): Uses of mkdir -p and cp switched to a call to the INSTALL macro.

2009-11-26  Kaz Kylheku

	Version 026.

	Fixed wchar_t build problem in parser.y. Improved configure script to auto-detect the yacc program. Txr works with either Berkeley yacc (byacc) or Bison. Fixed two uninitialized memory bugs. The Valgrind API is now used to integrate the GC memory manager with Valgrind. The symbols nothrow and args in the next directive are now keyword symbols, written :nothrow and :args. (Breaks backward compatibility; sorry!)

	* txr.c (version): Bumped to 026.

	* txr.1: Bumped version to 026.

	* configure: Bumped txr_ver to 026.

2009-11-26  Kaz Kylheku

	Not all systems have a yacc alias for the yacc program. txr is known to work with two yacc implementations: GNU Bison and Berkeley yacc. Let's add some auto-detection for yacc.

	* Makefile: Use "include" rather than "-include" for including config.make, so that make fails if the file does not exist.
	(conftest.yacc): New target. Just outputs the value of the variable expansion of $(YACC).

	* configure (yaccname): New variable.
	(gen_config_make): New function.
	Steps added to test for existence of various yaccs.

2009-11-25  Kaz Kylheku

	* gc.c (mark_mem_region): Bugfix: do not mess with the Valgrind accessibility of the heap object if Valgrind debugging is not enabled.

2009-11-25  Kaz Kylheku

	* parser.y (grammar): Fixes for bison 2.4.1. Remove superfluous action in chrlit. Include the header that declares abort.

2009-11-25  Kaz Kylheku

	Refinements to Valgrind support.

	* gc.c (mark_mem_region): If a pointer from the stack is valid for the heap, it may point to a free object, which is marked inaccessible. We must grant the garbage collector access to the object. If the object is free, close off access again. This is not 100% correct, because if the object is accessible but undefined, then we end up flipping it to defined.
	(sweep): Before sweeping each heap, mark the entire block as defined. This is necessary because sweep accesses blocks, which may be free and thus inaccessible. Then, during the sweep, any block which is already free must be marked inaccessible again. This means that the remaining blocks that are reachable become defined. Here that is okay, because gc has marked all those blocks. If any of them had uninitialized members, that would have been caught by Valgrind during the marking phase, if not sooner.

2009-11-25  Kaz Kylheku

	More Valgrind support. New option --vg-debug, which turns on Valgrind protection of free blocks. This works independently of --gc-debug.

	* gc.c (opt_vg_debug): New conditionally defined global variable.
	(more): Mark entire heap of free blocks inaccessible, if vg debugging is enabled.
	(make_obj): If vg debugging is enabled, mark returned block as accessible but undefined, and take care to grant self temporary access while manipulating the free list.
	(finalize): Removed old debugging logic of not freeing strings and vectors during gc debug. If the null pointers are ever a problem during debugging, they can be checked inside obj_print, and turned into # notation.
	(sweep): Switch to FIFO free block recycling if vg debugging is enabled, just like when gc debugging is enabled. Mark freed blocks as inaccessible, taking care to grant self temporary access while manipulating the free list.

	* txr.c (txr_main): Parse the --vg-debug option.

	* txr.h (opt_vg_debug): Conditionally declared.

2009-11-25  Kaz Kylheku

	Fix a build breakage that may happen on some platforms.
	The parser.y file includes "utf8.h", which uses the type wint_t. It also includes "lib.h", which uses wchar_t. But it fails to include any headers which define these types. The generated y.tab.c picks up wchar_t through a header that Bison inserts, so that's how we got that. But wint_t does not come from any of those headers, if they are standard-conforming.

	* parser.y: Add inclusions of the standard headers that define these types.

2009-11-25  Kaz Kylheku

	More Valgrind integration. Vector objects keep displaced pointers to vector data: they point to element 0, which is actually the third element of the underlying array. If an object is only referenced by interior pointers, Valgrind reports it as possibly leaked. This change conditionally adds a pointer to the true start of the vector, if Valgrind support is enabled.

	* lib.h (struct vec): vec_true_start, new member.

	* lib.c (vector, vec_set_fill): Maintain vec_true_start.

2009-11-25  Kaz Kylheku

	First stab at Valgrind integration. First goal: eliminate false positives when gc is accessing uninitialized parts of the stack.

	* configure (valgrind): New variable. Defaults to false (do not build Valgrind support). New check for whether the Valgrind API is actually available if --valgrind is selected.
	(HAVE_VALGRIND): Conditionally added to config.h.

	* gc.c: Conditionally include the Valgrind memcheck.h header.
	(mark_mem_region): After pulling out a value from the stack, mark that copy as defined memory using VALGRIND_MAKE_MEM_DEFINED.
	(mark): Removed check for a registered root variable pointer being null; this cannot happen unless someone registers a null pointer, or the stack is trashed. The comment about a possible null was misleading.

2009-11-24  Kaz Kylheku

	Fix uninitialized memory locations.

	* hash.c (make_hash): Uninitialized h->count member.

	* lib.c (mkustring): Preallocated string buffer now has its null terminator initialized, because the caller does not do so (e.g. see lit_har_helper in parser.y).
	The calling module is responsible for initializing all API-accessible parts of the string, but the null terminator belongs to the string implementation.

2009-11-24  Kaz Kylheku

	Switching to keyword symbols for :args and :nothrow.

	* lib.c (args_s, nothrow_s): Renamed to args_k and nothrow_k.
	(flattn_s): Renamed to flatten_s.
	(obj_init): args_k and nothrow_k interned in keyword package.

	* lib.h (args_s, nothrow_s, flattn_s): Declarations updated.

	* match.c (match_files): Follow name changes.

	* tests/004/query1.txr: Changed nothrow to :nothrow.

	* txr.1: Documentation updated.

2009-11-24  Kaz Kylheku

	/Now/ this can be released as 025.

	* utf8.c (utf8_from_uc): Fix bug introduced several commits ago (porting to C++). Caught by regression test suite. Found using git bisect.

2009-11-24  Kaz Kylheku

	Version 025

	External changes:

	Flattening an empty list produces an empty list, not (()), which is a list containing an empty list. Tightened up semantics of bind, merge and other forms. Fixed false positives in binding. More fixes for parser bugs that led to garbage error messages. (Still no regression test cases for error cases, oops.) Fixed crash in regexp function. Symbol packages added. Keyword symbols (symbols in the keyword package) introduced. Clarified semantics that t, nil and keywords evaluate to themselves. Fixed bugs in the system for building in a separate directory. Configuration script now tests the compiler for sanity, and runs compiler-based tests to detect which integer type to use for casting an obj_t * value to a number, and what specifiers to use for inline functions.

	Internal changes:

	Macros replaced with inline functions. Uses of obj_t * replaced with the val typedef everywhere. Exceptions occurring during early initialization no longer lead to an infinite recursion due to streams not working. The long type is no longer used; a configured typedef is used instead. Configure script now spits out a "config.h" header that is widely included. Symbol globals renamed to _s naming scheme.
	Code made portable to C++. A new configure flag --ccname makes it easier to switch compilers.

	* txr.c (version): Bumped to 025.

	* txr.1: Bumped version to 025.

	* configure: Bumped txr_ver to 025.

2009-11-24  Kaz Kylheku

	Auto-detect what specifiers to use for inline functions. Allow compiler command to be set independently of full path for easier compiler switching.

	* Makefile (conftest.o): Target removed. What this rule does is already an implicit rule, and nowhere else in the Makefile are there rules for .c -> .o.
	(conftest2): New target, for two-translation-unit config test program.
	(INLINE_FLAGS): Removed.

	* configure (ccname, inline): New variables.
	(inline_flags): Variable removed. INLINE_FLAGS not generated any more in config.make. Added test for what inline specifiers to use, which is turned into #define INLINE ... in the config.h header.

	* lib.h (tag, is_ptr, is_num, is_chr, is_lit, type, auto_str, static_str, litptr): Changed from inline to INLINE.

2009-11-24  Kaz Kylheku

	Changes to make the code portable to C++ compilers, so that they can be used to obtain better diagnostics.

	* gc.c (more, mark_obj, sweep, unmark): Obey stricter C++ rules with regard to enumerations.
	(make_obj): Avoid using C++ keyword "try".

	* lib.c: Removed duplicate definitions of objects, found by C++.
	(chk_malloc, chk_realloc): Casts needed when converting from void *.
	(list): Discovered and fixed lack of va_end.
	(trim_str, acons_new_l): Avoid use of C++ keyword "new".
	(make_sym): Follow rename of struct member.

	* lib.h (struct sym): Renamed val member to value.
	(null): Added missing declaration.

	* match.c (enum fpip_close, struct fpip): Moved enum out of struct and gave it a name.

	* regex.c (L0_full): Cast added in signed/unsigned comparison.
	(L1_fill_range, L2_fill_range, L3_fill_range, char_set_create): Don't mark static blank structures const; otherwise they need initializers in C++.
	(char_set_compl, char_set_destroy, char_set_contains, nfa_compile_set): Avoid using the C++ keyword "compl".

	* regex.h (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): Renamed compl member to comp.

	* utf8.c (utf8_from_uc, utf8_decode): Obey stricter C++ rules with regard to enumerations.

2009-11-24  Kaz Kylheku

	Fixed broken yyerrorf. It was still taking char *, and passing that as an object to vformat, resulting in # output.

	* parser.h (yybadtoken): Declaration updated.

	* parser.l (yybadtoken): Redefined to take a val argument. The tok stays as int; this is closely coupled to yacc, so why bother with num().

	* parser.y (grammar): Fix occurrences of yybadtoken to pass proper literal objects using the lit macro, or nil in the one case when there is no context.

2009-11-24  Kaz Kylheku

	Renaming global variables that denote symbols, such that they have a _s suffix.

	* lib.c (cons_t, str_t, chr_t, num_t, sym_t, pkg_t, fun_t, vec_t, stream_t, hash_t, lcons_t, lstr_t, cobj_t, var, regex, set, cset, wild, oneplus, zeroplus, optional, compound, or, quasi, skip, trailer, block, next, freeform, fail, accept, all, some, none, maybe, cases, collect, until, coll, define, output, single, frst, lst, empty, repeat, rep, flattn, forget, local, mrge, bind, cat, args, try, catch, finally, nothrow, throw, defex, error, type_error, internal_err, numeric_err, range_err, query_error, file_error, process_error): Symbol globals renamed to cons_s, str_s, chr_s, num_s, sym_s, pkg_s, fun_s, vec_s, stream_s, hash_s, lcons_s, lstr_s, cobj_s, var_s, regex_s, set_s, cset_s, wild_s, oneplus_s, zeroplus_s, optional_s, compound_s, or_s, quasi_s, skip_s, trailer_s, block_s, next_s, freeform_s, fail_s, accept_s, all_s, some_s, none_s, maybe_s, cases_s, collect_s, until_s, coll_s, define_s, output_s, single_s, first_s, last_s, empty_s, repeat_s, rep_s, flattn_s, forget_s, local_s, merge_s, bind_s, cat_s, args_s, try_s, catch_s, finally_s, nothrow_s, throw_s, defex_s, error_s, type_error_s,
	internal_error_s, numeric_error_s, range_error_s, query_error_s, file_error_s, process_error_s.
	(code2type, typeof, make_package, intern, obj_init): Symbol references follow rename.

	* lib.h (cons_t, str_t, chr_t, num_t, sym_t, pkg_t, fun_t, vec_t, stream_t, hash_t, lcons_t, lstr_t, cobj_t, var, regex, set, cset, wild, oneplus, zeroplus, optional, compound, or, quasi, skip, trailer, block, next, freeform, fail, accept, all, some, none, maybe, cases, collect, until, coll, define, output, single, frst, lst, empty, repeat, rep, flattn, forget, local, mrge, bind, cat, args, try, catch, finally, nothrow, throw, defex, error, type_error, internal_err, numeric_err, range_err, query_error, file_error, process_error): Declarations updated.

	* hash.c (make_hash): Symbol references follow rename.

	* match.c (sem_error, file_err, dump_var, match_line, subst_vars, eval_form, complex_stream, extract_vars, do_output_line, do_output, match_files): Likewise.

	* parser.y (grammar, repeat_rep_helper, define_transform): Likewise.

	* regex.c (nfa_compile_set, nfa_compile_regex, regex_compile, regexp, regex_nfa): Likewise.

	* stream.c (stdio_maybe_read_error, stdio_maybe_write_error, stdio_close, pipe_close, make_stdio_stream, make_pipe_stream, make_string_input_stream, make_string_byte_input_stream, make_string_output_stream, get_string_from_stream, make_dir_stream, close_stream, get_line, get_char, get_byte, vformat, format, put_string, put_char): Likewise.

	* txr.c (txr_main): Likewise.

	* unwind.c (uw_throw, uw_errorf, type_mismatch, uw_register_subtype, uw_init): Likewise.

	* unwind.h (internal_error, numeric_assert, range_bug_unless): Likewise.

2009-11-23  Kaz Kylheku

	* configure (platform_flags, remove_flags): New config variables.

	* Makefile (CFLAGS): Take into account new flags.

2009-11-23  Kaz Kylheku

	Follow up on 64-bit compilation warnings.

	* lib.c (chr, chrp): Do not convert directly between wchar_t and the pointer type; go through a cnum intermediate value.

	* stream.c (vformat): Fix bad cast from pointer to int; this was missed in the conversion to cnum because it should have been a cast to long originally.

2009-11-23  Kaz Kylheku

	* Makefile (conftest.o): Revert change that took CFLAGS away from this target.

2009-11-23  Kaz Kylheku

	* configure: Don't rely on higher-precision arithmetic from the build machine's shell. POSIX only requires shell arithmetic to use signed long. We can't compute the INT_PTR_MAX constant in the shell; instead, we generate a constant C expression to compute it.

2009-11-23  Kaz Kylheku

	Reporting of compile errors during configuration for easier configure debugging.

	* Makefile (conftest): Pass all of the CFLAGS when building conftest. This way bad compiler options are caught right in the basic compiler sanity test.

	* configure: Compiler jobs are redirected to the temporary error file conftest.err, which is dumped if there is a failure. Parting text is improved: the user should not blindly trust the success of the configuration but check its sanity.

2009-11-23  Kaz Kylheku

	* configure: Bugfix in parsing configuration variables which contain the = character.

	* Makefile (conftest.o): Pass full CFLAGS to configuration test builds. If some flags don't work with the compiler, this should be caught.

2009-11-23  Kaz Kylheku

	* Makefile (CFLAGS): Added -I. so the current directory is first in the include search path. This is needed for finding generated header files when building in a separate directory.

2009-11-23  Kaz Kylheku

	* lib.c (chk_malloc, chk_realloc): Fix diagnosable conversion, caught by gcc 4.1.1.

2009-11-23  Kaz Kylheku

	* configure (cross): Print out value of $cross in --help.

	* depend.txr: Add "config.h" to list of headers that are not prefixed with $(top_srcdir).

	* dep.mk: Regenerated.

2009-11-23  Kaz Kylheku

	Improving portability. It is no longer assumed that pointers can be converted to type long and vice versa. The configure script tries to detect the appropriate type to use.
	Also, some run-time checking is performed in the streams module to detect which conversion specifier strings to use for printing numbers.

	* Makefile (conftest, conftest.o, conftest.syms): New targets. Used by the configure script.

	* configure (intptr, nm): New configuration variables. Generating config.make is no longer the last step; compiler tests are performed after config.make is set up, so that rules in the Makefile can be used for doing the compiling. (This is the cleanest way to do it, since the paths to the tools may contain Make variable expansion syntax.) New steps are added to try to detect whether the compiler has a wider integer type than the C89 long, and which of the available types (including, potentially, the extra-wide type) is suitable for holding a pointer. Results are generated into a header, config.h.

	* dep.mk: Regenerated.

	* lib.h (NUM_MAX, NUM_MIN): Now derived from the INT_PTR_MAX and INT_PTR_MIN macros, which come from config.h.
	(cnum): New typedef name.
	(cobj_ops, tag, auto_str, static_str, litptr, lit_noex): Changed long to cnum.
	(num, c_num): Declarations updated.

	* lib.c (equal, length, num, c_num, plus, minus, neg, search_str, cat_str, vector, vec_set_fill, obj_print, obj_pprint): Changed long to cnum.

	* gc.c (mark_obj): Changed long to cnum.

	* hash.c (struct hash, ll_hash, hash_mark, hash_grow, hash_process_weak): Changed long to cnum.

	* match.c (complex_open, do_output_line, do_output, match_files): Changed long to cnum.

	* parser.h (lineno): Declaration updated.

	* parser.l (lineno): Redefined as cnum.
	(grammar): Changed long to cnum.

	* parser.y (%union/yystype): num member changed to cnum. Inclusion of config.h added.

	* regex.c (nfa_run, nfa_machine_match_span, search_regex): Changed long to cnum.

	* regex.h (struct nfa_machine): Members last_accept_pos and count changed to cnum.
	(nfa_run, nfa_machine_match_span): Declarations updated.

	* stream.c (struct fmt): New type.
	(fmt_tab): New static array.
	(num_fmt): New static pointer.
	(detect_format_string): New function.
	(vformat): Changed long to cnum. Formatting of numbers uses num_fmt.
	(stream_init): Call detect_format_string.

	* txr.c, unwind.c, utf8.c: Include config.h.

	* unwind.h (internal_error): Local declaration of num updated.

2009-11-21  Kaz Kylheku

	Introducing symbol packages. Internal symbols are now in a system package instead of being hacked with the $ prefix. Keyword symbols are provided. In the matcher, evaluation is tightened up. Keywords, nil and t are not bindable, and errors are thrown if attempts are made to bind them. Destructuring in dest_bind is strict in the number of items. String streams are used to print bindings whose values are not strings or characters. Numerous bugfixes.

	* lib.h (enum type, type_t): New member: PKG.
	(struct sym): New member: package.
	(struct package): New type.
	(union obj, obj_t): New member pk.
	(interned_syms): Declaration removed.
	(keyword_package, pkg_t): Declared.
	(intern, acons_new_l): Declarations updated.
	(find_package, symbol_package, keywordp): Declared.

	* lib.c (interned_syms): Definition removed.
	(packages, pkg_t, system_package, keyword_package, user_package): New global variables.
	(code2type, equal, obj_pprint): Handle PKG case.
	(symbol_package, make_package, find_package, keywordp): New functions.
	(make_sym): Initialize package field of symbol.
	(intern): Takes package argument. Rewritten using packages, which use hash tables to store symbols.
	(acons_new_l): Takes extra pointer argument to return an extra value.
	(obj_init): Updated to handle packages. The order of some initializations has to change. The way nil is added as a symbol is quite different, and a special hack for the symbol t is used. Most symbols go into the user_package, but symbols that were previously namespaced with $ go to the system package.
	(obj_print): SYM case now considers the package of a symbol. Symbols in the user package are printed as before.
	Symbols with no package are printed using #: notation; keywords with : notation; and all others with their package prefix. PKG case is handled.

	* gc.c (finalize): Handle PKG case.
	(mark_obj): For SYM, mark the new package member. Handle PKG case.

	* hash.h (gethash_l): Declaration updated.

	* hash.c (ll_hash): Handle PKG case.
	(gethash_l): Extra argument added to distinguish a new addition from finding an existing entry.

	* match.c (dump_var): Dumps any object now, by printing it to a string with a string stream.
	(bindable): New function.
	(dest_bind): Tightened up to distinguish bindable symbols from non-bindable ones. Symbols that stand for themselves, including nil, can only match themselves. Destructuring matches have to match in the number of elements: dot notation can be used to match superfluous elements.
	(eval_form): Tightened up to recognize bindable symbols.
	(match_files): Various directives honor non-bindable symbols (cat, merge, flatten).

	* parser.l (yybadtoken): Handle KEYWORD case.
	(grammar): TOK can start with :. Returned as KEYWORD terminal, with a lexeme that no longer has the : character.

	* parser.y (KEYWORD): New nonterminal.
	(grammar): Calls to intern given extra parameter. In the expr rule, KEYWORD turned into symbol in keyword package.

	* regex.c (regexp): Bugfix: dereferencing non-pointer.

	* stream.c (vformat): Bugfixes in state machine: handling of prefix digits; printing of numbers in ~s.

	* txr.c (txr_main): Intern calls updated.

	* txr.1: Updated with information about nil, t and keywords. More details about destructuring matching in bind.

2009-11-20  Kaz Kylheku

	* unwind.c (uw_throw): If streams are not initialized, we have an unhandled exception too early in initialization. Use a C stdio stream to print an error message and abort. Using the nil stream variable would just cause a recursion bomb.

2009-11-20  Kaz Kylheku

	* lib.c (intern): Symbols are now interned in a hash table.
	(obj_init): interned_syms must be created as a hash table.
	Rearranged the order of some initializations so the vector code called by hash works.

2009-11-20  Kaz Kylheku

	* lib.c (dest_bind): Fix breakage from two commits ago; was falling through to unsuccessful return in the consp case.

2009-11-20  Kaz Kylheku

	* parser.y (grammar): Fix error actions that do not assign a value to $$.

2009-11-20  Kaz Kylheku

	* match.c (dest_bind): Extended to handle more general forms by using eval_form rather than direct symbol binding lookups. False positive return fixed.
	(match_line): Fixed merge to use eval_form rather than direct symbol binding.

2009-11-20  Kaz Kylheku

	* lib.c (flatten): Semantics change. The flatten function should not map nil -> (nil), but nil -> nil.

2009-11-20  Kaz Kylheku

	Changing ``obj_t *'' occurrences to a ``val'' typedef. (Ideally, we wouldn't have to declare object variables at all, so why use an obtuse syntax to do so?)

	* lib.h (val): New typedef name. Used throughout.

	* gc.c, gc.h, hash.c, hash.h, lib.c, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.c, unwind.c, unwind.h: Replace obj_t * with val almost everywhere. Low-level gc functions that work with arrays of obj_t use obj_t *; pointer arithmetic on a val wouldn't make sense. In macros we use obj_t *, to reduce the chances of clashing with some local variable called val.

2009-11-19  Kaz Kylheku

	* txr.1: Fixed mangled formatting of exception handling example.

2009-11-19  Kaz Kylheku

	Get rid of macros in favor of safer inline functions. The recent auto_str("byte str") error could have been caught at compile time.

	* Makefile (CFLAGS): Include expansion of INLINE_FLAGS.

	* configure (inline_flags): New variable.
	(INLINE_FLAGS): New variable generated in config.make.

	* lib.h (tag, is_ptr, is_num, is_chr, is_lit, type, auto_str, static_str, litptr): Function-like macros converted to functions.

2009-11-19  Kaz Kylheku

	Version 024

	Fixed show-stopper breakage in parse error diagnostic function.
	Fixed bug introduced back in 015: collects that don't yield any variable bindings were wrongly treated as failed.

	* txr.c (version): Bumped to 024.

	* txr.1: Bumped version to 024.

2009-11-19  Kaz Kylheku

	Use unsigned char * as allocator return value.

	* lib.c (chk_malloc, chk_realloc): Return unsigned char *.

	* lib.h (chk_malloc, chk_realloc): Declarations updated.

	* utf8.c (utf8_dup_to_uc): Remove cast to unsigned char *.

2009-11-18  Kaz Kylheku

	Following up on diagnostics obtained by running the code through a C++ compiler. Idea: allocator functions return char * instead of void *, like malloc did in classic pre-ANSI C. That way we are forced to use a cast except when the target pointer is char * already.

	* lib.c (progname): Duplicate definition of global removed.
	(equal): Some default: cases added to switch statements.
	(chk_malloc): Returns char *.
	(chk_realloc): Returns char *, but takes void * on the way in. That way we get C++-like behavior.
	(chk_strdup): Oops, this returned void * instead of wchar_t *. C++ catches boo boo.
	(stringp): Added default: case to switch.
	(vec_set_fill): Cast return value of chk_realloc.

	* lib.h (chk_malloc, chk_realloc, chk_strdup): Declarations updated.

	* parser.h (lineno): extern qualifier added to prevent duplicate definitions.

	* regex.c (nfa_free, nfa_run, nfa_machine_init, regex_compile): Cast return value of chk_malloc.

	* stream.c (snarf_line, get_string_from_stream): Cast return value of chk_realloc.

2009-11-18  Kaz Kylheku

	* match.c (match_line, match_files): Fix broken behavior of a collect that doesn't match anything. It is not a failed match, as the documentation makes perfectly clear. Collect/coll were introduced in txr-006 and had the proper non-failing semantics. However, in txr-015, during code restructuring, a bug crept in: when changing to a different debugging function, for some reason I added the nil returns.

2009-11-18  Kaz Kylheku

	* parser.l (yyerror): Total breakage: can't take auto_str of a char * string.
	(yyerrorf): Total breakage: arguments of wrong types. Detected by vformat as garbage.

2009-11-18  Kaz Kylheku

	* txr.1: Clarified handling of UTF-8, now that it's precise and portable.

2009-11-18  Kaz Kylheku

	Version 023

	Minor bugfix. Code cleanup. Portability. Completely removed dependency on C99 wide character stream functions, and on character encoding support from glibc. All UTF-8 encoding and decoding is done by the program itself. Removed the use of all GNU extensions and C99 syntax. txr now requires a C90 compiler, and POSIX 1003.1 and 1003.2.

	* txr.c (version): Bumped to 023.

	* txr.1: Bumped version to 023.

2009-11-17  Kaz Kylheku

	More removal of C99 wide character I/O, and tightening up of standard conformance.

	* configure (lang_flags): Specify -D_POSIX_C_SOURCE=2 to obtain POSIX 1003.1 and POSIX 1003.2 functions from the headers, without GNU extensions. Specify -std=c89 to get C89 conformance from gcc.

	* match.c (dump_byte_string): New function.
	(dump_shell_string): Retargeted to object streams.
	(dump_var, dump_bindings): Retargeted to object streams. Changed back to using a byte string for the array index prefixes, to avoid using the wide-character swprintf.

	* parser.l (grammar): Eliminate wcsdup uses in favor of chk_strdup. Not only is wcsdup a GNU extension, it doesn't have the OOM check.

	* stream.c: Added the header that defines WIFEXITED and others.

	* txr.c: Added one #include; removed others.
	(main): Removed setlocale call. Not needed any more, since wide stream and string I/O is no longer used from the C library.

2009-11-17  Kaz Kylheku

	Removing use of C99 wide character I/O.

	* stream.c (BROKEN_POPEN_GETWC): Macro removed. Workaround no longer needed since the program does not call getwc.
	(struct stdio_handle): #ifdef text removed. New member added: utf8 decoder.
	(stdio_maybe_read_error, stdio_maybe_write_error): Treat null handle as an exception rather than a nil return.
No need to check ferror in stdio_maybe_write_error, since there is no need to distinguish an end-of-file situation from error. (stdio_put_char_callback, stdio_get_char_callback): New functions. (stdio_put_string, stdio_put_char): Retargeted to utf8 encoder. Null handle treated as separate kind of error. (snarf_line, stdio_get_line, stdio_get_char): Retargeted to utf8 decoder. (pipe_close): #ifdef text removed. (make_stdio_stream): utf8 decoder initialized. (make_pipe_stream): utf8 decoder initialized. #ifdef text removed. 2009-11-17 Kaz Kylheku Warning fixes. * hash.c (hash_ops): Add missing initializer. * match.c (complex_open): Add missing initializer to ret variable. * regex.c (regex_obj_ops): Add missing initializer. * stream.c (stdio_ops, pipe_ops, string_in_ops, byte_in_ops, string_out_ops, dir_ops): Likewise. 2009-11-17 Kaz Kylheku * lib.c (chrp): Fix broken is_chr(num) call. 2009-11-17 Kaz Kylheku * regex.c (nfa_all_states, nfa_closure): visited parameter should be unsigned. 2009-11-17 Kaz Kylheku Fixes for compliance with C89. * lib.c (init): Do not define variable after statements. * match.c (match_files): Likewise. * regex.h (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): Do not use enum bitfields, which are a GCC extension. * unwind.h (enum uw_frtype, uw_frtype_t): Combine into one declaration, eliminating the forward enum reference, which is a GCC extension. (uw_block_begin): Add dummy typedef to macro so that it requires a following semicolon. Without this, if the macro use is followed by a semicolon, that semicolon looks like a null statement. A subsequent declaration thus follows a statement and is not conforming to C89. Also added an opening do. (uw_block_end): Add while(0) to match do in uw_block_begin. (uw_env_begin, uw_env_end): Add do/while(0) to macro pair, so uw_env_end requires a semicolon. (uw_catch_begin, uw_catch_end): Likewise.
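The uw_block_begin/uw_block_end fix above can be illustrated with a hypothetical macro pair; BLOCK_BEGIN, BLOCK_END and doubled_plus_one are invented for the example and are not txr's unwind.h. Ending the opening macro with a dummy typedef makes the caller's semicolon finish a declaration. Without it, BLOCK_BEGIN(acc, x); would expand to do { int acc = (x);; and that extra semicolon would be a null statement, so the declaration of twice would follow a statement, which C89 rejects.

```c
#include <assert.h>

/* Hypothetical macro pair modelled on the uw_block_begin/uw_block_end fix.
   The dummy typedef absorbs the user's semicolon as part of a declaration. */
#define BLOCK_BEGIN(var, init) do { int var = (init); typedef int block_dummy_t
#define BLOCK_END } while (0)

int doubled_plus_one(int x)
{
  int result = 0;

  BLOCK_BEGIN(acc, x);
  int twice = acc * 2;   /* declaration right after the macro: still valid C89 */
  result = twice + 1;
  BLOCK_END;             /* expands to "} while (0);" so the semicolon is required */

  return result;
}
```

The closing macro expands to } while (0), so BLOCK_END; needs its semicolon just like an ordinary statement.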
2009-11-17 Kaz Kylheku Version 022 Fix for bug 28033: crash in string output stream (used by exception handling). New kernel object type introduced which allows C string literals to be used as first-class objects. Use of printf-like C formatting eliminated from the code base. The dependency on C99 wide character I/O is now minimized. * txr.c (version): Bumped to 022. * txr.1: Bumped version to 022. 2009-11-16 Kaz Kylheku * Makefile (rebuild): New target. Tired of doing make clean; make. 2009-11-16 Kaz Kylheku Big round of changes to switch the code base to use the stream abstraction instead of directly using C standard I/O, to eliminate most uses of C formatted I/O, and to fix numerous bugs, such as variadic argument lists which lack a terminating ``nao'' sentinel. Bug 28033 is addressed by this patch, since streams no longer provide printf-compatible formatting. The native formatter is extended with some additional capabilities to take over. The work on literal objects is expanded and they are now used throughout the code base. Fixed bad realloc in string output stream: reallocating by number of wide chars rather than bytes. * gc.c (sweep): Debugging code switched from fprintf to format. * lib.c (type_check, type_check2, car, cdr, car_l, cdr_l, list, num, chrp, apply, cobj_print_op, dump): Retargeted, with help of new literals, to new functions that take string objects, rather than raw C strings. (obj_print, obj_pprint): Revamped with support for LIT type. Retargeted to not use C style I/O functions in streams. * lib.h (lit): Macro retargeted to another macro so that it expands its argument. (lit_noex): New macro, like lit, but does not macro-expand argument. (auto_str): New macro. (static_str): New macro. * match.c (debugf, debuglf, sem_error, file_err): Converted from C string to string object. (dest_bind, match_line, LOG_MISMATCH, LOG_MATCH, match_files): Retargeted to new interfaces that take string objects rather than raw C strings.
(complex_stream): New function. (do_output_line, do_output, extract): Retargeted from C streams to object streams. * parser.h (yyerrorf): Declaration updated. * parser.l (yyerror): Call new yyerrorf interface, using auto_str macro to dress up C string as a temporary object. (yyerrorf): Changed from C strings to object strings. (yybadtoken, grammar): Retargeted to new yyerrorf. * stream.c (strm_ops): put_string and put_char function pointers changed to take object strings rather than C strings. vcformat and vformat virtuals removed. C formatting is no longer supported, and vformat is handled above the stream switch level in one place for all streams. (common_vformat, stdio_vcformat, string_out_vcformat, cformat, put_cstring, put_cchar): Functions removed. (stdio_stream_print, stdio_stream_destroy, stdio_maybe_write_error, stdio_put_string, stdio_put_char, stdio_close, pipe_close, string_out_put_char, make_pipe_stream, make_string_input_stream, make_string_output_stream, make_dir_stream, close_stream, get_line, put_line, get_char, put_char, put_string): Retargeted to new string object interfaces. (stdio_ops, pipe_ops): stdio_vcformat and common_vcformat initializers removed. (string_out_ops): string_out_vcformat and common_vcformat initializers removed. (string_in_ops, byte_in_ops, dir_ops): Two null initializers removed. (string_out_put_string): Converted to object string interface. Unnecessary chk_realloc call suppressed. (get_string_from_stream): Fixed bad call to realloc with incorrect size. (vformat_num, vformat_str): New functions, helpers for vformat. (vformat): Rewritten. It is now the formatting engine. (format, put_string, put_char): Interface converted from C string to object string. * stream.h (vformat, format): Declarations updated. (vcformat, cformat, put_cstring, put_cchar): Declarations removed. * txr.c (oom_realloc_handler, help, txr_main): Retargeted to object streams and strings.
* unwind.c (uw_throw, type_mismatch, uw_register_subtype): Retargeted to new string object interfaces. (uw_throwf, uw_errorf): Interface changed from C string to object string. (uw_throwcf, uw_errorcf): Functions removed. * unwind.h (uw_throwf, uw_errorf, type_mismatch): Declarations updated. (uw_throwcf, uw_errorcf): Declarations removed. (internal_error): Macro interface changed and retargeted to object strings. Also, num hygiene problem worked around with local extern declaration. (numeric_assert, range_bug_unless): Retargeted to object strings. * utf8.c (utf8_to, utf8_dup_from_uc, utf8_dup_from, utf8_dup_to_uc): Casts of chk_malloc return value added. 2009-11-15 Kaz Kylheku Use the tag bit pattern 11 to denote a new type: LIT. This is a pointer to a C static string, intended for literals. We can now treat literal strings as lightweight objects. * lib.h (TAG_MASK): Ensure the constant expr has long type. (TAG_LIT): New macro. (enum type, type_t): New enum member, LIT. * gc.c (finalize, mark_obj): Handle LIT type. * hash.c (ll_hash): Likewise. * lib.c (code2type, equal, stringp, length_str, c_str, obj_print): Likewise. (obj_init): Intern symbols using literal strings. (type): Parentheses added to macro expansion. (is_lit, lit, litptr): New macros. 2009-11-15 Kaz Kylheku * lib.c (chr): Take wchar_t argument, not int. Dropped range check. (c_chr): Return wchar_t, not int. * lib.h (chr, c_chr): Declarations updated. 2009-11-15 Kaz Kylheku Version 021. Text is represented using wide characters now. Queries and data are parsed as UTF-8, so extended characters can be used directly. Numeric character escapes can go up to \x10FFFF. (More limited on platforms where wchar_t is 16 bits.) Regular expressions support extended characters, directly or through escapes. Regex character set matches can use the full Unicode range. New test case 005 exercises some of these features over Japanese text. Failed exit statuses of pipes and file close errors are exceptions now.
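The unboxed representation described in these entries can be sketched as below. Only two facts come from the ChangeLog: the low two bits of an (obj_t *) value are a tag, and the bit pattern 11 marks a LIT. The remaining tag values, the helper names, and the use of C99 uintptr_t/intptr_t (where txr at this point assumed long) are assumptions for illustration. Like the original, the sketch relies on implementation-defined pointer/integer conversions and on >> being an arithmetic shift for negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-bit tag layout; only TAG_LIT == 3 comes from the ChangeLog. */
#define TAG_SHIFT 2
#define TAG_MASK  ((uintptr_t) 3)
#define TAG_PTR   0   /* ordinary heap pointer: aligned, so low bits are 0 */
#define TAG_NUM   1
#define TAG_CHR   2
#define TAG_LIT   3

typedef union obj obj_t;  /* incomplete type: tagged values are never dereferenced */

unsigned tag(obj_t *o)
{
  return (unsigned) ((uintptr_t) o & TAG_MASK);
}

obj_t *num(long n)
{
  /* Shift the value up and put the tag in the vacated low bits. */
  return (obj_t *) (((uintptr_t) n << TAG_SHIFT) | TAG_NUM);
}

long c_num(obj_t *o)
{
  assert(tag(o) == TAG_NUM);
  /* Assumes two's complement and an arithmetic >> on negative intptr_t. */
  return (long) ((intptr_t) (uintptr_t) o >> TAG_SHIFT);
}

obj_t *chr(wchar_t ch)
{
  return (obj_t *) (((uintptr_t) ch << TAG_SHIFT) | TAG_CHR);
}

wchar_t c_chr(obj_t *o)
{
  assert(tag(o) == TAG_CHR);
  return (wchar_t) ((uintptr_t) o >> TAG_SHIFT);
}
```

Losing two low bits to the tag is what shrinks the numeric range, which matches the entry noting that one more bit of number range was given up when characters became unboxed.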
Bug fixed in regex character classes. Fixed off-by-one error in lazy string implementation, which broke some uses of the @(freeform) directive. Fixed all instances of gc bug 28086: objects being prematurely reclaimed. This showed up when compiling for profiling (gcc -pg). The --cc argument of the configure script works properly now. Numbers and characters are unboxed types now, encoded directly in the (obj_t *) value. The lowest two bits of (obj_t *) are a tag distinguishing characters, integers and pointers. The program performs better because it does not have to cons memory when operating on numbers and characters. Discovered bug in glibc: the getwc function segfaults when applied to a stream returned by popen. Worked around this bug. The bug is filed as 10958 in glibc Bugzilla. Internals: Hash tables implemented. Hash tables support weak keys and values. * configure, hash.c, lib.c, stream.c, utf8.c: Removed trailing whitespace from some lines. * txr.c (version): Bumped to 021. Removed trailing whitespace. * txr.1: Bumped version to 021. 2009-11-14 Kaz Kylheku Provide both char * and unsigned char * interfaces in UTF-8 module. Fix unsigned and plain char * mixing. * utf8.c (utf8_from_uc, utf8_to_uc, utf8_dup_from_uc, utf8_dup_to_uc): New functions. (utf8_from): Fix type of backtrack pointer to unsigned char *. * utf8.h (utf8_from_uc, utf8_to_uc, utf8_dup_from_uc, utf8_dup_to_uc): Declared. * lib.c (string_utf8): Changed to take char * argument. * lib.h (string_utf8): Declaration updated. 2009-11-14 Kaz Kylheku * Makefile (depend): Marked phony and $(PROG) prerequisite dropped. (clean, distclean, tests, install): Phony targets marked phony. * dep.mk: Regenerated. 2009-11-14 Kaz Kylheku * configure (cc): Compute variable properly. 2009-11-14 Kaz Kylheku Fixes for bug 28086. When constructing a cobj whose associated C structure contains obj_t * references, we should initialize that C structure after allocating the cobj.
If we initialize the structure first, it may end up having the /only/ references to the objects. In that case, the objects are invisible to the garbage collector. The subsequent allocation of the cobj itself may then invoke gc, which will turn these objects into dust. The result is a cobj which contains a handle structure that contains references to free objects. The fix is to allocate the handle structure, then the cobj which is associated with that handle, and then initialize the handle, at which point it is okay if the handle has the only references to some objects. Care must be taken not to let a cobj escape with a partially initialized handle structure, and not to trigger gc between allocating the cobj and initializing the fields. * hash.c (make_hash): Fix cobj construction order. * stream.c (make_stdio_stream): Fix cobj construction order. (make_pipe_stream): Fix cobj construction order. Also noticed and fixed a bug: h->descr field not being initialized in the currently enabled BROKEN_POPEN_GETWC variant of the code. 2009-11-13 Kaz Kylheku New test case which does some UTF-8 scanning, Unicode regexes, and @(freeform). * tests/005/data: New UTF-8 file. * tests/005/query-1.txr: Likewise. * tests/005/query-1.expected: Likewise. * Makefile (TXR_ARGS): New target-specific assignment to set data for test case set 005. 2009-11-13 Kaz Kylheku * lib.c (symbolp): Bugfix: function crashed on NUM argument. (lazy_str): Fix for gc correctness: object from make_obj must be completely initialized before any gc-triggering operation is invoked, otherwise the garbage collector will be traversing an object whose fields contain old garbage. (lazy_str_force_upto): Off-by-one error. Forcing the object up to index position N means forcing it up to length N+1. This bug can make it look like a lazy string is much shorter than it really is. 2009-11-13 Kaz Kylheku Allow -c scripts to not have a trailing newline. Test suite exercises -c now.
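The lazy_str_force_upto off-by-one can be illustrated with a simplified model. The struct, the fixed 256-byte buffer, and the function names are hypothetical; txr's lazy strings are garbage-collected objects built from a list of lines. The point is the loop condition: making index N available requires a length of at least N+1, so forcing must continue while the length is less than or equal to N.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a lazy string: source lines are appended, each
   followed by a terminator, only when a caller needs more text. */
struct lazy_string {
  const char **lines;   /* remaining source lines, NULL-terminated */
  const char *term;     /* terminator appended after each line */
  char buf[256];
  size_t len;
};

static void force_one_line(struct lazy_string *ls)
{
  size_t n = strlen(*ls->lines), t = strlen(ls->term);

  assert(ls->len + n + t < sizeof ls->buf);
  memcpy(ls->buf + ls->len, *ls->lines, n);
  memcpy(ls->buf + ls->len + n, ls->term, t);
  ls->len += n + t;
  ls->buf[ls->len] = 0;
  ls->lines++;
}

/* Make index idx available: that needs length idx + 1, so loop while
   len <= idx.  Testing len < idx instead is the off-by-one. */
int lazy_force_upto(struct lazy_string *ls, size_t idx)
{
  while (ls->len <= idx && *ls->lines != 0)
    force_one_line(ls);
  return ls->len > idx;
}
```

With a test of len < idx instead, a request for index 3 on the forced text "ab\n" would stop early and report the position as unavailable, so the string would look shorter than it really is.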
txr.c (txr_main): If the script specified with -c is not terminated by a newline, just add a newline. On the shell command line, it's a nuisance to have to add the extra line before closing the quote. It's also awkward in scripting, because the shell (or at least Bash 3.0) does not produce a final terminating newline in command substitution syntax like -c "$(cat file)". The last newline in the file is trimmed, and has to be explicitly added in the script itself, which is wrong in the case when the file is empty. Makefile (TXR_SCRIPT_ON_CMDLINE): New target-specific variable, arbitrarily set for test 002. (%.ok: %.txr): Rule updated to honor TXR_SCRIPT_ON_CMDLINE variable, passing the script body to txr using -c rather than as a file argument. txr.1: Document -c change. 2009-11-13 Kaz Kylheku Previous commit broke UTF-8 lexing by changing the get_char semantics on the input stream to wide character input. Also, reading a query from the command line (-c) must read bytes from a UTF-8 encoding of the string. We introduce a new get_byte function which can extract bytes from streams which provide it. * parser.l (YY_INPUT): Call get_byte instead of get_char. * stream.c (struct strm_ops): New function pointer, get_byte. (stdio_get_byte): New function. (stdio_ops, pipe_ops): Add new function. (string_in_ops, string_out_ops, dir_ops): Null pointer added. (struct byte_input): New struct type. (byte_in_get_byte): New function. (byte_in_ops): New structure. (make_string_byte_input_stream, get_byte): New functions. * stream.h (make_string_byte_input_stream, get_byte): Declared. * txr.c (txr_main): Make a byte input stream from the command line spec, rather than a string input stream. 2009-11-12 Kaz Kylheku Continuing wchar_t conversion. Making sure all stdio calls use wide character functions so that there is no illicit mixing. (But the goal is to replace this usage with txr streams.)
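The -c newline rule can be sketched as a small helper. The name terminate_script is hypothetical, and the choice to leave an empty script empty is an assumption of this sketch, since the entry above does not say how txr treats an empty -c argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of a -c script body with a
   terminating newline added when the text lacks one.  An empty script is
   left empty (an assumption, not stated by the ChangeLog). */
char *terminate_script(const char *text)
{
  size_t len = strlen(text);
  int add = (len > 0 && text[len - 1] != '\n');
  char *copy = (char *) malloc(len + add + 1);

  if (copy == 0)
    return 0;
  memcpy(copy, text, len);
  if (add)
    copy[len++] = '\n';
  copy[len] = 0;
  return copy;
}
```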
* lib.c (list, cobj_print_op, obj_print, obj_pprint): Use wide string literals and I/O functions. * match.c (debuglcf): Converted to wchar_t. (dump_bindings, match_line, match_lines, extract): Use wide string literals and I/O functions. * parser.h (yyerrorf): Declaration updated. * parser.l (yyerror): Use wide-string yyerrorf. Users of yyerror continue to pass multibyte strings. (yyerrorf): Converted to wchar_t. (yybadtoken, grammar): Use wide string literals to call yyerrorf. * stream.c (struct strm_ops): vcformat changed to wchar_t. (stdio_vcformat, string_out_vcformat, vcformat, cformat): Likewise. * stream.h (vformat, vcformat, cformat): Declarations updated. * txr.c (oom_realloc_handler, help, hint, txr_main): Use wide string literals and I/O functions. * unwind.c (uw_throwcf, uw_errorcf): Converted to wchar_t. * unwind.h (uw_throwcf, uw_errorcf): Declarations updated. (internal_error, numeric_assert, range_bug_unless): Macros fixed to use wide string literals. 2009-11-12 Kaz Kylheku * utf8.c (utf8_from): Fix total breakage. It was writing out incomplete wide characters on internal state transitions while traversing a single multi-byte character. Also, improved handling of bad bytes close to EOF: if EOF occurs in a multi-byte character, it backtracks and skips one bad byte, etc. (utf8_encode, utf8_decoder_init, utf8_decode): New functions. * utf8.h (enum utf8_state): New enum. (struct utf8_decoder, utf8_decoder_t): New struct. (utf8_encode, utf8_decoder_init, utf8_decode): Declared. 2009-11-12 Kaz Kylheku Documenting extended characters in man page. Cleaned up some more issues related to extended characters. * parser.l (grammar): Added error actions for invalid UTF-8 bytes. * stream.c (BROKEN_POPEN_GETWC): New macro. Enables workaround for a glibc bug, whereby getwc blows up when applied to a FILE * stream returned from a popen call. (struct strm_ops): put_char function takes wchar_t. (common_format): Use wchar_t rather than int.
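The incremental decoder introduced here (utf8_decoder_init, utf8_decode) can be approximated by the following sketch. The struct and function names are hypothetical stand-ins, and this version is simpler than the one described: it rejects bad bytes instead of backtracking, and it does not check for overlong encodings or surrogates. What it does show is the fix to utf8_from, namely that a character is emitted only after its last continuation byte, never on an intermediate state transition.

```c
#include <assert.h>

/* Hypothetical byte-at-a-time UTF-8 decoder. */
struct u8_decoder {
  unsigned long cp;   /* code point being assembled */
  int need;           /* continuation bytes still expected */
};

void u8_decoder_init(struct u8_decoder *d)
{
  d->cp = 0;
  d->need = 0;
}

/* Returns 1 and stores *out when a code point completes, 0 when more bytes
   are needed, and -1 on an invalid byte (the decoder state is reset). */
int u8_decode(struct u8_decoder *d, unsigned char b, unsigned long *out)
{
  if (d->need == 0) {
    if (b < 0x80) { *out = b; return 1; }
    if ((b & 0xE0) == 0xC0) { d->cp = b & 0x1F; d->need = 1; return 0; }
    if ((b & 0xF0) == 0xE0) { d->cp = b & 0x0F; d->need = 2; return 0; }
    if ((b & 0xF8) == 0xF0) { d->cp = b & 0x07; d->need = 3; return 0; }
    return -1;                      /* stray continuation or invalid lead byte */
  }
  if ((b & 0xC0) != 0x80) {         /* expected a continuation byte */
    d->need = 0;
    return -1;
  }
  d->cp = (d->cp << 6) | (b & 0x3F);
  if (--d->need == 0) {             /* only now is the character complete */
    *out = d->cp;
    return 1;
  }
  return 0;
}
```

For example, feeding the bytes E6 97 A5 returns 0, 0, and then 1 with U+65E5 stored in *out.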
(stdio_put_string): fputws returns -1, not EOF. (stdio_put_char, put_cchar): Character argument changed to wchar_t. Output is done with putwc instead of putc. (snarf_line, stdio_get_char): Use getwc to read from the stream. (pipe_close, make_pipe_stream): Implement workaround for glibc bug. * stream.h (put_cchar): Declaration updated. * txr.1: Added notes about international characters. 2009-11-12 Kaz Kylheku Regular expression module updated to do Unicode character sets. Most of the changes are in the area of representing sets. Also, a bug was found in the compilation of regex character sets: ranges straddling two adjacent blocks of 32 characters were not being added to the character set. However, ranges falling within a single 32-character block, or spanning three or more such blocks, worked properly. This bug is not tickled by common ranges such as A-Z or 0-9, which land within a single 32-character block. * regex.h (BITCELL_LIT): Macro removed. (CHAR_SET_SIZE): Macro does not depend on UCHAR_MAX any more, but hard-codes a set size of 256. UCHAR_MAX means nothing to us any more since we are using wchar_t. The number 256 is simply an arbitrarily chosen size for representing the small character sets (or the leaves of the radix tree for representing large sets). (chset_type_t): New enum typedef. (cset_L0_t, cset_L1_t, cset_L2_t, cset_L3_t): New array typedefs. (struct char_set): Replaced by union char_set. (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): New struct types. (char_set_clear): Declaration removed. (char_set_create, char_set_destroy): Declared. (char_set_add, char_set_add_range, char_set_contains, nfa_state_single, nfa_state_set, nfa_machine_feed): Declarations updated for wchar_t. (struct nfa_state_single): Member ch changed to wchar_t. * regex.c (char_set_clear): Function removed.
(CHAR_SET_L0, CHAR_SET_L1, CHAR_SET_L2, CHAR_SET_L3, CHAR_SET_L2_L0, CHAR_SET_L2_HI, CHAR_SET_L1_L0, CHAR_SET_L1_HI, CHAR_SET_L0_L0, CHAR_SET_L0_HI): New macros. (L0_full, L0_fill_range, L0_contains, L1_full, L1_fill_range, L1_contains, L1_free, L2_full, L2_fill_range, L2_contains, L2_free, L3_fill_range, L3_contains, char_set_create, char_set_destroy): New functions. (char_set_compl): Works using a flag rather than by actually computing a complemented set. Also, it is no longer a toggle (and was never used that way). (char_set_add, char_set_add_range, char_set_contains): Polymorphic over the different set types. (nfa_state_single, nfa_move, nfa_run, nfa_machine_feed): Converted to wchar_t. (nfa_state_free): Use char_set_destroy to free set. (nfa_state_set): Does not construct the set internally but takes it as a parameter. (nfa_compile_set): Rewritten to perform two passes over the s-expression representing the list of characters and ranges making up the set. The first pass determines what representation will be used for the set. The second pass stuffs the characters and ranges into the set. 2009-11-11 Kaz Kylheku * txr.c (main): Call setlocale to set LC_CTYPE to en_US.UTF-8, so that the C library streams do the encoding. Once the program is weaned from C library wide character stream I/O, this can go away. 2009-11-11 Kaz Kylheku Big conversion to wide characters and UTF-8 support. This is incomplete. There are too many dependencies on wide character support from the C stream I/O library. The regex code does not handle wide characters properly. Character type is still int in some places, rather than wchar_t. The test suite passes, though. * hash.c (hash_str): Converted to wchar_t.
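The range-fill bug in the regex entry above can be illustrated on a flat 256-character set stored in 32-bit cells. This is a hypothetical sketch (set_add_range, set_contains, and the layout are assumptions), much simpler than the radix-tree representation described for large sets. A correct fill has to cover three cases: a range within one cell, a range spanning two adjacent cells (the case that was broken), and a range spanning three or more cells.

```c
#include <assert.h>

/* Hypothetical 256-character bit set stored in 32-bit cells. */
#define CELL_BITS 32
#define SET_SIZE  256
#define ALL_ONES  0xFFFFFFFFUL

struct small_set {
  unsigned long cell[SET_SIZE / CELL_BITS];  /* unsigned long has >= 32 bits */
};

static unsigned long bits_from(unsigned bit) { return (ALL_ONES << bit) & ALL_ONES; } /* bit..31 */
static unsigned long bits_upto(unsigned bit) { return ALL_ONES >> (31 - bit); }        /* 0..bit */

void set_add_range(struct small_set *s, unsigned from, unsigned to)
{
  unsigned c0 = from / CELL_BITS, c1 = to / CELL_BITS, c;

  assert(from <= to && to < SET_SIZE);
  if (c0 == c1) {                               /* case 1: within one cell */
    s->cell[c0] |= bits_from(from % CELL_BITS) & bits_upto(to % CELL_BITS);
    return;
  }
  s->cell[c0] |= bits_from(from % CELL_BITS);   /* partial first cell */
  for (c = c0 + 1; c < c1; c++)                 /* full middle cells: none */
    s->cell[c] = ALL_ONES;                      /* when the cells are adjacent */
  s->cell[c1] |= bits_upto(to % CELL_BITS);     /* partial last cell */
}

int set_contains(const struct small_set *s, unsigned ch)
{
  assert(ch < SET_SIZE);
  return (int) ((s->cell[ch / CELL_BITS] >> (ch % CELL_BITS)) & 1);
}
```

A range such as 60-70 exercises the adjacent-cell case: it touches cells 1 and 2 with no full cell in between, so both partial fills must be applied.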
* lib.c (progname, type_check, type_check2, type_check3, car, cdr, car_l, cdr_l, equal, chk_strdup, string_own, string, mkstring, mkustring, init_str, length_str, c_str, search_str, sub_str, cat_str, split_str, trim_str, chrp, apply, lazy_str, lazy_str_get_trailing_list, cobj, obj_init, obj_print, obj_pprint, init): Converted to wchar_t. (vector): Cast of chk_malloc return value added. (string_utf8): New function. * lib.h (struct string): Member str changed to wchar_t *. (progname, chk_strdup, string_own, string, init_str, c_str, init): Declarations updated. (string_utf8): Declared. * match.c (debugf, debuglf, sem_error, file_err, dump_shell_string, dump_var, dump_bindings, dest_bind, match_line, do_output_line, do_output, match_files): Converted to wchar_t. * parser.h (spec_file): Declaration updated. * parser.l (yyerrorf, char_esc, num_esc): Converted to wchar_t. (ASC, ASCN, U, U2, U3, U4, UANY, UNANN, UONLY): New named regexes, used for lexing UTF-8. (grammar): Converted to wchar_t and UTF-8 handling. * parser.y (%union/yystype): lexeme member changed to wchar_t *, chr member changed to wchar_t. * regex.c (nfa_run): Input string is wchar_t *. (search_regex): String from haystack is wchar_t *. * regex.h (nfa_run): Declaration updated. * stream.c (struct strm_ops, common_vformat, stdio_stream_print, stdio_maybe_read_error, stdio_maybe_write_error, stdio_put_string, stdio_put_char, snarf_line, stdio_get_line, stdio_close, pipe_close, struct string_output, string_out_put_string, string_out_put_char, string_out_vcformat, dir_get_line, make_string_output_stream, get_string_from_stream, make_dir_stream, get_line, get_char, vformat, vcformat, format, cformat, put_string, put_cstring, put_char, put_cchar, stream_init): Converted to wchar_t. * stream.h (vformat, format, put_cstring): Declarations updated. * txr.c (version, progname, spec_file, oom_realloc_handler, help, hint, remove_hash_bang_line, main, txr_main): Converted to wchar_t.
* txr.h (version, progname): Declarations updated. * unwind.c (uw_throw, uw_throwf, uw_errorf, type_mismatch, uw_register_subtype): Converted to wchar_t. * unwind.h (uw_throwf, uw_errorf, type_mismatch): Declarations updated. * utf8.c, utf8.h: New files. 2009-11-10 Kaz Kylheku hash.c (hash_grow): Rewritten to avoid resizing the vector in place, and thus having to pull all conses into a big list. TODO: avoid recomputing the hash function over the keys. We could enhance cons cells with two more fields without using additional storage. 2009-11-06 Kaz Kylheku Changing representation of objects to allow for unboxed characters. Now numbers and characters fit into a cell. We lose one more bit from the range of numbers. * lib.h (TAG_SHIFT, TAG_MASK, TAG_NUM, TAG_PTR, NUM_MASK, NUM_MIN, is_ptr, is_num): Macros updated. (is_chr, tag): New macros. (struct chr): Removed. (union obj): Updated. * lib.c (typeof, equal, chr, chrp, c_chr, obj_print): Updated. * hash.c (ll_hash): Characters aren't pointers any longer; use abstract accessor. 2009-11-06 Kaz Kylheku Add hash removal. * hash.c (remhash): New function. * hash.h (remhash): Declared. 2009-11-06 Kaz Kylheku Add hash table growth. hash.c (hash_grow): New function. (l_gethash): Renamed to gethash_l. Increment count; if the load factor gets to two, call hash_grow to double the size. hash.h (l_gethash): Declaration changed to gethash_l. 2009-11-06 Kaz Kylheku Changing representation of objects to allow the NUM type to be unboxed. If the lowest bit of the obj_t * pointer is 1, then the remaining bits are a number. A lot of assumptions are made: that the long type can be converted to and from a pointer; two's complement representation; and the behavior of the << and >> operators when the sign bit is involved. * lib.h (TAG_SHIFT, TAG_MASK, TAG_NUM, TAG_PTR, NUM_MASK, NUM_MIN, is_ptr, is_num, type): New macros. (struct num): Removed. (nao): Redefined, so that it doesn't have the numeric tag.
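The growth policy from the hash table entries (double the table when the load factor reaches two) can be sketched with a plain chained table. Everything here is hypothetical: txr's hash.c stores conses in a vector and supports weak keys and values, none of which is modelled. Returning the address of the stored value from table_place is an interface choice made for this sketch. Growth relinks existing nodes into a new, larger chain array instead of rebuilding them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical chained hash table that doubles when entries reach 2x chains. */
struct node { unsigned long key; long val; struct node *next; };
struct table { struct node **chains; size_t nchains, count; };

static size_t slot(unsigned long key, size_t n)
{
  return (size_t) ((key * 2654435761UL) % n);  /* multiplicative scramble */
}

static struct node **alloc_chains(size_t n)
{
  struct node **c = (struct node **) malloc(n * sizeof *c);
  size_t i;

  if (c != 0)
    for (i = 0; i < n; i++)
      c[i] = 0;
  return c;
}

int table_init(struct table *t, size_t n)
{
  t->chains = alloc_chains(n);
  t->nchains = n;
  t->count = 0;
  return t->chains != 0;
}

/* Relink every node into a chain array twice as large.  Nodes are moved,
   not copied, so pointers to their values stay valid. */
static void table_grow(struct table *t)
{
  size_t newn = t->nchains * 2, i;
  struct node **nc = alloc_chains(newn);

  if (nc == 0)
    return;                         /* keep the old, more heavily loaded table */
  for (i = 0; i < t->nchains; i++) {
    struct node *p = t->chains[i], *next;
    for (; p != 0; p = next) {
      size_t s = slot(p->key, newn);
      next = p->next;
      p->next = nc[s];
      nc[s] = p;
    }
  }
  free(t->chains);
  t->chains = nc;
  t->nchains = newn;
}

/* Return the address of the value stored under key, creating an entry
   with value 0 if the key is absent (NULL if out of memory). */
long *table_place(struct table *t, unsigned long key)
{
  size_t s = slot(key, t->nchains);
  struct node *p;

  for (p = t->chains[s]; p != 0; p = p->next)
    if (p->key == key)
      return &p->val;
  if ((p = (struct node *) malloc(sizeof *p)) == 0)
    return 0;
  p->key = key;
  p->val = 0;
  p->next = t->chains[s];
  t->chains[s] = p;
  if (++t->count >= 2 * t->nchains)   /* load factor has reached two */
    table_grow(t);
  return &p->val;
}
```

Starting from 4 chains, inserting 100 distinct keys doubles the table at 8, 16, 32 and 64 entries, leaving 64 chains.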
* lib.c (typeof, type_check2, type_check3, car, car_l, cdr, cdr_l, equal, consp, atom, listp, num, c_num, nump, plus, minus, stringp, lazy_stringp, obj_print, obj_pprint): Fixed these functions to use the new number representation, and not to dereference the obj_t * pointer if it is actually a number. (obj_init): Adjusted values of maxint and minint. * gc.c (mark_obj, gc_is_reachable): Avoid dereferencing numbers. * hash.c (ll_hash): Likewise. * match.c (match_line, do_output_line): Likewise. 2009-11-06 Kaz Kylheku First cut at hash tables. One known problem is allocation during gc, due to use of boxed numbers for vector access. * gc.c (gc): Disable gc when doing garbage collection, in case something tries to allocate memory during gc, triggering a recursive gc, which would be very bad. Also, call the new function, hash_process_weak, in between the mark and sweep phases. (gc_is_reachable): New function. * gc.h (gc_is_reachable): Declared. * lib.c (hash_t): New symbol global. (acons_new_l): New function. (obj_init): New symbol interned. * lib.h (hash_t, acons_new_l): Declared. * hash.c, hash.h: New files. * Makefile: New target, hash.o. * dep.mk: Regenerated. 2009-11-06 Kaz Kylheku Throw an exception on stream errors during close or I/O operations. This is needed for pipes that terminate abnormally or return a failed termination status. Pipe and stdio streams have an extra description field so they are printed in a readable way. * lib.c (process_error): New global defined. (obj_init): New symbol interned. (lazy_stream_func): Pass t to close_stream, so an exception is thrown if the close fails. (lazy_stream_cons): Ditto. * lib.h (process_error): Declared. * match.c (complex_snarf): Pass new descr argument to make_stdio_stream and make_pipe_stream. * stream.c (strm_ops): New argument on close function pointer. (common_destroy): Close without throwing exception.
For objects being finalized, we don't care if the close works or not; the program has shown that it doesn't care about the stream by letting it become unreachable, so we don't bother the program by throwing an exception. (stdio_handle): New struct. (stdio_stream_print, stdio_stream_destroy, stdio_stream_mark, stdio_maybe_read_error, stdio_maybe_write_error): New functions. (stdio_put_string, stdio_put_char, stdio_get_line, stdio_get_char, stdio_vcformat, stdio_close): Updated to new handle format; they throw errors now. (stdio_ops, pipe_ops): Redirected to new functions stdio_stream_print, stdio_stream_destroy and stdio_stream_mark. (pipe_close): Updated to new handle format. Parses status from pclose and throws exceptions appropriate to the situation. (dir_close): Takes extra argument. (make_stdio_stream, make_pipe_stream): New argument added. (make_string_output_stream): Some casts added. (close_stream): Pass new argument down to virtual function. (stream_init): Pass new argument to make_stdio_stream when creating streams for stdin, stdout and stderr. * stream.h (make_stdio_stream, make_pipe_stream, close_stream): Declarations updated. * txr.c (txr_main): Pass new argument to make_stdio_stream. * unwind.c (uw_init): Register process_error. 2009-11-01 Kaz Kylheku Version 020 Improved documentation. Building via configure script. Support for cross compiling. Support for building in a separate build directory. Internal bugfixes. Portability bugs fixed; works on x86-64 GNU/Linux. 2009-11-01 Kaz Kylheku Bug ID 27898: Directory order dependencies in test case. Converted some directories to text files. * tests/002/proc/*/task: Directories removed. * tests/002/proc/*/tasks: Files created. * tests/002/query-1.txr: Query updated. * tests/002/query-1.expected: Expected output updated. 2009-11-01 Kaz Kylheku Bug ID 27895: Calls to protect have an argument list terminated by the integer constant 0 rather than a proper null pointer constant.
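The protect() bug above comes down to how variadic arguments are passed. In the hypothetical count_locations below (not txr's protect), the list of pointer arguments ends with a null pointer. A bare 0 in the variable part of the call is passed as an int, and on LP64 targets such as x86-64 an int is narrower than the int ** that va_arg reads, so the terminator must be written as a null pointer of that type.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variadic function in the style of protect(): counts pointer
   arguments up to a terminating null pointer of type int **. */
int count_locations(int **first, ...)
{
  va_list vl;
  int **p;
  int n = 0;

  va_start(vl, first);
  for (p = first; p != 0; p = va_arg(vl, int **))
    n++;
  va_end(vl);
  return n;
}
```

For example, count_locations(&px, &py, (int **) 0) is well defined, whereas count_locations(&px, &py, 0) has undefined behavior. Writing NULL instead of the cast is not enough either, because NULL may itself be defined as plain 0.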
lib.c (obj_init): Properly terminate argument list of protect call. stream.c (stream_init): Likewise. unwind.c (unwind_init): Likewise. txr.c (txr_main): Two-argument protect calls rewritten using prot1. 2009-11-01 Kaz Kylheku Bug ID 27899: Garbage collection problem: method of locating stack bottom is unreliable due to the unpredictable allocation order of local variables. The addresses of stack_bottom_0 and stack_bottom_1 variables do not necessarily bracket the others, which means that some local variables in main can be out of the reach of the garbage collector: our stack bottom is wrongly in the middle of the frame. * lib.c (init): Removed one of the stack bottom parameters, so there is only one. This is passed straight down to gc_init. Also noticed that the oom_realloc variable was not being set from the oom parameter. * lib.h (init): Declaration updated. * txr.c (txr_main): New static function. (main): Calls init, and then txr_main. The idea is that txr_main should get a fresh stack frame. So the stack_bottom variable in main should be outside of that stack frame. 2009-10-22 Kaz Kylheku * lib.c (equal): Fix broken LSTR and FUN cases. 2009-10-22 Kaz Kylheku Got "make tests" working in a separate build directory, with .out files going to local tests/ tree. * Makefile (depend): Refer to depend.txr and dep.mk using $top_srcdir; no need for symlinks. Changed a few more ./txr references to use $(PROG). (TESTS): Path munging to generate targets with local paths. (%.ok): Fixed diff logic to compare the .expected file in $(top_srcdir) with the local .out file. * configure: Don't generate symlinks for tests and dep.mk. 2009-10-22 Kaz Kylheku Got "make install" working. * Makefile (install): New target. * configure (mandir, bindir): New variables. 2009-10-22 Kaz Kylheku Got build to work in a separate build directory. * Makefile (CFLAGS): Added -I flag to point header inclusion to the source directory. (PROG): New variable to hold program name.
(VPATH): Variable set, as a quick and dirty way to get GNU make to find the prerequisites back in the source directory. * configure: Added steps to symlink the tests directory and dep.mk. * depend.txr: Modified to generate the dependencies with correct references to the top_srcdir, with the exception of locally generated headers. * dep.mk: Regenerated. 2009-10-22 Kaz Kylheku Build configuration via configure script, with cross compiling support. (Tested by cross-compiling txr on an x86 GNU/Linux system to run on a MIPS-based GNU/Linux system.) * configure: New script. * Makefile (OPT_FLAGS, LANG_FLAGS, DIAG_FLAGS, DBG_FLAGS, LEX_DBG_FLAGS, TXR_DBG_OPTS, LEXLIB): Variables removed; these are now generated in config.make by configure. (config.make): New target to print a friendlier diagnostic if the build is not configured. (distclean): New target to do clean, plus remove config.make. 2009-10-22 Kaz Kylheku * parser.l (YY_INPUT): Replace tabs with spaces (how did they sneak in?). Fix possible use of uninitialized ch. 2009-10-21 Kaz Kylheku * txr.1: Fixed misleading wording (separation versus termination). Added Introduction headings to some major sections. Improved exception handling intro. 2009-10-21 Kaz Kylheku Version 019 Regexps can be bound to variables. New freeform directive. * txr.c (version): Bump. * txr.1: Bump version and date. * lib.c, match.c, regex.c, regex.h, stream.c: Trailing whitespace removed from lines. 2009-10-21 Kaz Kylheku * txr.1: Documented freeform. 2009-10-21 Kaz Kylheku Change the freeform line catenation semantics to termination rather than separation. * lib.h (lazy_str): Declaration updated. * lib.c (lazy_str): Tack terminator onto initial prefix string. Parameter renamed. Also, terminator string cached in the object. (lazy_str_force, lazy_str_force_upto): Terminate, rather than separate. * match.c (match_files): sep variable renamed to term.
2009-10-21 Kaz Kylheku * gc.c (mark_obj): Bugfix: recurse over recently added member, opts, in the lazy_string structure. 2009-10-20 Kaz Kylheku Got regex working over lazy strings from freeform. Bugfixes. * lib.c (length_str): Fixed recursion to wrong length function. (lazy_str_force): March down list properly. Update lazy string's limit value. * match.c (match_line): Convert to lazy-string-aware style, i.e. avoid triggering a force of the whole string. (match_files): Bugfix in argument processing of freeform directive. * regex.h (nfam_result_t): New typedef. (nfa_machine_reset): New function declaration. (nfa_machine_feed): Updated declaration. * regex.c (nfa_machine_init): Refactor to use nfa_machine_reset. (nfa_machine_feed): Return nfam_result_t rather than just int. (search_regex, match_regex): Refactor to work well with lazy strings. 2009-10-20 Kaz Kylheku Implement custom separator and limit in freeform. * lib.h (lazy_string): New struct member, opts. (lazy_str): Declaration updated. * lib.c (lazy_str): New constructor parameters to set the separator string and numeric line limit. (lazy_str_force, lazy_str_upto): Honor the line limit, and use the separator string if provided. * match.c (match_files): Process the arguments for freeform directive. 2009-10-20 Kaz Kylheku * lib.c (sub_str): Avoid invoking c_str, which forces the lazy string. 2009-10-20 Kaz Kylheku Start of implementation for freestyle matching. Lazy strings implemented, incompletely. Changed string function to implicitly strdup; non-strdup version changed to string_own. Fixed wrong uses of strdup rather than chk_strdup. Functions added to regex module to provide regex matching as a state machine to which characters are fed. * lib.h (type_t): New enum member LSTR, for lazy strings. (lstr_t, freestyle, type_check3, string_own): Declared. (string): Parameter changed to const char *.
(lazy_stringp, split_str, lazy_str, lazy_str_force_upto, lazy_str_force, lazy_str_get_trailing_list, length_str_gt, length_str_ge, length_str_lt, length_str_le): Declared. * lib.c (lstr_t, freestyle): New symbol globals. (code2type, obj_print, obj_pprint, equal): Extended to handle LSTR. (type_check3): New function. (string_own): New function; does what string used to do. (string): Duplicates the string with strdup, so callers don't have to. (mkstring, copy_str, trim_str): Use string_own. (stringp): A lazy string is a kind of string. (lazy_stringp): New function. (length_str, c_str, search_str, sub_str, chr_str, chr_str_set): Handle lazy strings. (split_str): New function. (lazy_str, lazy_str_force_upto, lazy_str_force, lazy_str_get_trailing_list, length_str_gt, length_str_ge, length_str_lt, length_str_le): New functions. (obj_init): New symbols interned. Eliminated strdup calls. * gc.c (finalize, mark_obj): Changed to handle LSTR type. Eliminated default case from switch so we get a gcc diagnostic if a case is not handled. * match.c (match_files): Eliminated strdup calls. Added freeform directive. * parser.y (grammar): Changed string calls to string_own. * stream.c (stdio_get_line, get_string_from_stream): Changed string calls to string_own. (dir_get_line): Eliminated chk_strdup call. * txr.c (remove_hash_bang_line, main): Eliminated strdup calls. * regex.h (nfam_result): New union. (nfa_machine, nfa_machine_t): New struct and typedef. (nfa_machine_init, nfa_machine_cleanup, nfa_machine_feed, nfa_machine_match_span): New functions declared. * regex.c (nfa_machine_init, nfa_machine_cleanup, nfa_machine_feed, nfa_machine_match_span): New functions defined. 2009-10-18 Kaz Kylheku Trivial change allows regexps to be bound to variables, and used for matching. This Just Works because of the way match_line treats variables. * match.c (eval_form): Check for a regexp form and return it as a value representing itself. * regex.c (regexp): New function. 
* regex.h (regexp): Declared. 2009-10-17 Kaz Kylheku * deps.mk: Updated. 2009-10-17 Kaz Kylheku Version 018 Bugfixes: mistakes in debugging calls; an infinite looping bug in collect; the skip directive not advancing the match by the proper number of lines. * match.c (debuglcf): Cosmetic fix. (match_files): After recognizing nothrow in the file spec, replace it by a string. A few places expect first(files) to be a string. The skip directive must return whatever return value it obtained from the nested match_files call, and not substitute the current line number, so that the caller can proceed past the correct number of lines that were matched. Fixed an obj_t * being passed to a %s printf specifier in a debug printf. The collect directive must make progress even if the nested spec makes no progress (returns successfully, but with the original line number). * txr.c (version): Bump. * txr.1: Bump version and date. * txr/tests/004/query-1.txr: New test case. * tests/004/query-1.expected: Expected result for new test case. 2009-10-17 Kaz Kylheku Version 017 Bugfix in exception subtype definition (defex). Tail recursion in marking function of garbage collector. -f option for specifying query file, allowing more options to follow, useful in hash-bang scripting and other situations. * txr.c: (version): Bump to 017. * txr.1: Bump version to 017. 2009-10-17 Kaz Kylheku * txr.1: Documented defex. * unwind.c (uw_register_subtype): Bugfix: if the subtype exists already, we must not delete it and create a new entry, but destructively point its entry to its assigned supertype. An exception is thrown, instead of an abort, for attempts to make t a subtype of something other than itself. An attempt to make something other than nil a subtype of nil is diagnosed. Attempts to redefine the relationship between two types if they are already connected by one; this covers circularity and other cases, while still allowing a relaxed order of definition. 2009-10-17 Kaz Kylheku * gc.c (mark_obj_tail): New macro.
(mark_obj): Optimized with manual tail recursion. The function will no longer generate long call stacks for long lists. Descending to the car field of a cons is still recursive, but ``car-heavy'' trees are rare. 2009-10-16 Kaz Kylheku Resurrect -f option, with different meaning. We need "-f query-file" so that hash-bang scripts can be written which can pass options to txr. * txr.c (help, main): Implement and document -f. Also bugfix: do not throw file open errors as exceptions of type error, because these cause an abort, potentially leading to a core dump. They are now thrown as file_error. * txr.1: Documented -f. 2009-10-16 Kaz Kylheku Implemented @(next arg) for treating the command line as an input source. * txr.1: Updated, and fixed a few unrelated mistakes. * lib.c (dir): Removed unused symbol global. (args): New symbol global. * lib.h (dir): Declaration removed. (args): Declared. * match.c (match_files): Implemented @(next arg). Had to hack laziness into the file opening logic in match_files. If the function is entered with a spec whose first directive is @(next), then it defers opening the first file in the list of files (since it will be immediately abandoned in favor of another input source). This prevents an error in the situation when the arguments do not name files, and there is a @(next args) directive to process them as an input source. 2009-10-16 Kaz Kylheku Version 016 Catch clauses with parameters. Directive for throwing exceptions: throw. Directive for defining exception types: defex. -f option renamed to -c. * txr.c: (version): Bump to 016 * txr.1: Bump version to 016. 2009-10-16 Kaz Kylheku * txr.c (help, main): Changed -f argument to -c. This is consistent with the -c argument of the shell; -f looks like awk's -f option, which specifies a file, not a literal script body. * txr.1: Updated. 2009-10-15 Kaz Kylheku * txr.1: Grammar, spelling. 2009-10-15 Kaz Kylheku * parser.y (clauses_opt): Long overdue nonterminal added.
(define_clause): Simplified with clauses_opt. (try_clause): Error handling improved. (catch_clauses_opt): Catch and finally clauses can be empty. Error cases added. * txr.1: Updated. 2009-10-15 Kaz Kylheku * match.c (match_files): Use alist_remove1 for a one-element removal. 2009-10-15 Kaz Kylheku * unwind.c (uw_throw): Add program prefix before unhandled exception text. Print it in the standard notation if it's not a string literal. * match.c (sem_error, file_err): Don't stick program prefix into exception text. 2009-10-15 Kaz Kylheku * unwind.c (uw_exception_subtype_p, uw_init): Slight change in representation for exception subtypes, saving one node in the list. 2009-10-15 Kaz Kylheku New throw and defex directives, catches with arguments. * lib.c (defex, throw): New symbol globals. (obj_init): Symbols interned. * lib.h (defex, throw): Declared. * match.c (match_files): Implemented throw and defex. Argument handling in catches. * unwind.c (uw_register_subtype): Returns right argument, so we can cleverly use it with reduce_left. * unwind.h (uw_register_subtype): Declaration updated. * txr.1: Updated. 2009-10-14 Kaz Kylheku Version 015 Code restructuring. Corruption bugfix in gc-debugging code. The nil symbol is implemented more properly. Semantics change: collect treated as a failed match if it does not collect anything. Bugfix in function argument reconciliation: must only be done for unbound parameters. New @(local) directive (synonym of forget) for expressing local variables in functions. Quasi-literals: backquote-delimited literals that contain interpolated variables. Useful in next, output, bind and function calls. Hygiene: some implementation-inserted syntax tree elements are now in their own namespace so they can't clash with user-defined constructs. Rewritten streams implementation. Exception handling: try/catch/finally. Exceptions used internally and externally. File errors are mapped to exceptions now. Hash bang (#!) scripting supported.
New -f parameter, allowing the entire query to be specified as an argument rather than from a file or stdin. * txr.c: (version): Bump to 015. * txr.1: Bump version to 015. More documentation about exceptions. 2009-10-14 Kaz Kylheku Support for hash bang execution, and embedding query in a command line argument. * txr.c (remove_hash_bang_line): New function. (main): Added -f option. Initialize and gc-protect yyin_stream, and use it in all places where yyin was previously set up. Diagnose when -a, -D and -f are wrongly clumped with other options. Remove the first line of the query if it starts with #!. * parser.h (yyin): Declaration removed. (yyin_stream): Declared. * parser.l (YY_INPUT): Macro defined. (yyin_stream): New global. * stream.c (string_in_get_line, string_in_get_char): Bugfix: wrong length function used. (string_in_ops): Bugfix: wrong get_char function wired in. (get_char): New function. * stream.h (get_char): Declared. * txr.1: -f option documented. 2009-10-14 Kaz Kylheku * lib.c (obj_print, obj_pprint): Print # syntax if an object has a bad type code; do not just return without printing anything. 2009-10-14 Kaz Kylheku Code cleanup and documentation. * txr.1: Start documenting quasiliterals, exception handling and nothrow in next and output. * parser.y (catch_clauses_opt): Add missing empty production, so that a try block doesn't have to have a finally clause. * lib.h (or2, or3, or4): New macros. * match.c (match_files): Allow output and next forms which just have one argument that is nothrow, as documented. * stream.c (common_vformat, string_out_vcformat, string_out_vcformat, make_string_output_stream, make_dir_stream, close_stream, get_line, vformat, vcformat, format, cformat, put_string, put_cstring, put_char): Switch to new style type assertions. 2009-10-13 Kaz Kylheku New syntax for next and output directives, taking advantage of quasi-literals. Non-throwing behavior can be specified in both using nothrow.
The old syntax is supported, and has the old semantics (non-throwing). Hence, the test cases pass again without modification. File open errors thrown as file_error type. * lib.c (nothrow, file_error): New symbol globals. (obj_init): New symbols interned. * lib.h (nothrow, file_error): Declared. * match.c (file_err): New function. (eval_form): Bugfix: if input is nil, or an atom other than a symbol, return the value hoisted into a cons. A nil return strictly means an unbound variable. (match_files): Support new syntax for next and output. Throw open errors as file_err. * parser.l (grammar): Change how OUTPUT is returned to the style similar to DEFINE, so interior forms can be parsed. * parser.y (grammar): Fix up output_clause with new syntax. * unwind.c (uw_throw): Do not abort on unhandled file_error, but terminate with a failed status. (uw_init): Register file_error as a subtype of error exception. 2009-10-13 Kaz Kylheku First cut at working try/catch/finally implementation. * lib.c (try, catch, finally): New symbol globals. (obj_init): New symbols interned. * lib.h (try, catch, finally): Declared. * parser.y (TRY, CATCH, FINALLY): New tokens. (try_clause, catch_clauses_opt): New nonterminal grammar symbols. * parser.l (yybadtoken): TRY, CATCH and FINALLY handled. (grammar): New cases for try, catch and finally. * unwind.h (struct uw_catch): New member called visible. (uw_continue): New parameter added. (uw_exception_subtype_p): Declared. (uw_catch_begin): Macro rewritten to use switch logic around setjmp. (uw_do_unwind, uw_catch, uw_unwind): New macros. (uw_catch_end): Rewritten to close switch, and automatically continue the unwinding if the block is entered as an unwind. * unwind.c (uw_unwind_to_exit_point): Exception catching frames made invisible via new flag prior to control passing to them. longjmp code 2 introduced for distinguishing a catch from an unwind. Visibility flag is checked and invisible frames are skipped.
(uw_push_catch): cont member of the unwind frame initialized to zero. (exception_subtype_p): Renamed to uw_exception_subtype_p, changed to extern. Fixed wrong order of arguments to assoc. (uw_throw): Honor visibility flag: do not consider invisible catch frames. (uw_register_subtype): sup/sub mixup bugfix. (uw_continue): Takes extra argument: the continuation frame that (re)establishes the exit point for the unwinding. This allows nested unwinding action to take place in a finally, and then to continue to the original exit point. * match.c (match_files): Handling for try directive added. 2009-10-13 Kaz Kylheku * parser.l (yybadtoken): Bugfix: added missing LITCHAR case. * unwind.h (internal_error): Fixed broken macro. * match.c (match_line, match_files): sem_error bugfix: used %a instead of ~a. (match_files): Wrap block handler in compound statement, otherwise the macroexpansion declares a variable in the middle of a statement, which is a gcc extension to C90 (or a C99 feature, but we aren't using C99). 2009-10-08 Kaz Kylheku Exception handling for query errors. Verbose logging decoupled from yyerror functions. Superior object-oriented formatting used for cleaner code. * lib.c (query_error): New symbol global. (obj_init): New symbol interned. * lib.h (query_error): Declared. * match.c (output_produced): Variable changed to external linkage. (debugf, debuglf, debuglcf, sem_error): New static functions. (dest_bind, match_line, match_files): Retargetted away from the yyerrorf and yyerrorlf functions to use debugf, debuglf, debuglcf for logging and sem_error for throwing query errors as exceptions. * parser.h (spec_file_str): New global declared. * parser.l (yyerror): Calls yyerrorf instead of yyerrorlf; lets yyerrorf increment error count. (yyerrorf): Loses level argument. (yyerrorlf): Function removed. (yybadtoken): Retargetted from yyerrorlf to yyerrorf. (grammar): yyerrorf call fixed up. * txr.c (spec_file_str): New global defined.
(main): Protects new global against gc, and initializes it. * unwind.c (uw_throw): If an unhandled exception is of type query_error, it results in an exit rather than an abort. The false string is conditionally printed. (uw_init): Register query_error as subtype of error. 2009-10-08 Kaz Kylheku Exception handling framework implemented. * lib.c (cobj_t, error, type_error, internal_err, numeric_err, range_err): New symbol globals. (prog_string): New string global. (code2type): New static function. (typeof): Rewritten using code2type. (type_check, type_check2): New static functions. (car, cdr, list, plus, minus, length_str, chr_p, chr_str, chr_str_set, apply, funcall, funcall1, funcall2, vec_get_fill, vecref_l, lazy_stream_cons): Checks and assertions rewritten using new functions and macros. (obj_init): prog_string protected from gc. New symbols interned. (init): uw_init() call moved after obj_init() because it needs stable symbols. * lib.h (cobj_t, error, type_error, internal_err, numeric_err, range_err, prog_string, type_check, type_check2): Declared. * match.c (dump_var, complex_snarf, complex_close): abort calls rewritten to use exception handling. * regex.c (nfa_all_states, nfa_closure, nfa_move): Likewise. * stream.c (string_out_vcformat): Bugfix: fill index not updated. (make_string_output_stream): Bugfix: initial buffer not null terminated. (get_string_from_stream): New function. * stream.h (get_string_from_stream): Declared. * txr.c (main): Some error prints turned to throws. * unwind.c (unwind_to_exit_point): Supports UW_CATCH frames, whose finalization logic has to be invoked during unwinding, and as target exit points. (uw_init): Installs exception symbols into subtyping hierarchy. (uw_push_catch, exception_subtype_p, uw_throw, uw_throwf, uw_errorf, uw_throwcf, uw_errorcf, type_mismatch, uw_register_subtype, uw_continue): New functions. (exception_subtypes): New static global. * unwind.h (noreturn): New macro, conditionally defined on __GNUC__.
(enum uw_frtype): New member, UW_CATCH. (struct uw_catch): New struct type. (union uw_frame): New member, ca. (uw_push_catch, exception_subtype_p, uw_throw, uw_throwf, uw_errorf, uw_throwcf, uw_errorcf, type_mismatch, uw_register_subtype, uw_continue): New functions declared. (uw_catch_begin, uw_catch_end, internal_error, type_assert, bug_unless, numeric_assert, range_bug_unless): New macros. 2009-10-07 Kaz Kylheku Rewritten streams implementation. * stream.h, stream.c: New files. * Makefile (OBJS): New object file stream.o. * dep.mk: Dependencies updated. * gc.c (finalize): STREAM case removed. Call destroy only if not null. (mark_obj): STREAM case removed. * lib.c (push, pop): New functions. (equal): STREAM case removed. (sub_str): Allow from parameter to be nil, defaulting to zero. (stdio_line_read, stdio_line_write, stdio_close, stdio_line_stream, pipe_close, pipe_line_stream, dirent_read, dirent_close, dirent_stream, stream_get, stream_pushback, stream_put, stream_close): Functions removed. (stream_ops dirent_stream_ops, stdio_line_stream_ops, struct stream_ops, pipe_line_stream_op): Static structs removed. (lazy_stream_func, lazy_stream_cons): Retargetted to new streams. (cobj_print_op): Likewise. (init): Disables and restores GC, instead of doing it in obj_init. (obj_print): Retargetted to new streams. (obj_pprint): New function. (obj_init): Does not manipulate gc_state any more, moved to init. Call to stream_init added. (d, snarf): Retargetted to new streams. (snarf_line): Removed, now appears in stream.c, retargetted to new streams. * lib.h (enum type): STREAM removed. (struct stream, struct stream_ops): Removed. (struct cobj_ops): Retargetted to new streams. (union obj): sm member removed. (push, pop, obj_pprint): Declared. (stdio_line_stream, pipe_line_stream, dirent_stream, stream_get, stream_pushback, stream_put, stream_close, snarf_line): Removed. (cobj_print_op, dump, snarf): Modified. 
* match.c (dump_bindings, complex_snarf): Retargetted to new streams. * txr.c (main): format used to dump bindings and specs in verbose mode. 2009-10-07 Kaz Kylheku Implemented quasi-literals: string literals which may contain variables to be interpolated. Also, took care of a hygiene problem with respect to some parser-generated forms, which must be invisible to the user. * Makefile (LEX_DB_FLAGS): New variable; helpful in generating a lexical analyzer with debug tracing. * parser.l (nesting, closechar): Static variables removed. (char_esc): Add \` escape for quasi-literals. (stack): New %option, to generate a scanner which has a start condition stack. (QSILIT): New start condition. (grammar): Refactored to use start condition stacks. Quasi-literal lexical analysis added. * parser.y (lit_char_helper): New function, for factoring out some common logic between string literals and quasi literals. (quasilit, quasi_item, quasi_items): New grammar symbols and production rules. (strlit): Rule shortened with new helper function. Bugfix: error case assigns nil to $$. (chrlist): Bugfix: error case assigns nil to $$. (LITCHAR): Added to %prec table to fix shift-reduce problem. (expr): Production now can generate a quasilit. * lib.c (quasi): New symbol global. (obj_init): Intern quasi as "$quasi", so the user can make a function called quasi. Also, var and regex are now interned with the names "$var" and "$regex" for the same reason. * lib.h (quasi): Declared. * match.c (eval_form): Rewritten with recursive processing to handle deeply embedded variables, as well as quasi-strings. (subst_vars): Handles quasi-strings. (match_files): Function calls now use eval_form for function argument evaluation, except of course in the special case that if an argument is a symbol, it may be unbound. 
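The quasi-literal entry above describes string literals that contain interpolated variables, which eval_form and subst_vars handle. The sketch below covers only the substitution step. It assumes the parser has already split the literal into text and variable items. The quasi_item productions suggest such a split, but the `qitem` and `binding` types and `expand_quasi` are hypothetical. Treating an unbound variable as a failure (NULL return) is also an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one quasi-literal element. */
struct qitem {
    int is_var;        /* nonzero: text names a variable */
    const char *text;  /* literal text, or variable name */
};

/* Hypothetical variable binding (txr binds Lisp-like objects). */
struct binding {
    const char *name;
    const char *value;
};

static const char *lookup(const struct binding *env, size_t nenv,
                          const char *name)
{
    size_t i;
    for (i = 0; i < nenv; i++)
        if (strcmp(env[i].name, name) == 0)
            return env[i].value;
    return NULL;
}

/* Expands items into a malloc'd string. Returns NULL if a variable
   is unbound or allocation fails. */
char *expand_quasi(const struct qitem *items, size_t n,
                   const struct binding *env, size_t nenv)
{
    size_t total = 1, i;
    char *out, *p;

    /* First pass: resolve every item and size the result. */
    for (i = 0; i < n; i++) {
        const char *s = items[i].is_var
                        ? lookup(env, nenv, items[i].text)
                        : items[i].text;
        if (s == NULL)
            return NULL;
        total += strlen(s);
    }

    out = malloc(total);
    if (out == NULL)
        return NULL;

    /* Second pass: copy; every lookup succeeded in the first pass. */
    p = out;
    for (i = 0; i < n; i++) {
        const char *s = items[i].is_var
                        ? lookup(env, nenv, items[i].text)
                        : items[i].text;
        size_t len = strlen(s);
        memcpy(p, s, len);
        p += len;
    }
    *p = '\0';
    return out;
}
```

Given the items "dir/", the variable name, and ".txt", with name bound to "foo", the result is "dir/foo.txt". The two passes (resolve and measure, then copy) keep the result to a single allocation.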
2009-10-06 Kaz Kylheku * match.c (match_files): No error message for merging to a symbol which is already bound; the existing behavior is to destructively update the binding, which is useful, and so the error is pointless. 2009-10-06 Kaz Kylheku Introduce local as a synonym for forget. It does exactly the same thing; a previous binding is forgotten. This spelling is nicer for functions. * lib.h (local): Declared. * lib.c (local): Defined. (obj_init): New symbol interned. 2009-10-06 Kaz Kylheku Bugfix: function parameter reconciliation (after function call completes) must only consider the unbound parameters. Otherwise false mismatches result if the function destructively manipulated some bindings of bound parameters. E.g. @(define foo (a)) is called as @(foo "bar") and internally it rebinds bound parameter a to "baz". This situation is not a mismatch. The rebinding is thrown away. * match.c (match_files): When processing a function call, keep an alist which associates arguments and unbound parameters. Then, after the function call, process the alist, rather than the full parameter list. 2009-10-06 Kaz Kylheku Semantics change: collect fails if it does not collect anything. Non-failing behavior can be obtained by wrapping with @(maybe) (but no such workaround for coll yet). * match.c (match_line): Return nil if coll collected nothing. (match_files): Return nil if collect collected nothing. 2009-10-06 Kaz Kylheku Bugfix: nil must be on the list of interned symbols. * lib.c (sym_name): Function removed. This was like symbol_name but did not accept nil. (intern): Use symbol_name instead of sym_name, allowing nil to be on the list of interned symbols. (obj_init): Add nil to interned_syms list. (nil_string): Changed from "NIL" to "nil". * match.c (dest_bind): Treat nil as a value, not a symbol. (match_files): Treat nil as a value when it's a function argument.
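The reconciliation bugfix above can be sketched in two steps. At call time, record (argument, parameter) pairs only for arguments that are unbound in the caller. After the call, copy the callee's bindings back only through those pairs. The flat `struct env`, `record_unbound` and `reconcile` are hypothetical simplifications. The entry says txr keeps an alist, and its bindings are Lisp-like objects rather than C strings.

```c
#include <stddef.h>
#include <string.h>

#define MAX_VARS 16

/* Hypothetical flat binding environment; an absent name is unbound. */
struct env {
    const char *name[MAX_VARS];
    const char *value[MAX_VARS];
    size_t count;
};

/* One recorded pairing: caller's argument variable -> callee parameter. */
struct pairing {
    const char *arg;
    const char *param;
};

const char *env_get(const struct env *e, const char *name)
{
    size_t i;
    for (i = 0; i < e->count; i++)
        if (strcmp(e->name[i], name) == 0)
            return e->value[i];
    return NULL; /* unbound */
}

int env_set(struct env *e, const char *name, const char *value)
{
    size_t i;
    for (i = 0; i < e->count; i++)
        if (strcmp(e->name[i], name) == 0) {
            e->value[i] = value; /* destructive rebinding */
            return 1;
        }
    if (e->count == MAX_VARS)
        return 0; /* environment full */
    e->name[e->count] = name;
    e->value[e->count] = value;
    e->count++;
    return 1;
}

/* At call time: remember only the arguments that are unbound in the
   caller. out must have room for n entries. Returns the count recorded. */
size_t record_unbound(const struct env *caller, const char **args,
                      const char **params, size_t n, struct pairing *out)
{
    size_t i, k = 0;
    for (i = 0; i < n; i++)
        if (env_get(caller, args[i]) == NULL) {
            out[k].arg = args[i];
            out[k].param = params[i];
            k++;
        }
    return k;
}

/* After the call: bind each recorded argument to whatever the callee
   bound its parameter to. Bound arguments were never recorded, so a
   callee's rebinding of a bound parameter is simply discarded. */
void reconcile(struct env *caller, const struct env *callee,
               const struct pairing *pairs, size_t k)
{
    size_t i;
    for (i = 0; i < k; i++) {
        const char *v = env_get(callee, pairs[i].param);
        if (v != NULL)
            env_set(caller, pairs[i].arg, v);
    }
}
```

This follows the spirit of the entry's example. A caller with x bound to "bar" passes x and an unbound y. The callee rebinds its parameter a to "baz" and binds b to "qux". After reconciliation, x is still "bar" because the rebinding is thrown away, and y is "qux".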
2009-10-06 Kaz Kylheku * gc.c (more): Bugfix: free_tail was incorrectly calculated, thereby destroying the validity of the FIFO recycling algorithm used when GC debugging is enabled. This showed up as mysterious assertions and crashes. (mark_obj): Do not abort if a free object is marked. (mark_mem_region): Renamed bottom and top variables to low and high. The naming was confusingly inverted relative to that in the caller. (sweep): Abort if somehow a block is free and marked reachable. 2009-10-06 Kaz Kylheku * match.c (match_files): Fixed nonexistent symbol warning for merge directive (complained about wrong symbol). 2009-10-05 Kaz Kylheku Refactoring matching code. * lib.h (cobj_ops): New function pointer, mark. * gc.c (mark_obj): For a COBJ type, call the mark function if the pointer is non-null. (gc_mark): New public function, wrapper that calls the private mark_obj. Implementations of mark for COBJ objects will need to call this. * gc.h (mark_obj): Declared. * regex.c (regex_obj_ops): Explicitly initialize mark function pointer to null. 2009-10-05 Kaz Kylheku Code restructuring. * Makefile (match.o): New object file. (depend): New rule for generating dep.mk, using txr. (lib.o, lex.yy.o, regex.o, y.tab.o, unwind.o, txr.o, match.o, gc.o): Dependency rules removed. * dep.mk: New make include file; captures dependencies. Generated by new depend rule in Makefile, using txr. * depend.txr: Txr query to generate dependencies. * extract.y: File renamed to parser.y. (output_produced): Variable removed, moved into new file match.c. (dump_shell_string, dump_shell_string, dump_var, dump_bindings, depth, weird_merge, map_leaf_lists, dest_bind, eval_form, match_line, format_field, subs_vars, complex_open, complex_open_failed, complex_close, complex_snarf, robust_length, bind_car, bind_cdr, extract_vars, extract_bindings, do_output_line, do_output, match_files, extract): Functions removed, added to match.c.
(struct fpip): Definition removed, added to match.c. (, , , , , "gc.h", "unwind.h"): Unneeded headers removed. * match.c: New file. * extract.l: Renamed to parser.l. * extract.h: Renamed to parser.h. (opt_loglevel, opt_nobindings, opt_arraydims, version, progname): Declarations moved to txr.h. (extract): Declaration moved to match.h. * txr.h, match.h: New headers. * gc.h (opt_gc_debug): Moved to txr.h. 2009-10-03 Kaz Kylheku Version 014 New cases directive. New define directive: user-defined dynamically scoped functions. String literals in bind and function calls. EOF in the middle of a line handled properly. * extract.l (version): Bump to 014. * txr.1: Bump version to 014. 2009-10-02 Kaz Kylheku New cases directive. * extract.l (yybadtoken): Add case for CASES. (grammar): Tokenize cases directive. * extract.y (CASES): New token kind. (cases_clause): New grammar symbol. (grammar): Implement new grammar cases. (match_files): Implement semantics for cases. * lib.c (cases): New global. (obj_init): Intern cases symbol. * lib.h (cases): Declared. * txr.1: Documented. 2009-10-02 Kaz Kylheku Support for string and character literals. * extract.l (char_esc): Support \' and \" escapes. (STRLIT, CHRLIT): New flex start conditions. (grammar): New rules for tokenizing string literals. * extract.y (LITCHAR): New token kind. (strlit, chrlit, litchars): New grammar symbols. (grammar): Implement string literal parsing. (dump_var): Support character objects, treating them as one-character strings. (eval_form): New function. (match_files): In bind directive, allow the right hand side to be an arbitrary object. * lib.c (mkustring, init_str): New functions. (cat_str): Allow characters in the mix, treating them as one-character strings. * lib.h (mkustring, init_str): Declared. (chrp, chr_str, chr_str_set): New functions. * txr.1: Documented. 2009-10-02 Kaz Kylheku Support for query-defined functions. * extract.l (yybadtoken): New DEFINE case. (NESTED): New flex start condition.
This allows for different lexing rules in nested lists, so even though, for instance, @(collect) is a special token, @((collect)) isn't. (grammar): Refactored with NESTED. Tokenize define directive. * extract.y (define_transform): New function. (DEFINE): New token kind. (define_clause): New grammar symbol. (match_files): Implement define semantics, and function calls. * lib.c (define): New global. * lib.h (define): Declared. (proper_listp, alist_remove1, copy_cons, copy_alist): New functions. (obj_init): Intern define symbol. (init): Call new function uw_init. * unwind.c (toplevel_env): New static structure. (uw_unwind_to_exit_point): Support new UW_ENV frame type. (uw_init, uw_find_env, uw_push_env, uw_get_func, uw_set_func): New functions. * unwind.h (UW_ENV): New enumeration member in uw_frtype. (uw_dynamic_env): New struct. (uw_block_begin, uw_block_end): Renamed some variables. (uw_env_begin, uw_env_end): New macros. * txr.1: Documented. 2009-10-02 Kaz Kylheku Misc. bugfixes and improvements. * extract.l (grammar): Newline in a directive no longer an error. Why not allow it. * extract.y (grammar): Productions for catching empty bodies in some constructs now end with END newl, rather than just END, so parsing can continue sanely. (match_lines): In diagnostics, don't say "ignored" about material which causes an error that fails the query! * lib.c (mkstring): Initialize length since we know it! (c_str): Take a symbol as an arg, so we don't have to keep writing c_str(symbol_name(sym)). (obj_print): Use isprint rather than iscntrl to decide whether to print a character as an escape. (snarf_line): Properly handle EOF in the middle of a line. 2009-09-29 Kaz Kylheku Version 013 Some minor garbage collection issues fixed. Infinite looping bug fixed. New @(trailer) directive. * extract.y (match_files): Implemented trailer directive. * extract.l (version): Bump to 013. * lib.h (trailer): Declaration added. * lib.c (trailer): External definition added.
(obj_init): Initialize trailer with interned symbol. * txr.1: Documented @(trailer) and bumped version to 013. 2009-09-29 Kaz Kylheku Looping bug fixed. Certain directives could cause an infinite loop if the query has run out of data. * extract.y (match_files): The semantics of the first_file_parsed argument changes a little bit. Previously, if nil was passed, a new lazy stream would be opened for the first file. But this is ambiguous because nil also means empty list; sometimes when we recurse into match_files, the data has run out and this argument is thus nil. Now, that argument must be the symbol t in order to mean ``open the first file''. If the argument is nil, it unambiguously means ``we are at the end of the current file; don't open anything''. (extract): The initial call to match_files now passes the symbol t for the first_file_parsed argument. 2009-09-29 Kaz Kylheku Fixing some gc issues. The test cases were found to bomb with an assertion when run with --gc-debug enabled, due to a garbage-collected object still being used. This was due to the way the main function was structured. Also, the stack ``top'' terminology in the gc was stupidly wrong. Leaf function frames are at the stack top, and main is near the bottom. I was thinking of the ``top caller''. * Makefile (TXR_DBG_OPTS): New variable. Tests are now run with --gc-debug, which makes them slower, but has a much greater chance of trapping gc problems. * extract.l (main): Two variables are now used for determining the stack bottom. We don't know in which order the compiler places local variables into a stack frame. (This is a separate question from that of the direction of stack growth). The call to the init function is now done right away. The argument processing section of main does some processing with GC objects, but the init function was being called afterward, so the list of interned symbols was not yet protected from garbage collection during that processing!
So with --gc-debug turned on, parts of the interned symbol list were being garbage collected (since the variable had not yet been added to the set of root pointers, which is done in the init function). Also, the use of an unknown --long-option is diagnosed properly now. * gc.c (gc_stack_top): Renamed to gc_stack_bottom, and converted from extern to static. (mark): Follows rename of gc_stack_top to gc_stack_bottom. (sweep): Eliminated the freed variable for counting freed objects, and the associated debug message, which was not useful. Commented why the free list is managed differently when dbg is turned on. (gc_init): New function. * gc.h (gc_stack_top): Declaration removed. (gc_init): Declaration added. * lib.c (min): New macro. (init): Takes two additional arguments which are used to determine the stack bottom. The function first determines whether the stack grows up or down. Then it takes the greater or smaller of the two potential stack top pointers, based on that. The result is passed to gc_init. * lib.h (init): Declaration updated. 2009-09-28 Kaz Kylheku Version 012 Semantics change of @(until) in @(collect) and @(coll). Minor fixes. * extract.y (match_line, match_files): The until clauses continue to be processed after the main clauses of the collect or coll (to see the bindings), but are processed before the collection occurs, so that the until will veto the bindings of the last iteration. Moreover, the data position stays where it is when this happens, and no arrangement is made to match the until material again. * txr.1: Tried to document the change. 2009-09-27 Kaz Kylheku * txr.1: Following a proofread, fixed various escaping problems and instances of missing text. 2009-09-26 Kaz Kylheku * lib.c (equal): Bugfixes: wrong fallthrough of FUN case. VEC case must return nil, not break. 2009-09-26 Kaz Kylheku Preparation for some sorting support. * extract.y (merge): Renamed to weird_merge. (map_leaf_lists): New function.
(match_file): Follow weird_merge rename. * lib.c (all_satisfy, none_satisfy, string_lt, do_bind2other, bind2other, merge, do_sort, sort): New functions. * lib.h (all_satisfy, none_satisfy, string_lt, bind2other, sort): Declared. 2009-09-25 Kaz Kylheku Version 011 New @(maybe) clause optionally matches (does not fail if none of its clauses match anything). New blocks feature: allows a query or subquery to be abruptly terminated by invoking an exit to a named or anonymous block. @(collect) and @(skip) have implicit anonymous blocks now. The @(skip) directive takes a numeric argument now, which limits how many lines are searched. * Makefile, extract.l, extract.y, extract.h, gc.c, gc.h, lib.c, lib.h, regex.c, regex.h, txr.1, unwind.c, unwind.h: Copyright notice and license text updated or added, and version bumped up to 011. * tests/001/query-1.txr, tests/001/query-2.txr, tests/001/query-3.txr, tests/002/query-1.txr: Assigned to public domain. 2009-09-25 Kaz Kylheku New features: - named blocks; - maybe clause; - optional iteration bound on skip. * extract.y: includes added: "unwind.h", . (MAYBE, OR): New grammar tokens. (maybe_clause): New nonterminal grammar symbol. (expr): A NUMBER can be an expression now, so that @(skip 42) is valid syntax. (match_files): Support for numeric argument in skip directive to bound the search to a maximum number of lines. Anonymous block established around skip. New directives implemented: maybe, block, accept and fail. Anonymous block established around collect. * txr.1: Documentation updated with new features. * Makefile: New object file unwind.o, and associated rules. * extract.l (yybadtoken): New cases for MAYBE and OR. (grammar): Likewise. * lib.c (block, fail, accept): New symbol variables. (obj_init): New symbols interned. * lib.h (block, fail, accept): Declared. (if2, if3): Macros fixed so test expression is not compared to nil, but implicitly tested as boolean. * unwind.c, unwind.h: New source files.
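The Version 011 entries above add blocks, which let a query exit abruptly to a named or anonymous block. The usual C technique for such non-local exits is a stack of frames, each holding a jmp_buf. The sketch below shows that technique under stated assumptions. The names `block_frame`, `block_push`, `block_pop` and `block_exit` are hypothetical; txr's own interface is the uw_block_begin/uw_block_end macros in unwind.h. Treating a NULL name as the anonymous block is this sketch's convention.

```c
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block frame; name == NULL marks an anonymous block. */
struct block_frame {
    const char *name;
    jmp_buf jb;
    struct block_frame *up;
};

static struct block_frame *block_stack = NULL;

/* Value handed from block_exit to the landing site. A global is used
   because non-volatile locals changed after setjmp are indeterminate
   once longjmp returns there. */
static int block_exit_value;

static void block_push(struct block_frame *f, const char *name)
{
    f->name = name;
    f->up = block_stack;
    block_stack = f;
}

static void block_pop(struct block_frame *f)
{
    block_stack = f->up;
}

/* Exits to the innermost block whose name matches (NULL matches the
   innermost anonymous block), popping every frame above it. Returns 0
   only if no block matches. */
static int block_exit(const char *name, int value)
{
    struct block_frame *f;
    for (f = block_stack; f != NULL; f = f->up) {
        int match = (name == NULL)
                    ? f->name == NULL
                    : f->name != NULL && strcmp(f->name, name) == 0;
        if (match) {
            block_stack = f->up;
            block_exit_value = value;
            longjmp(f->jb, 1);
        }
    }
    return 0;
}

/* Searches inside a block named "found"; a hit exits the block early. */
static int find_index(const int *items, int n, int target)
{
    struct block_frame frame;
    int i;

    block_push(&frame, "found");
    if (setjmp(frame.jb) != 0)
        return block_exit_value;   /* block_exit already popped frame */

    for (i = 0; i < n; i++)
        if (items[i] == target)
            block_exit("found", i);
    block_pop(&frame);
    return -1;
}

/* Exits a named outer block from inside an anonymous inner one. */
static int exit_through_anonymous(void)
{
    struct block_frame outer, inner;

    block_push(&outer, "outer");
    if (setjmp(outer.jb) != 0)
        return block_exit_value;

    block_push(&inner, NULL);
    if (setjmp(inner.jb) != 0) {
        block_pop(&outer);         /* an anonymous exit lands here */
        return -2;
    }

    block_exit("outer", 42);       /* skips the anonymous block */
    block_pop(&inner);             /* not reached */
    block_pop(&outer);
    return -1;
}
```

Note that block_exit_value is a file-scope variable rather than a field of the local frame. C leaves non-volatile automatic variables indeterminate after longjmp if they were modified after the matching setjmp, so the value cannot safely travel in a local.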

2009-09-24 Kaz Kylheku

	Stability fixes.

	* extract.y (match_files): Changed invalid string("-") to
	string(chk_strdup("-")); the former caused a non-malloced string
	to be freed at gc finalization time.

	* regex.c (nfa_state_shallow_free): New function: does not free
	satellite objects, just the structure itself.
	(nfa_combine): Use nfa_state_shallow_free instead of
	nfa_state_free, because the merged state inherits ownership of
	objects from the state being spliced out.
	(nfa_state_set): Fix lack of initialization of s.visited member of
	the state structure.

2009-09-24 Kaz Kylheku

	Version 010

	A file spec can start with $, which means read a directory.

	Data sources are not read into memory all at once, but on demand,
	which can reduce memory use for many queries.

	Regular expressions are now compiled once, when the query is
	parsed.

	Character escapes are now supported in regular expressions, and as
	a special syntax.

	* extract.l (version): Bumped to 010.
	(grammar): 8 and 9 are not octal digits; handle all regex
	backslash escaping in lexical grammar.

	* extract.y (grammar): Get rid of backslash handling from regex
	grammar. Lexer returns a REGCHAR for every escaped item. In
	situations where an operator character is implicitly literal,
	like * in a character class, we use the grammar to include that
	alongside REGCHAR. Bugfixes: the character ], when not closing a
	class, is not a syntax error but stands for itself; the character
	- stands for itself outside of a character class; the | character
	is literal in a character class.

	* txr.1: Updated version. Documented character escapes.

2009-09-24 Kaz Kylheku

	Lazy stream list improvement: no extra NIL element caused by
	end-of-file. Requires push-back support in streams. To avoid
	introducing a new structure member into streams, we extend the
	semantics of the label member, and rename it to label_pushback.

	* lib.c (stdio_line_stream, pipe_line_stream, dirent_stream):
	Follow rename of struct stream member; assert that label is an
	atom.
	(stream_get): Check pushback stack first and get item from there.
	(stream_pushback): New function.
	(lazy_stream_func): Pull one more item from the stream and use
	/that/ to decide whether to continue the lazy stream. The extra
	item is pushed back, if valid.
	(lazy_stream_cons): Simplified: no hack involving regular cons.
	Starts the induction by peeking into the stream. If something is
	there, it is pushed back, and a lazy cons is constructed which
	will fetch it.
	(obj_print): Made aware of the pushback, which must be skipped to
	get to the terminating label.

	* lib.h (struct stream): Member renamed from label to
	label_pushback.
	(stream_pushback): New function declaration.

2009-09-23 Kaz Kylheku

	Escape syntax in regexes and text. The standard seven character
	escapes are supported, namely \a, \b, \t, \n, \v, \f, and \r, as
	well as hex and octal escapes, plus the code \e for ASCII ESC.

	* extract.l (char_esc, num_esc): New functions.
	(grammar): New lex cases.

	* lib.c (obj_print): Support all character escapes in printing.
	Bugfix: backslash is now printed as two backslashes, not one.

2009-09-23 Kaz Kylheku

	* tests/002/query-1.txr: Modified to use $ to scan thread
	subdirectories.

	* tests/002/query-1.expected: Updated.

2009-09-23 Kaz Kylheku

	New COBJ type for wrapping arbitrary C objects into the Lisp-like
	framework. Compiled regexes are objects now. Regexes in a query
	are now compiled just once.

	* extract.y (grammar): Regexes compiled while parsing.
	(match_line): Modified to follow the abstract syntax tree change,
	and the interface changes in the match_regex and search_regex
	functions.

	* gc.c (mark_obj, finalize): Handle marking and finalization of
	COBJ objects.

	* lib.c (typeof, equal, obj_print): Handle COBJ.
	(cobj, cobj_print_op): New functions.

	* lib.h (type_t): New enum element, COBJ.
	(struct cobj, struct subj_ops): New types.
	(union obj): New member, co.
	(cobj, cobj_print_op): New functions declared.
	* regex.c (regex_equal, regex_destroy, regex_compile, regex_nfa):
	New functions.
	(regex_obj_ops): New static struct.
	(search_regex, match_regex): Interface change. Regex arguments are
	now compiled regexes. Functions won't handle raw regexes.

	* regex.h (regex_compile, regex_nfa): New functions declared.

2009-09-23 Kaz Kylheku

	New feature: file specs that start with $ read directories.
	Reading from an ``ls'' pipe is too slow.

	Streams and lazy conses implemented. Lazy conses allow us to treat
	a file or other kind of stream exactly as if it were a list. We
	can use car and cdr, etc. But only the parts of the list that we
	actually touch are instantiated on-the-fly by reading from the
	underlying stream.

	* extract.l: inclusion of added.

	* extract.l: inclusion of added.

	* extract.y (fpip_closedir): New enumeration in struct fpip, and
	fpip_noclose removed.
	(complex_open): Check for leading $, use opendir.
	(complex_open_failed): New function.
	(complex_close): Handle fpip_closedir case. Not closing stdin and
	stdout is handled by explicit comparison now.
	(complex_snarf): New function, constructs stream of a suitable
	type, over object returned from complex_close, wraps it in a lazy
	list.
	(match_files): Use complex_snarf instead of snarf to get a lazy
	list.

	* gc.c: Handle LCONS and STREAM cases.

	* lib.c (stream_t, lcons_t): New variables holding symbols.
	(typeof, equal, obj_print): Handle LCONS and STREAM.
	(car, cdr, car_l, cdr_l, consp, atom, listp): Rewritten to handle
	LCONS.
	(chk_strdup, stdio_line_read, stdio_line_write, stdio_close,
	stdio_line_stream, pipe_close, pipe_line_stream, dirent_read,
	dirent_close, dirent_stream, stream_get, stream_put, stream_close,
	make_lazycons, lazy_stream_func, lazy_stream_cons): New functions.
	(stdio_line_stream_ops, pipe_line_stream_ops, dirent_stream_ops):
	New static structs.
	(obj_init): Intern new symbols lstream, lcons, and dir.

	* lib.h (type_t): New enum members STREAM and LCONS.
	(struct stream, struct stream_ops, struct lazy_cons): New types.
	(union obj): New members sm and lc.
	(chk_strdup, stdio_line_stream, pipe_line_stream, dirent_stream,
	stream_get, stream_put, stream_close, lazy_stream_cons): New
	function declarations.

	* regex.c: inclusion of added.

2009-09-23 Kaz Kylheku

	Version 009

	User-friendly error messages from parser. Fixed -q option.

	* extract.l (version): Bumped to 009.

	* txr.1: Updated version.

2009-09-22 Kaz Kylheku

	* Makefile (LIBLEX): New variable. Refer to the lex library as
	-lfl through this variable, which can be overridden.

2009-09-22 Kaz Kylheku

	* extract.h (yybadtoken): New function declaration.

	* extract.l (yybadtoken): New function.
	(main): Fixed -q option.

	* extract.y (grammar): Lots of new error productions, some phrase
	rules refactored, resulting in much more user-friendly error
	diagnosis.

	* txr.1: -q option semantics clarified.
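The lazy-cons and pushback scheme from the 2009-09-23 and 2009-09-24 entries can be illustrated with a minimal, self-contained sketch. It assumes a stream over an in-memory array of strings and a single pushback slot (the entries describe a pushback stack kept in label_pushback). The types, and the names force, lcar, and lcdr, are hypothetical; stream_get, stream_pushback, and lazy_stream_cons only borrow the names from the log and do not reproduce TXR's signatures. As described for lazy_stream_cons, the list is started by peeking into the stream and pushing the item back. Forcing a cell fetches its item and then peeks one item further, so end-of-file yields a terminating NULL rather than an extra element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stream over an array of strings; NULL signals
   end-of-file. One pushback slot stands in for a pushback stack. */
struct stream {
  const char **items;
  size_t pos, count;
  const char *pushback;
  size_t reads;  /* items actually pulled from the array */
};

static const char *stream_get(struct stream *s)
{
  if (s->pushback != NULL) {  /* check pushback first */
    const char *item = s->pushback;
    s->pushback = NULL;
    return item;
  }
  if (s->pos < s->count) {
    s->reads++;
    return s->items[s->pos++];
  }
  return NULL;
}

static void stream_pushback(struct stream *s, const char *item)
{
  assert(s->pushback == NULL);  /* single slot in this sketch */
  s->pushback = item;
}

/* Lazy cons: while src is non-NULL the cell is unforced and its car
   and cdr have not been read yet. Cells are never freed here; a real
   system would leave that to the garbage collector. */
struct lcons {
  const char *car;
  struct lcons *cdr;
  struct stream *src;
};

static struct lcons *make_lcons(struct stream *s)
{
  struct lcons *c = malloc(sizeof *c);
  if (c == NULL)
    abort();
  c->car = NULL;
  c->cdr = NULL;
  c->src = s;
  return c;
}

/* Fetch this cell's item, then peek one further to decide whether
   the list continues; a valid extra item is pushed back. */
static void force(struct lcons *c)
{
  struct stream *s = c->src;
  const char *next;

  if (s == NULL)
    return;  /* already forced */
  c->car = stream_get(s);
  next = stream_get(s);
  if (next != NULL) {
    stream_pushback(s, next);
    c->cdr = make_lcons(s);
  }
  c->src = NULL;
}

static const char *lcar(struct lcons *c) { force(c); return c->car; }
static struct lcons *lcdr(struct lcons *c) { force(c); return c->cdr; }

/* Start the list by peeking: an empty stream gives an empty list. */
static struct lcons *lazy_stream_cons(struct stream *s)
{
  const char *first = stream_get(s);
  if (first == NULL)
    return NULL;
  stream_pushback(s, first);
  return make_lcons(s);
}
```

With a three-line stream, creating the list reads one item, and touching only the head reads two: the item itself and the one peeked to decide whether a second cell exists. Later items stay unread until their cells are forced.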