path: root/regex.c
Commit message | Author | Age | Files | Lines
...
* Optimization for one-character range.Kaz Kylheku2015-09-271-2/+7
| | | | | * regex.c (reg_optimize): [a] -> a. Also take advantage of this where the complement case generates [a].
* Optimize complement operator more.Kaz Kylheku2015-09-271-0/+28
| | | | | * regex.c (reg_optimize): Recognize and transform several cases: ~c -> ([^c]?|..+); ~[^c] -> ([c]?|..+); and ~.*c.* -> [^c]*.
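The first rewrite above, ~c -> ([^c]?|..+), can be checked by hand: a string fails to be exactly "c" when it is empty, when it is a single character other than c, or when it has two or more characters. The following is only a sketch of that equivalence over plain C strings, not code from regex.c; the function names are made up for illustration, and c is assumed to be a non-null character.

```c
#include <string.h>

/* Direct reading of ~c: every string except the one-character string "c".
   Hypothetical helper; assumes c != 0. */
int matches_complement(const char *s, char c)
{
  return !(s[0] == c && s[1] == 0);
}

/* Reading of the rewritten form ([^c]?|..+). */
int matches_rewritten(const char *s, char c)
{
  size_t len = strlen(s);

  if (len == 0)
    return 1;          /* [^c]? matching the empty string */
  if (len == 1)
    return s[0] != c;  /* [^c] matching one character other than c */
  return 1;            /* ..+ matching any two or more characters */
}
```

Both functions should agree on every input, which is what makes the complement-free form a valid replacement.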
* S-exp level regex optimization.Kaz Kylheku2015-09-271-32/+156
| | | | | | | | | | | | | | * regex.c (dv_compile_regex): Replaced by two functions, reg_expand_nongreedy and reg_compile_csets. (reg_expand_nongreedy, reg_compile_csets): New static functions. (reg_optimize): New static function. (regex_compile): Expand nongreedy syntax in incoming regex, and then optimize it before deciding whether to use NFA or derivatives. If derivatives are used, compile the character sets in the regex to character set objects. (regex_init): Register some intrinsic functions for debugging, sys:reg-expand-nongreedy and sys:reg-optimize.
* Support t regex in NFA compiler and in printer.Kaz Kylheku2015-09-271-1/+16
| | | | | | | | | | | | | | | | The t regex means "match nothing". This patch allows the NFA compiler to handle it. This will be necessary for an upcoming regex optimizer which can put out such an object. Also, the recursive regex printer can print the object now. * regex.c (nfa_kind_t): New enum member, nfa_reject. (nfa_state_reject): New static function. (nfa_compile_regex): Compile t regex into a reject state which cannot reach its corresponding acceptance state. (nfa_map_states): Handle nfa_reject case in switch, similarly to nfa_accept: nothing to transition into. (print_rec): Render the t regex as the empty character class [].
* Replace internal_error with exception throws in regex.Kaz Kylheku2015-09-271-7/+7
| | | | | | * regex.c (nfa_compile_regex, dv_compile_regex, reg_nullable, reg_matches_all, reg_derivative, regex_requires_dv): Throw an exception for the bad operator case.
* Bug in complement case of reg_matches_all.Kaz Kylheku2015-09-271-1/+2
| | | | | | * regex.c (reg_matches_all): A complement matches all if its argument matches nothing, not if its argument is anything but the empty match nil.
* regex: major optimization for complement operator.Kaz Kylheku2015-09-241-1/+46
| | | | | | | | | | | This change is a huge improvement for expressions that use complement, directly or via the non-greedy % operator. * regex.c (reg_matches_all): New static function. (reg_derivative): When the derivative is applied to a complement expression, identify situations when the remaining expression cannot possibly match anything, and convert them to the t expression.
* Regex state-marking counter wraparound bug.Kaz Kylheku2015-09-151-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an NFA regex goes through more than 4.29 billion state transitions, the state coloring "visited" marker wraps around. There could still exist states with old values at or near zero, which destroys the correctness of the closure calculations. * regex.c (nfa_handle_wraparound): New static function. The wraparound situation is handled by detecting when the next marker value is UINT_MAX. When this happens, we visit all states, marking them to UINT_MAX. Then we visit them again, marking them to zero, and set the next marker value to 1. (nfa_free): Added comment about why we don't have a wraparound check, in case it isn't obvious. (nfa_run): Check for wraparound before every nfa_closure call. (regex_machine_reset): Check for wraparound before nfa_closure call. Fix: store the counter back in the start state's visited field. (regex_machine_init): Initialize the n.visited field of the regex machine structure to zero. Not strictly necessary, since it's initialized moments later in regex_machine_reset, but good form. (regex_machine_feed): Check for wraparound before nfa_closure call.
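The two-pass reset described in this commit can be sketched roughly as follows. This is not the code of nfa_handle_wraparound; the state layout, the function names and the recursive traversal are assumptions made for illustration. The point is that the traversal itself relies on the marker, so the states are first painted with UINT_MAX (a value no state holds yet) and only then painted with zero.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical NFA state: a visited marker and up to two successors. */
struct state {
  unsigned visited;
  struct state *next[2];
};

/* Paint every state reachable from s with mark. The marker doubles as the
   "already seen" test, so mark must differ from every value currently stored. */
static void mark_all(struct state *s, unsigned mark)
{
  if (s == NULL || s->visited == mark)
    return;
  s->visited = mark;
  mark_all(s->next[0], mark);
  mark_all(s->next[1], mark);
}

/* Given the marker value about to be used, return the value that is safe
   to use. When the value is UINT_MAX, all states are reset first. */
unsigned handle_wraparound(struct state *start, unsigned next_marker)
{
  if (next_marker != UINT_MAX)
    return next_marker;

  mark_all(start, UINT_MAX); /* pass 1: no state holds UINT_MAX yet */
  mark_all(start, 0);        /* pass 2: every state now holds UINT_MAX */
  return 1;                  /* all states are at zero; counting restarts at 1 */
}
```

A single pass that writes zero directly would not be safe, because states already holding zero would stop the traversal early and leave their successors unvisited.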
* Use alloca for some temporary arrays in regex module.Kaz Kylheku2015-09-151-11/+5
| | | | | * regex.c (nfa_free): Use alloca for array of all states. (nfa_run): Use alloca for move, closure and stack arrays.
* Remove limit on NFA state size and allocate tightly.Kaz Kylheku2015-09-151-62/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * regex.c (struct regex): New member, nstates. (NFA_SET_SIZE): Preprocessor symbol removed. (struct nfa_machine): New member, nstates. (nfa_all_states): Function removed. (nfa_map_states): New static function. (nfa_count_one, nfa_count_states, nfa_collect_one): New static functions. (nfa_free): Takes nstates argument. Calculate array of all states using nfa_map_states over nfa_collect_one rather than nfa_all_states. The array is tightly allocated. Also the spanning tree traversal needs just one root, nfa.start. It's not clear why nfa_all_states used nfa.start and nfa.accept as roots. (nfa_closure): Takes nstates parameter; array bounds checking performed tightly against nstates rather than NFA_SET_SIZE. (nfa_move): Check against NFA_SET_SIZE removed. (nfa_run): Take nstates argument. Allocate arrays tightly. Pass nstates to nfa_closure. (regex_destroy): Pass regex->nstates to nfa_free. (regex_compile): Initialize regex->nstates. (regex_run): Pass regex->nstates to nfa_run. (regex_machine_reset): Pass nstates to nfa_closure. (regex_machine_init): Initialize n.nstates member of regex machine. Allocate arrays tightly. (regex_machine_feed): Pass nstates to nfa_closure.
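The count-then-collect pattern named in this commit (a mapping function applied once with a counting callback and once with a collecting callback) might look something like the sketch below. The struct layout, the callback signature and all identifiers are assumptions for illustration and are not taken from regex.c. Note that the visited marker is bumped before each traversal, which is also the point of the "Fix memory leak in regexes" commit further down.

```c
#include <stdlib.h>

/* Hypothetical NFA state with a visited marker and up to two successors. */
struct state {
  unsigned visited;
  struct state *out[2];
};

typedef void (*state_fn)(struct state *s, void *ctx);

/* Call fn once on every state reachable from s, using mark to avoid revisits. */
static void map_states(struct state *s, unsigned mark, state_fn fn, void *ctx)
{
  if (s == NULL || s->visited == mark)
    return;
  s->visited = mark;
  fn(s, ctx);
  map_states(s->out[0], mark, fn, ctx);
  map_states(s->out[1], mark, fn, ctx);
}

static void count_one(struct state *s, void *ctx)
{
  (void) s;
  ++*(size_t *) ctx;
}

struct collector {
  struct state **all;
  size_t fill;
};

static void collect_one(struct state *s, void *ctx)
{
  struct collector *c = ctx;
  c->all[c->fill++] = s;
}

/* Return a malloc'ed array holding exactly the reachable states, or NULL.
   *visited is the running marker counter; it is incremented per traversal. */
struct state **collect_states(struct state *start, unsigned *visited, size_t *nstates)
{
  struct collector c;
  size_t n = 0;

  map_states(start, ++*visited, count_one, &n);   /* first pass: count */
  c.all = malloc(n * sizeof *c.all);
  c.fill = 0;
  if (c.all == NULL)
    return NULL;
  map_states(start, ++*visited, collect_one, &c); /* second pass: collect */
  *nstates = n;
  return c.all;
}
```

Only one root is needed because every state of interest is reachable from the start state.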
* Fix memory leak in regexes.Kaz Kylheku2015-09-141-1/+1
| | | | | | * regex.c (nfa_free): The visited marker must be incremented, otherwise nfa_all_states will only collect start and accept.
* Don't use prot1 for temporary gc protection.Kaz Kylheku2015-09-071-3/+1
| | | | | | | | | | | | * lib.c (split_str, split_str_set, list_str, int_str): Use gc_hint rather than prot1/rel1. More efficient, doesn't use space in the prot_stack array. * regex.c (search_regex): Likewise. * stream.c (vformat_str, formatv, run): Likewise. In formatv, rel1 wasn't being called in the uw_unwind block, so this fixes a bug.
* Count East Asian Wide and Full Width chars as two columns.Kaz Kylheku2015-08-101-0/+66
| | | | | | | | | | | | | | | | | * regex.c (create_wide_cs): New static function. (wide_display_char_p): New function. * regex.h (wide_display_char_p): Declared. * stream.c (put_string, put_char): Use wide_display_char_p to determine whether an extra column need be counted. Also bugfix: iswprint evidently cannot be relied on to work over the entire Unicode range, at least not in the C locale. Glibc's version is reporting valid Japanese characters as unprintable on Ubuntu. As a hack we instead check for control characters and invert the result: control chars are unprintable. * tests/009/json.expected: Updated.
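The column counting described here can be illustrated with a small sketch. It is not the code of wide_display_char_p or put_string: the ranges below are a tiny illustrative subset of the East Asian Wide and Fullwidth characters rather than the table built by create_wide_cs, and treating control characters as zero columns is an assumption made for the example.

```c
#include <stddef.h>

/* Illustrative subset only; the real set of wide characters is much larger. */
static int is_wide(unsigned long ch)
{
  return (ch >= 0x3041 && ch <= 0x3096)   /* Hiragana letters */
      || (ch >= 0x4E00 && ch <= 0x9FFF)   /* CJK Unified Ideographs */
      || (ch >= 0xFF01 && ch <= 0xFF60);  /* Fullwidth ASCII variants */
}

/* Stand-in for the "control chars are unprintable" check (C0, DEL, C1). */
static int is_control(unsigned long ch)
{
  return ch < 0x20 || (ch >= 0x7F && ch < 0xA0);
}

/* Count display columns of a zero-terminated array of code points. */
size_t display_columns(const unsigned long *str)
{
  size_t cols = 0;

  for (; *str != 0; str++) {
    if (is_control(*str))
      continue;                     /* assumed: unprintable, no column */
    cols += is_wide(*str) ? 2 : 1;  /* wide characters take two columns */
  }
  return cols;
}
```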
* Pass pretty flag to cobj print operation.Kaz Kylheku2015-08-011-2/+3
| | | | | | | | | | | | | | | | | | | | | * hash.c (hash_print_op): Take third argument, and call cobj_print_impl rather than cobj_print. * lib.c (cobj_print_op): Take third argument. The object class is printed with obj_print_impl. (obj_print_impl): Static function becomes extern. Passes its pretty flag argument to cobj print virtual function. * lib.h (cobj_ops): print takes third argument. (cobj_print_op): Declaration updated. (obj_print_impl): Declared. * regex.c (regex_print): Takes third argument, and ignores it. * stream.c (stream_print_op, stdio_stream_print, cat_stream_print): Take third argument, and ignore it. * stream.h (stream_print_op): Declaration updated.
* Correction to COBJ initialization pattern.Kaz Kylheku2015-07-301-2/+2
| | | | | | | | | | | | | In fact, the previously documented process is not correct and still leaves a corruption problem under generational GC (which has been the default for some time). * HACKING: Document flaw in the initialization pattern previously thought to be correct, and show fix. * hash.c (copy_hash): Fix instance of incorrect pattern. * regex.c (regex_compile): Likewise.
* Bugfix: throwing error when trying to print valid regexps.Kaz Kylheku2015-04-191-1/+1
| | | | | | * regex.c (print_rec): Only diagnose "bad object in regex syntax" for some atom other than nil, which denotes an empty (sub)expression, like what results from #// or #/a|/ and such.
* * regex.c (match_regex_right): Bugfix: zero length matchesKaz Kylheku2015-02-201-1/+1
| | | | | should return zero length, rather than nil. This is achieved by trying the match at one past the last character.
* String-returning wrappers for some regex matching functions.Kaz Kylheku2015-02-201-0/+21
| | | | | | | | | | | * eval.c (eval_init): Register search-regst, match-regst and match-regst-right intrinsics. * regex.c (search_regst, match_regst, match_regst_right): New functions. * regex.h (search_regst, match_regst, match_regst_right): Declared. * txr.1: Documented new variants.
* * regex.c (print_rec): A compound must use parentheses forKaz Kylheku2015-02-151-2/+8
| | | | elements which have a higher precedence than catenation.
* Update copyright notices from 2014 to 2015.Kaz Kylheku2015-02-011-1/+1
| | | | | | | | | | | * arith.c, arith.h, combi.c, combi.h, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, sysif.c, sysif.h, syslog.c, syslog.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Update. * LICENSE, METALICENSE: Likewise.
* Use macro to initialize cobj_ops.Kaz Kylheku2015-01-291-14/+10
| | | | | | | | | | * lib.h (cobj_ops_init): New macro. * hash.c (hash_ops, hash_iter_ops): Initialize with cobj_ops_init. * rand.c (random_state_ops): Likewise. * regex.c (char_set_obj_ops, regex_obj_ops): Likewise.
* * Makefile: Removing trailing spaces.Kaz Kylheku2014-10-241-16/+16
| | | | | | | | | | (GREP_CHECK): New macro. (enforce): Rewritten using GREP_CHECK, with new checks. * arith.c, combi.c, debug.c, eval.c, filter.c, gc.c, hash.c, lib.c, * lib.h, match.c, parser.l, parser.y, rand.c, regex.c, signal.c, * signal.h, stream.c, syslog.c, txr.c, unwind.c, utf8.c: Remove trailing spaces.
* Converting cast expressions to macros that are retargettedKaz Kylheku2014-10-171-67/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to C++ style casts when compiling as C++. * lib.h (strip_qual, convert, coerce): New casting macros. (TAG_MASK, tag, type, wli_noex, auto_str, static_str, litptr, num_fast, chr, lit_noex, nil, nao): Use cast macros. * arith.c (mul, isqrt_fixnum, bit): Use cast macros. * configure (INT_PTR_MAX): Define using cast macro. * debug.c (debug_init): Use cast macro. * eval.c (do_eval, expand_macro, reg_op, reg_mac, eval_init): Use cast macros. * filter.c (filter_init): Use cast macro. * gc.c (more, mark_obj, in_heap, mark, sweep_one, unmark): Use cast macros. * hash.c (hash_double, equal_hash, eql_hash, hash_equal_op, hash_hash_op, hash_print_op, hash_mark, make_hash, make_similar_hash, copy_hash, gethash_c, gethash, gethash_f, gethash_n, remhash, hash_count, get_hash_userdata, set_hash_userdata, hash_iter_destroy, hash_iter_mark, hash_begin, hash_uni, hash_diff, hash_isec): Use cast macros. * lib.c (code2type, chk_malloc, chk_malloc_gc_more, chk_calloc, chk_realloc, chk_strdup, num, c_num, string, mkstring, mkustring, upcase_str, downcase_str, string_extend, sub_str, cat_str, trim_str, c_chr, vector, vec_set_length, copy_vec, sub_vec, cat_vec, cobj_print_op, obj_init): Likewise. * match.c (do_match_line, hv_trampoline, match_files, dir_tables_init): Likewise. * parser.l (grammar): Likewise. * parser.y (parse): Likewise. * rand.c (make_state, make_random_state, random_fixnum, random): Likewise. * regex.c (CHAR_SET_L2_LO, CHAR_SET_L2_HI, CHAR_SET_L1_LO, CHAR_SET_L1_HI, CHAR_SET_L0_LO, CHAR_SET_L0_HI, L0_full, L0_fill_range, L1_full, L1_fill_range, L1_contains, L1_free, L2_full, L2_fill_range, L2_contains, L2_free, L3_fill_range, L3_contains, L3_free, char_set_create, char_set_cobj_destroy, nfa_state_accept, nfa_state_empty, nfa_state_single, nfa_state_wild, nfa_state_set,
* C++ upkeep.Kaz Kylheku2014-10-141-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TXR's support for compiling as C++ pays off: C++ compiler finds serious bugs introduced in August 2 ("Big switch to reentrant lexing and parsing"). The yyerror function was being misused; some of the calls reversed the scanner and parser arguments. Since one of the two parameters is void *, this reversal wasn't caught. * parser.l (yyerror): Fix first two arguments being reversed. (num_esc): Change previously correct call to yyerror to follow reversed arguments, so that it stays correct. * parser.y (%parse-param): Change order of these directives so that the scnr parameter is before the parser parameter. This causes the yacc-generated calls to yyerror to have the arguments in the correct order. It also has the effect of changing the signature of yyparse, reversing its parameters. (parse): Update call to yyparse to new argument order. * parser.h (yyparse): Declaration removed. (yyerror): Declaration updated. * regex.c (regex_kind_t): New enum typedef. (struct regex): Use regex_kind_t rather than an enum inside the struct, which has different scoping rules under C++. * txr.c (get_self_path): Fix signed/unsigned warning.
* Version 99.txr-99Kaz Kylheku2014-10-051-0/+1
| | | | | | | | | | | | * RELNOTES: Updated. * configure, txr.1: Bumped version. * share/txr/stdlib/ver.txr: Likewise * Makefile: Improve binary packaging rules. * regex.c: #include <stdarg.h> added.
* Printing of regular expression objects implemented.Kaz Kylheku2014-10-041-1/+151
| | | | | | | | * regex.c (regex_print): New static function. (regex_obj_ops): Registered regex_print. (print_class_char, paren_print_rec, print_rec): New static functions. * dep.mk: Regenerated.
* Keep regex source code in regex objects, in anticipationKaz Kylheku2014-10-041-2/+13
| | | | | | | | | of pretty-printing. Fix object construction bugs. * regex.c (struct regex): New member, source. (regex_mark): Ensure source is visited by garbage collector. (regex_compile): Store regex_sexp in source. Fix violations of section 3.2 of HACKING document.
* Using unified COBJ representation for both regex kinds,Kaz Kylheku2014-10-021-29/+42
| | | | | | | | | | | | | | | | | | | | | | | | rather than the list-based notation for derivative-based regexes, and an encapsulated COBJ for NFA-based regexes. * lib.c (compiled_regex_s): Variable removed. (obj_init): Initialization of compiled_regex_s removed. * lib.h (compiled_regex_s): Declaration removed. * regex.c (struct regex, regex_t): New type. (regex_destroy): Object is now a regex_t, not nfa_t. (regex_mark): New function. (regex_obj_ops): Register regex_mark operation. (reg_nullable, reg_derivative): Remove cases that handle compiled_regex_s. (regex_compile): Output of dv_compile_regex becomes a cobj now. Output of nfa_compile_regex must be embedded in regex_t structure. (regexp): Drop the check for compiled_regex_s. (regex_nfa): Function removed. (regex_run, regex_machine_init): Use cobj_handle to retrieve regex_t * pointer and dispatch appropriate code based on regex->kind.
* GC correctness fixes: make sure we pin down objects for which we borrowKaz Kylheku2014-08-251-1/+8
| | | | | | | | | | | | | | | low level C pointers, while we execute code that can cons memory. * lib.c (list_str): Protect the str argument. (int_str): Likewise. * regex.c (search_regex): protect the haystack string, while using the h pointer to its data, since regex_run can use the derivative-based engine which conses. * stream.c (vformat_str): Protect str argument, since put_char might conceivably cons. (vformat): Protect fmtstr.
* * Makefile, arith.c, arith.h, combi.c, combi.h, configure, debug.c,Kaz Kylheku2014-07-231-16/+16
| | | | | | | | debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, syslog.c, syslog.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Synchronize license header with LICENSE.
* * eval.c (eval_init): register range_regex and tok_whereKaz Kylheku2014-06-261-0/+13
| | | | | | | | | | | | | | as intrinsics. * lib.c (tok_where): New function. * lib.h (tok_where): Declared. * regex.c (range_regex): New function. * regex.h (range_regex): Declared. * txr.1: Documented tok-where and range-regex.
* * eval.c, gc.c, rand.c, regex.c, signal.c: Remove inclusion of unneededKaz Kylheku2014-04-131-1/+0
| | | | headers.
* * parser.l (regex_parse, lisp_parse): Fix neglected handling ofKaz Kylheku2014-03-141-1/+1
| | | | | | | | optional arguments. This problem can cause the symbol : to be planted as the std_error stream, resulting in an error loop that blows the stack. * regex.c (regex_compile): Likewise.
* Issue: match_regex and search_regex were continuing to feed charactersKaz Kylheku2014-03-091-20/+44
| | | | | | | | | | | | | | | | | | | | to the regex machine even when there is no transition available. This was due to the broken return value protocol of regex_machine_feed. For instance for the regex / +/ (one or more spaces), after matching some spaces, it would report REGM_INCOMPLETE for additional non-space characters, never reporting REGM_FAIL. * regex.c (regm_result_t): Block comment added, documenting protocol. (regex_machine_feed): Return REGM_FAIL if there are no transitions for the given character, even if a partial match has been recorded. This is a signal to stop feeding more characters. At that point, the function can be called with a null character to distinguish the three cases: fail, partial or full match. (search_regex): Now when the search loop gets a REGM_FAIL, it can no longer assume that nothing was matched and the search must restart at the next position. Upon the REGM_FAIL signal, it is necessary to seal the search by feeding in the 0 character. Only if that returns REGM_FAIL is it a no match situation. Otherwise it is actually a match!
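The feed-then-seal protocol described above can be sketched with a toy machine hard-wired to the / +/ example. This is an illustration under assumptions, not the regex_machine_feed implementation: REGM_FAIL and REGM_INCOMPLETE are named in the commit message, while REGM_MATCH, the machine struct and the function names are hypothetical.

```c
#include <stddef.h>

/* REGM_MATCH is an assumed name for the "full match" result. */
typedef enum { REGM_INCOMPLETE, REGM_FAIL, REGM_MATCH } regm_result_t;

/* Toy machine for / +/ (one or more spaces). */
struct space_machine {
  size_t matched; /* length of the match recorded so far */
  int dead;       /* set once no transition is available */
};

static void machine_reset(struct space_machine *m)
{
  m->matched = 0;
  m->dead = 0;
}

static regm_result_t machine_feed(struct space_machine *m, char ch)
{
  if (ch == 0) /* sealing: report what was actually recorded */
    return m->matched > 0 ? REGM_MATCH : REGM_FAIL;
  if (ch == ' ' && !m->dead) {
    m->matched++;
    return REGM_MATCH;
  }
  m->dead = 1;
  return REGM_FAIL; /* no transition: tells the caller to stop feeding */
}

/* Return the start of the first match in s and store its length, or -1. */
long search_spaces(const char *s, size_t *len)
{
  size_t start, i;

  for (start = 0; s[start] != 0; start++) {
    struct space_machine m;
    machine_reset(&m);
    for (i = start; s[i] != 0; i++)
      if (machine_feed(&m, s[i]) == REGM_FAIL)
        break; /* stop feeding, but do not conclude "no match" yet */
    if (machine_feed(&m, 0) != REGM_FAIL) { /* seal with the 0 character */
      *len = m.matched;
      return (long) start;
    }
  }
  return -1;
}
```

The key design point is that REGM_FAIL from a regular character only means "stop feeding"; whether anything matched is decided by the result of the sealing call.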
* Replacing uses of the eq function which are used only as C booleans,Kaz Kylheku2014-02-221-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with just using the == operator. Removing cobj_equal_op since it's indistinguishable from eq. Streamlining missingp and null_or_missing_p. * eval.c (transform_op): eq to ==. (c_var_ops): cobj_equal_op to eq. * filter.c (trie_compress, trie_lookup_feed_char, filter_string_tree, html_hex_continue, html_dec_continue): eq to ==. * hash.c (hash_iter_ops): cobj_equal to eq. * lib.c (countq, getplist, getplist_f, search_str_tree, posq): eq to ==. (cobj_equal_op): Function removed. * lib.h (cobj_equal_op): Declaration removed. (missingp): Becomes a simple macro that yields a C boolean instead of t/nil val, because it's only used that way. (null_or_missing_p): Becomes inline function returning int. * match.c (v_output): eq to ==. * rand.c (random_state_ops): cobj_equal_op to eq. * regex.c (char_set_obj_ops, regex_obj_ops): cobj_equal_op to eq. (reg_derivative): Silly if3 expression replaced by null. (regexp): Redundant if2 expression wrapped around eq removed. * stream.c (null_ops, stdio_ops, tail_ops, pipe_ops, string_in_ops, byte_in_ops, string_out_ops, strlist_out_ops, dir_ops, cat_stream_ops): cobj_equal_op to eq. * syslog.c (syslog_strm_ops): cobj_equal_op to eq.
* The C function nullp is being renamed to null, and the rarelyKaz Kylheku2014-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | used global variable null which holds a symbol becomes null_s. A new macro called nilp is added that more efficiently checks whether an object is nil, producing a C boolean value rather than t or nil. Most of the uses of nullp in the codebase just become the more streamlined nilp. * debug.c (show_bindings): nullp to nilp. * eval.c (lookup_var, lookup_var_l, lookup_fun, lookup_sym_lisp1, do_eval, expand_qquote, expand_quasi, expand_op): nullp to nilp. (op_modplace): nullp to null. (eval_init): Update registration of null and not from C function nullp to null. * filter.c (trie_compress, html_hex_continue): nullp to nilp. (filter_string_tree): null to null_s. * hash.c (hash_next): nullp to nilp. * lib.c (null): Variable renamed to null_s. (code2type): null to null_s. (lazy_flatten_scan, chainv, lazy_str, lazy_str_force_upto, obj_print, obj_pprint): nullp to nilp. (obj_init): null to null_s; nullp to null. * lib.h (null): declaration changed to null_s. (nullp): Inline function renamed to null. (nilp): New macro. * match.c (do_match_line): nullp to nilp. * rand.c (make_random_state): Likewise. * regex.c (compile_regex): Likewise.
* * arith.c (lognot): Conform to new scheme for defaulting optional args.Kaz Kylheku2014-02-051-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * eval.c (apply): Unconditionally use colon_k for missing optional args, for intrinsic functions. (eval_intrinsic, rangev, rangev_star, errno_wrap): Conform to new scheme for defaulting optional args. (reg_fun_mark): Function removed. (eval_init): Switch reduce_left and reduce_right back to reg_fun registration. * hash.c (gethash_n): Conform to new scheme for defaulting optional arguments. * lib.c (sub_list, replace_list, remove_if, keep_if, remove_if_lazy, keep_if_lazy, tree_find, count_if, some_satisfy, all_satisfy, none_satisfy, search_str, match_str, match_str_tree, sub_str, replace_str, cat_str, tok_str, intern, rehome_sym, sub_vec, replace_vec, lazy_str, sort, multi_sort, find, find_if, set_diff, obj_print, obj_pprint): Conform to new scheme for defaulting optional arguments. (func_f0, func_f1, func_f2, func_f3, func_f4, func_n0, func_n1, func_n2, func_n3, func_n4, func_n5, func_n6, func_n7, func_f0v, func_f1v, func_f2v, func_f3v, func_f4v, func_n0v, func_n1v, func_n2v, func_n3v, func_n4v, func_n5v, func_n6v, func_n7v): Remove references to removed mark_missing_args member of struct func. (func_set_mark_missing): Function removed. (generic_funcall): Unconditionally use colon_k for missing optional args, for intrinsic functions. * lib.h (struct func): mark_missing_args member removed. (func_set_mark_missing): Declaration removed. (default_arg, default_bool_arg): New inline functions. * rand.c (random): Left argument is not optional. (rnd): Conform to new scheme for defaulting optional arguments. * regex.c (search_regex, match_regex): Conform to new scheme for defaulting optional arguments. * stream.c (unget_char, unget_byte, put_string, put_char, put_byte, put_line): Conform to new scheme for defaulting optional arguments. * syslog.c (openlog_wrap): Conform to new scheme for defaulting optional arguments. 
* txr.1: Remove the specification that nil is a sentinel value in default arguments, where necessary. Use consistent syntax for specifying variable parts in argument lists. A few errors and omissions addressed.
* * regex.c (match_regex_right): Fix not returning value.Kaz Kylheku2014-01-291-0/+2
* * regex.c (match_regex_right): Fix semantics of second argumentKaz Kylheku2014-01-271-5/+6
| | | | | | | | to something more useful. * regex.h (match_regex_right): Change name of parameter. * txr.1: Documented match-regex-right.
* * regex.c (match_regex_right): New function.Kaz Kylheku2014-01-261-0/+20
| | | | | | * regex.h (match_regex_right): Declared. * eval.c (eval_init): Register match_regex_right as intrinsic.
* Changes to the list collection mechanism to improveKaz Kylheku2014-01-221-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the extension of list operations over vectors and strings. * eval.c (do_eval_args, bindings_helper, op_each, subst_vars, supplement_op_syms, mapcarv, mappendv): Switch from list_collect_* macros to functions. * lib.c (copy_list): Switch from list_collect* macros to functions. Use list_collect_nconc for the final terminator. Doing a copy there with list_collect_append was actually wasteful, and now that list_collect_append calls copy_list in places, it triggered runaway recursion. (make_like): Bugfix: list_vector was used instead of vector_list. (to_seq, list_collect, list_collect_nconc, list_collect_append): New functions. (append2, appendv, nappend2, sub_list, replace_list, ldiff, remq, remql, remqual, remove_if, keep_if, proper_plist_to_alist, improper_plist_to_alist, split_str, split_str_set, tok_str, list_str, chain, andf, orf, list_vector, mapcar, mapcon, mappend, merge, set_diff, env): Switch from list_collect* macros to functions. (replace_str, replace_vec): Allow single item replacement sequence. * lib.h (to_seq): Declared. (list_collect, list_collect_nconc, list_collect_append): Macros removed, replaced by function declarations of the same name. These functions return the new ptail since they cannot assign to it, requiring all uses to be updated to do the assignment of the returned value. (list_collect_decl): Use val rather than obj_t *. * match.c (vars_to_bindings, h_coll, subst_vars, extract_vars, extract_bindings, do_output_line, do_output, v_gather, v_collect): Switch from list_collect* macros to functions. * parser.y (o_elems_transform): Likewise. * regex.c (dv_compile_regex, regsub): Likewise. * txr.c (txr_main): Likewise.
* Bugfix in regex char ranges affecting ranges whose upper endKaz Kylheku2014-01-131-4/+7
| | | | | | | | | | | | | corresponds to the high bit of a bitmap cell: for instance the character \x7f when the cell size is 32 bits. * regex.c (BITCELL_ALL1): Unused macro removed. (BITCELL_BIT): New macro to replace occurrences of a repeated expression. (CHAR_SET_INDEX, CHAR_SET_BIT): Updated to use BITCELL_BIT. (L0_fill_range): Bugfix: the mask1 calculation was producing all-zero in the condition bt1 == BITCELL_BIT; it should produce an all-ones mask.
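The class of bug fixed here can be shown with a minimal sketch of filling a bit range inside one 32-bit cell. It does not reproduce L0_fill_range or the BITCELL_BIT macro; the type, constant and function names are assumptions. The naive mask `(1 << (hi + 1)) - 1` breaks when hi is the top bit of the cell, which is the situation for \x7f with 32-bit cells (0x7f modulo 32 is 31), so that case has to yield an all-ones mask explicitly.

```c
#include <stdint.h>

typedef uint32_t bitcell_t;  /* hypothetical cell type */
#define CELL_BITS 32         /* hypothetical name for the cell width */

/* Mask with bits 0..hi set, where 0 <= hi < CELL_BITS. */
static bitcell_t mask_through(unsigned hi)
{
  if (hi == CELL_BITS - 1)
    return ~(bitcell_t) 0;  /* shifting by the full width must be avoided */
  return ((bitcell_t) 1 << (hi + 1)) - 1;
}

/* Set bits lo..hi (inclusive) in *cell, with lo <= hi < CELL_BITS. */
void fill_range(bitcell_t *cell, unsigned lo, unsigned hi)
{
  bitcell_t from_lo = ~(bitcell_t) 0 << lo; /* bits lo and above */
  *cell |= from_lo & mask_through(hi);
}
```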
* First cut at signal handling support.Kaz Kylheku2013-12-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Makefile (OBJS-y): Include signal.o if have_posix_sigs is "y". * configure (have_posix_sigs): New variable, set by detecting POSIX signal stuff. * dep.mk: Regenerated. * arith.c, debug.c, eval.c, filter.c, hash.c, match.c, parser.y, parser.l, rand.c, regex.c, syslog.c, txr.c, utf8.c: Include new signal.h header, now required by unwind, and the <signal.h> system header. * eval.c (exit_wrap): New function. (eval_init): New functions registered as intrinsics: exit_wrap, set_sig_handler, get_sig_handler, sig_check. * gc.c (release): Unused functions removed. * gc.h (release): Declaration removed. * lib.c (init): Call sig_init. * stream.c (set_putc, se_getc, se_fflush): New static functions. (stdio_put_char_callback, stdio_get_char_callback, stdio_put_byte, stdio_flush, stdio_get_byte): Use new functions to enable signals when blocked on I/O. (tail_strategy): Allow signals across sleep. (pipev_close): Allow signals across waitpid. (se_pclose): New static function. (pipe_close): Use new function to enable signals across pclose. * unwind.c (uw_unwind_to_exit_point): use extended_longjmp instead of longjmp. * unwind.h (struct uw_block, struct uw_catch): jb member changes from jmp_buf to extended_jmp_buf. (uw_block_begin, uw_simple_catch_begin, uw_catch_begin): Use extended_setjmp instead of setjmp. * signal.c: New file. * signal.h: New file.
* Bumping copyrights to 2014 and expressing them as year ranges.Kaz Kylheku2013-12-101-1/+1
| | | | Fixing some errors in copyright comments.
* * eval.c (eval_init): Update registration of regex-compileKaz Kylheku2013-12-061-3/+3
| | | | | | | | | | | | | | | | | | | to reflect that it has two arguments now. * parser.y (grammar): Update calls to regex_compile to pass two arguments. Since we don't expect regex_compile to parse, we specify the error stream as nil. (spec): The "secret syntax" for a regex is simplified not to include the slashes. This provides better diagnostics for unterminated syntax and requires less string processing to generate. Also, the form returned doesn't have the regex symbol consed onto it, which parse_regex just throws away. * regex.c (regex_compile): Now takes a stream argument. * regex.h (regex_compile): Declaration updated. * txr.1: Updated
* * regex.c (regex_compile): Handle string input.Kaz Kylheku2013-12-051-1/+5
| | | | | | | * regex.h (regex_compile): Don't call argument regex_sexp, since it can be a string. * txr.1: Updated.
* * regex.c (regex_space_chars): Variable removed.Kaz Kylheku2012-04-201-22/+16
| | | | | | | | | (char_set_addr_str): New function. (char_set_compile): Use char_set_addr_str to add spaces to set. (init_special_char_sets): Use char_set_addr_str to add spaces to set. Bugfix: word_cs, cword_cs wrongly initialized. (regex_init): Removed reference to regex_space_chars.
* * parser.y (regtoken): New nonterminal symbol.Kaz Kylheku2012-04-201-1/+30
| | | | | | | | | | | | | | | | (regterm): REGTOKEN production factored out to regtoken. (regclass): Reverted prior commit's changes. (regclassterm): Reverted prior commit, removing REGTOKEN production for character classes, and introduced a regtoken production. So now the keyword symbols are part of the character class abstract syntax. (regtoken): New production rule. * regex.c (regex_space_chars): Converted to internal linkage. (char_set_compile): Handle token keywords in character class abstract syntax. * regex.h (regex_space_chars): External declaration removed.
* First cut at implementing \s, \d, \w, \S, \D and \W regex tokens.Kaz Kylheku2012-04-191-3/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * lib.c (init): Call regex_init. * parser.l: return new REGTOKEN kind. * parser.y (REGTOKEN): New token type. (REGTERM): Translate REGTERM to keyword. (regclass): Restructured to handle inherited nodes as lists. (regclassterm): Produce $$ as list. Add handling for REGTOKEN occurring inside character class by expanding it. This might not be the best approach. (yybadtoken): Handle REGTOKEN in switch. * regex.c (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): New bitfield member, stat. (char_set_create): New parameter for indicating static char set. (char_set_destroy): Do not free a static char set. (char_set_compile): Pass zero to new parameter of char_set_create. (spaces): New static array. (space_cs, digit_cs, word_cs, cspace_cs, cdigit_cs, cword_cs): New static pointers to char_set_t. (init_special_char_sets, nfa_compile_given_set): New static function. (nfa_compile_regex, dv_compile_regex): Handle new character set token keywords. (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars): New variables. (regex_init): New function. * regex.h (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars, regex_init): Declared.
* Improve the regex Lisp syntax by allowing strings to specifyKaz Kylheku2012-04-121-4/+12
| | | | | | | | | | | character compounds. I.e. the syntax "foo" is equivalent to the cumbersome canonical form (compound #\f #\o #\o). * regex.c (nfa_compile_regex, dv_compile_regex): Use chrp function instead of typeof. Handle stringp case by forming a compound out of the characters and recursing. Check for some bad objects in the regex that would never come out of our regex parser but could occur in a "hand crafted" syntax tree.