path: root/match.c
Commit message  Author  Age  Files  Lines
* Regression bug fix: longest match variables broken byKaz Kylheku2011-10-011-1/+1
| | | | | | | | 2011-09-28 commit which introduced the double var match. * match.c (match_line): Handle case where modifier is t. * parser.y (var_op): Produce modifier as (t) rather than t.
* New directive: choose.Kaz Kylheku2011-10-011-12/+99
| | | | | | | | | | | | | | | | | | | | | | * match.c (choose_s, longest_k, shortest_k): New variables. (match_line, match_files): Introduced choose directive. (match_init): Initialize new variables. * match.h (choose_s): Declared. * parser.l (yybadtoken): Handle CHOOSE. (CHOOSE): Clause added for returning this token. * parser.y: Added #include "match.h". (CHOOSE): New token symbol. (choose_clause): New nonterminal symbol. (clause): choose_clause added. (all_clause, some_clause, none_clause, maybe_clause, cases_clause): Abstract syntax tree tweaked. (choose_clause): New syntax. (elem): Abstract syntax trees tweaked for many clauses. New CHOOSE clauses. (out_clause): New error case for choose_clause.
* * match.c (match_line): Implemented horizontal all, some,Kaz Kylheku2011-09-291-3/+49
| | | | | | | | | | none, maybe and cases directives. (match_files): Recognize horizontal version of these directives by the presence of the extra symbol t and do not process. Also, bugfix in the all directive: not resetting the all_match flag when short circuiting out. * parser.y (clause_parts_h, additional_parts_h): New nonterminals. (elem): New clauses added.
* * match.c (chars_k): New variable.Kaz Kylheku2011-09-291-33/+65
| | | | | | | | | | | | (match_line): Keyword arguments in coll implemented. (match_init): chars_k variable initialized. * parser.l (COLL): Lexical syntax changed to allow for argument material. * parser.y (elem): Coll syntax rewritten for arguments. * txr.1: Updated.
* * match.c (mingap_k, maxgap_k, gap_k, times_k, lines_k): NewKaz Kylheku2011-09-291-41/+91
| | | | | | | | | | | | | | | | | symbol variables. (match_lines): Keyword arguments in collect implemented. (match_init): New function. * match.h (match_init): Declared. * parser.l (COLLECT): Lexical syntax changed for COLLECT to allow for argument material. * parser.y (%union): obj renamed to val. (exprs_opt): New nonterminal. (collect_clause): Rewritten for arguments. * txr.c (main): Call to match_init introduced.
* * match.c (match_line): Bugfix in double var. Do notKaz Kylheku2011-09-281-2/+4
| | | | prepend the next_pat to the specline if it is nil.
* * match.c (match_line): Logic restructured to allow forKaz Kylheku2011-09-281-35/+76
| | | | | | | | | | | | | | | | | | regex variables which also have nested variables. Previously this code was assuming that the cases were mutually exclusive, and the parser happened to work that way. Also, added support for a "double var" match which occurs when an unbound variable is followed by a regex variable. This case should be allowed because it makes sense. It's similar to a variable followed by a regex, except that the regex is also a variable binding. * parser.y (o_elems_transform): New function. (o_elems_opt, o_elems_opt2, quasilit): Transform o_elems with new function. This is needed because subst_vars doesn't deal with the nested var syntax for consecutive variables. (var): New syntax case '{' IDENT exprs '}' elem. This allows consecutive variables to be nested in all cases.
* * match.c (match_files): One more fix to this, argh.Kaz Kylheku2011-09-271-8/+8
| | | | | | | | | | | | | The test for !data should be done after matching, before incrementing to the next line. Then it is a true bottom of the loop test. This commit allows @(skip) @first_line @(skip nil 3) @(eof) to correctly match the first line of the input, not the fourth one from the bottom, since the second skip has an unbounded range.
* * match.c (match_files): Another bugfix to skip.Kaz Kylheku2011-09-271-0/+7
| | | | | | | If a hard skip tries to go beyond EOF, then the query must fail. However, a skip to exactly EOF is fine. I.e. data can hit nil at the same time as the right number of skip iterations is performed.
* * match.c (match_files): Bugfix in skip directive.Kaz Kylheku2011-09-271-1/+4
| | | | | | | | | | | | We should try the match at least once even if there is no data after a hard skip, so that the query has an opportunity to do an explicit match for no data, as with @(endp). This commit makes possible queries like: @fourth_line_from_bottom @(skip 1 3) @(eof) This query depends on @(skip 1 3) not failing when it runs out of data, because @(eof) checks for this.
* * lib.c (eof_s): New symbol variable.Kaz Kylheku2011-09-271-0/+8
| | | | | | | | | | | (obj_init): New variable initialized. * lib.h (eof_s): Declared. * match.c (match_files): New @(eof) directive explicitly matches end of data. * txr.1: Updated.
* Support &#xNNNN; hex escapes in html. Bugfix in field formatting.Kaz Kylheku2011-09-261-1/+2
| | | | | | | | | | | | | | | | | | | chr function inlined. * filter.c (trie_value_at, trie_lookup_feed_char): Handle function case. (build_filter): New parameter, compress_p. (html_hex_continue, html_hex_handler): New functions. (filter_init): Add a function-based node to the from_html trie. * lib.c (chr): Function removed. (functionp): New function. * lib.h (chr): Declaration replaced with inline function. (functionp): Declared. * match.c (format_field): Bugfix: failed to apply filter that came in as an argument.
* New feature: @(deffilter)Kaz Kylheku2011-09-261-7/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | Bugfix in @(throw) when non-symbol is thrown: exception message referred to the symbol throw rather than the erroneous object. * filter.c (build_filter_from_list, register_filter): New functions. * filter.h (register_filter): New function declared. * lib.c (deffilter_s): New variable defined. (chain): Function changed from single list argument to variable argument list to reduce the complexity of use. (do_and, and): New functions. (obj_init): deffilter_s initialization added. * lib.h (deffilter_s, and): New declarations. (chain): Declaration updated to new function signature. (eq): Changed from macro to inline function. * match.c (do_output_line): Simplified expression involving chain. (do_output): Likewise. (match_files): Bugfix in error handling of throw. Implementation of deffilter. * txr.1: Documented deffilter.
* Filtering feature for variable substitution in output.Kaz Kylheku2011-09-251-33/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * filter.c, filter.h: New files. * Makefile (OBJS): filter.o added. * gc.c (mark_obj): Mark new alloc field of string objects. * hash.c (struct hash): New member, userdata. (hash_mark): Mark new userdata member of hash. (make_hash): Initialize userdata. (get_hash_userdata, set_hash_userdata, hashp): New functions. * hash.h (get_hash_userdata, set_hash_userdata, hashp): New functions declared. * lib.c (getplist, string_extend, cobjp): New functions. (string_own, string, string_utf8): Initialize new alloc field to nil. (mkstring, mkustring): Initialize new alloc field to actual size. (length_str): When length is computed and cached, also compute and cache alloc. (init): Call filter_init. * lib.h (string string): New member, alloc. (num_fast): Macro converted to inline function. (getplist, string_extend, cobjp): New functions declared. * match.c (match_line): Follows change of modifier s-exp syntax. (format_field): New parameter, filter. New modifier syntax parsed. Filter retrieved, and applied. (subst_vars): New parameter, filter. Filter is either applied in this function or passed to format_field, as needed. (eval_form): Pass nil to new parameter of subst_vars. (do_output_line): New parameter, filter. Passed down to subst_vars. (do_output): New parameter, filter. Passed down to do_output_line. (match_files): Pass nil filter to subst_vars in cat directive. Output directive refactored to parse keywords, extract the filter and pass down to do_output. * parser.y (regex): Generate (sys:regex regex syntax ...) instead of (regex syntax ...). (elem, expr): Updated w.r.t. regex syntax change. (var): Cases '{' IDENT regex '}' and '{' IDENT NUMBER '}' are removed. New syntax '{' IDENT exprs '}' to handle these more generally and allow for keywords. * txr.1: Updated.
* * LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c,Kaz Kylheku2011-09-231-1/+1
| | | | | | lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year.
* * match.c, parser.y: Support for old output syntax removed.Kaz Kylheku2011-09-231-18/+14
| | | | | | Leading :nothrow with trailing material is an error now. * txr.1: Updated. Made note of errors in pipes being asynchronous.
* * match.c (match_files): Some cleanup in preparation of newKaz Kylheku2011-09-231-28/+27
| | | | features. Support for obsolescent @(next) syntax is gone.
* Semantics tweak: short circuiting behavior for @(all) and @(none).Kaz Kylheku2011-09-231-1/+3
| | | | | | * match.c (match_files): Added a couple of break statements. * txr.1: Updated.
* Useful second argument in skip directive for skippingKaz Kylheku2011-09-221-2/+16
| | | | | | | | a minimum number of lines. * match.c (match_files): New behavior in skip_s case. * txr.1: Documented.
* Bump copyrights to 2010.Kaz Kylheku2010-10-051-1/+1
|
* * match.c (match_files): Bugfix. A (sub)query that runs out of dataKaz Kylheku2010-10-041-0/+3
| | | | | lines to match must fail. Extra data lines relative to the spec are tolerated; extra spec lines unmet by data aren't.
* * match.c (match_lines): Bugfix in freeform directive.Kaz Kylheku2010-02-271-1/+3
| | | | | | | | | | If the virtual line is partially matched, the remainder of the line is folded back into list form. In this case, the data line number must be incremented. Otherwise the calling context may conclude that no progress was made, and skip a line of input. I.e. the unmatched part of the input is a new line, even if there had originally been no line break at that point.
* Version 032.Kaz Kylheku2010-01-251-0/+1
|
* Fix screwup in previous change: value treated as a consKaz Kylheku2010-01-251-2/+1
| | | | in a code path where it sometimes isn't.
* * match.c (match_files): Workaround for GC issueKaz Kylheku2010-01-241-0/+2
| | | | | | | discovered on Red Hat EL 4 with gcc 3.4.3. In the collect loop, set car(success) to nil. Somehow the generated code hangs on to the last matching position for a regex, preventing GC.
* Fix for unbounded memory growth problem reproduced with GCC 4.4.1Kaz Kylheku2010-01-211-0/+2
| | | | | | on 32 bit x86 Fedora. This happens because the lazy list variable ``data'' in the match_files function is optimized to a register, but a stale value of that variable persists in the backing storage.
* * match.c (match_files): Reduce scope, and bogus use of, datalineKaz Kylheku2010-01-211-6/+2
| | | | variable.
* Version 028.Kaz Kylheku2010-01-161-1/+1
|
* Implement derivative-based regular expressions.Kaz Kylheku2010-01-131-2/+2
|
* Code cleanup. All private functions static. Private stuffKaz Kylheku2009-11-281-58/+29
| | | | in regex module not exposed in header. Etc.
* Switching to keyword symbols for :args and :nothrow.Kaz Kylheku2009-11-241-11/+11
|
* Changes to make the code portable to C++ compilers, whichKaz Kylheku2009-11-241-1/+3
| | | | can be taken advantage of for better diagnostics.
* Renaming global variables that denote symbols, such that theyKaz Kylheku2009-11-241-55/+54
| | | | have a _s suffix.
* Improving portability. It is no longer assumed that pointersKaz Kylheku2009-11-231-7/+8
| | | | | | | | can be converted to a type long and vice versa. The configure script tries to detect the appropriate type to use. Also, some run-time checking is performed in the streams module to detect which conversions specifier strings to use for printing numbers.
* Introducing symbol packages. Internal symbols are now inKaz Kylheku2009-11-211-40/+50
| | | | | | | | | | a system package instead of being hacked with the $ prefix. Keyword symbols are provided. In the matcher, evaluation is tightened up. Keywords, nil and t are not bindeable, and errors are thrown if attempts are made to bind them. Destructuring in dest_bind is strict in the number of items. String streams are exploited to print bindings to objects that are not strings or characters. Numerous bugfixes.
* * unwind.c (uw_throw): If streams are not initialized,Kaz Kylheku2009-11-201-0/+1
| | | | | | we have an unhandled exception too early in initialization. Use C stream to print an error message and abort. Using the nil stream variable will just cause a recursion bomb.
* * match.c (dest_bind): Extended to handle more general formsKaz Kylheku2009-11-201-15/+12
| | | | | | | by using eval_form rather than direct symbol binding lookups. False positive return fixed. (match_line): Fixed merge to use eval_form rather than direct symbol binding.
* Changing ``obj_t *'' occurrences to a ``val'' typedef. (Ideally,Kaz Kylheku2009-11-201-228/+228
| | | | | we wouldn't have to declare object variables at all, so why use an obtuse syntax to do so?)
* * match.c (match_line, match_files): Fix broken behavior of collectKaz Kylheku2009-11-181-6/+2
| | | | that doesn't match anything.
* More removal of C99 wide character I/O, and tightening upKaz Kylheku2009-11-171-18/+24
| | | | of standard conformance.
* Warning fixes.Kaz Kylheku2009-11-171-1/+1
|
* Fixes for compliance to C89.Kaz Kylheku2009-11-171-1/+2
|
* Big round of changes to switch the code base to use the streamKaz Kylheku2009-11-161-122/+139
| | | | | | | | | | | | | | | | | abstraction instead of directly using C standard I/O, to eliminate most uses of C formatted I/O, and fix numerous bugs, such as variadic argument lists which lack a terminating ``nao'' sentinel. Bug 28033 is addressed by this patch, since streams no longer provide printf-compatible formatting. The native formatter is extended with some additional capabilities to take over. The work on literal objects is expanded and they are now used throughout the code base. Fixed bad realloc in string output stream: reallocating by number of wide chars rather than bytes.
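The realloc fix mentioned at the end of the entry above concerns a common class of bug: passing a character count to realloc where a byte count is required. The following is only an illustrative sketch of that class of bug and its fix, not the actual txr code; the helper name grow_wide_buffer and its signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from txr): resize a wide-character buffer so
 * that it can hold new_chars elements.
 *
 * The buggy form would be realloc(buf, new_chars), which requests
 * new_chars BYTES and so under-allocates whenever sizeof(wchar_t) > 1.
 *
 * Returns the resized buffer, or NULL on failure. On failure the
 * original buffer is left untouched and still owned by the caller.
 */
static wchar_t *grow_wide_buffer(wchar_t *buf, size_t new_chars)
{
    /* A zero-sized realloc has unreliable semantics, so rule it out. */
    assert(new_chars > 0);

    /* Refuse requests whose byte count would overflow size_t. */
    if (new_chars > SIZE_MAX / sizeof(wchar_t))
        return NULL;

    /* Correct form: scale the element count by the element size. */
    return realloc(buf, new_chars * sizeof(wchar_t));
}
```

A caller would assign the result to a temporary first, so that the old buffer is not leaked if the call returns NULL.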
* Continuing wchar_t conversion. Making sure all stdio callsKaz Kylheku2009-11-121-9/+9
| | | | | use wide character functions so that there is no illicit mixing. (But the goal is to replace this usage with txr streams).
* Big conversion to wide characters and UTF-8 support.Kaz Kylheku2009-11-111-114/+119
| | | | | | | | | This is incomplete. There are too many dependencies on wide character support from the C stream I/O library, and implicit use of some encoding which may not be UTF-8. The regex code does not handle wide characters properly. Character type is still int in some places, rather than wchar_t. Test suite passes though.
* Changing representation of objects to allow the NUM type to beKaz Kylheku2009-11-091-3/+3
| | | | | | | | unboxed. If the lowest bit of the obj_t * pointer is 1, then the remaining bits are a number. A lot of assumptions are made: - the long type can be converted to and from a pointer - two's complement. - behavior of << and >> operators when the sign bit is involved.
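The unboxed representation described in the entry above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the txr implementation: the names box_num, unbox_num and is_boxed_num are hypothetical, and intptr_t/uintptr_t are used here where the commit itself relied on the long type. The same platform assumptions listed in the commit apply (integer/pointer round-tripping, two's complement, arithmetic right shift of negative values).

```c
#include <assert.h>
#include <stdint.h>

typedef struct obj obj_t;   /* opaque heap object type, never dereferenced here */

#define NUM_TAG ((uintptr_t) 1)

/* A value is an unboxed number when the lowest bit of the pointer is 1.
   Real heap objects are assumed to be at least 2-byte aligned, so their
   lowest bit is always 0. */
static int is_boxed_num(obj_t *v)
{
    return ((uintptr_t) v & NUM_TAG) != 0;
}

/* Encode: shift the number left by one bit and set the tag bit.
   The shift is done on an unsigned type so it is well defined for
   negative inputs; one bit of range is lost. */
static obj_t *box_num(long n)
{
    return (obj_t *) ((((uintptr_t) n) << 1) | NUM_TAG);
}

/* Decode: reinterpret as a signed integer and shift right, discarding
   the tag bit. This assumes >> on a negative value is an arithmetic
   shift, which is implementation-defined in C. */
static long unbox_num(obj_t *v)
{
    assert(is_boxed_num(v));
    return (long) (((intptr_t) v) >> 1);
}
```

With this scheme, small integers need no heap allocation at all: the pointer-sized value itself carries the number.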
* Throw exception on stream error during close, or I/O operations. ThisKaz Kylheku2009-11-061-2/+2
| | | | | | is needed for pipes that terminate abnormally or return failed termination. Pipe and stdio streams have an extra description field so they are printed in a readable way.
* Version 019txr-019Kaz Kylheku2009-11-031-4/+4
| | | | | | Regexps can be bound to variables. New freeform directive.
* Change the freeform line catenation semantics to terminationKaz Kylheku2009-11-031-3/+3
| | | | rather than separation.
* Got regex working over lazy strings from freeform.Kaz Kylheku2009-11-021-7/+5
| | | | Bugfixes.