summaryrefslogtreecommitdiffstats
path: root/parser.l
Commit message (Collapse)AuthorAgeFilesLines
* * Makefile (%.ok: %.txr): Use unified diff for showingKaz Kylheku2011-10-131-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | differences between expected and actual test output. * parser.l (yybadtoken): Handle new terminal symbol, SPACE. New rule for producing SPACE token out of an extent of tabs and spaces. * parser.y (SPACE): New terminal symbol. (o_var): New nonterminal. I noticed that the var rule was being used for output elements, and the var rule refers to elem rather than o_elem. A new o_var rule is a simplified duplicate of var. (elem): Handle SPACE token. Transform to regex if it is a single space, otherwise to literal text. (o_elem): Handle SPACE token in output. * tests/001/query-2.txr: This query depends on matching single spaces and so needs to use escapes. * tests/001/query-4.txr, test/001/query-4.expected: New test case, based on query-2.txr. It produces the same output, but is simpler thanks to the new semantics of space. * txr.1: Documented.
* Ported to Cygwin.Kaz Kylheku2011-10-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TODO: there should be some type safety with the new wli macro so that if it is forgotten, there will be a diagnostic. * configure (lit_align): New configuration variable and configuration test. Generates LIT_ALIGN in config.h. Fixed the integer-holds-pointer test for the different output from the nm program on Cygwin. The arrays become common symbols marked C which do not show an offset attribute, only size: one less column. * filter.c (to_html_table, from_html_table): wrap wide string literals with the wli macro. This must be done from now on for all literals and initializes of arrays that are going to be directly converted to type tagged val-s. * lib.h (wli): New macro. (auto_str, static_str, litptr, lit_noex): Handle wide literals on platforms where they are aligned to only two bytes, such that we don't have two bits in the pointer. We can still add our 11 bit type tag, but then when recovering the pointer to the data, we have may have to fix up the pointer. * parser.l: Another portability issue here. Flex generates a scanner which has #include <unistd.h> in the middle, after the source file's own #includes which can introduce macros. On Cygwin, there is some hygiene problem whereby our "noreturn" macro causes the <unistd.h> header to generate bad syntax and fail to compile. Stupid Cygwin and even stupider flex! The workaround is to include <unistd.h> at the top in the flex source. * stream.c (string_out_put_char): This is one more place where the string literal handling hack spreads. * txr.c (version): Wrap string in wli.
* Extending syntax to allow for @VAR and @(...) forms insideKaz Kylheku2011-10-061-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | nested lists. This is in anticipation of future features. * lib.c (expr_s): New symbol variable. (obj_init): expr_s initialized. * lib.h (expr_s): Declared. * match.c (dest_bind): Now takes linenum. Tests for the meta-syntax denoted by the system symbols var_s and expr_s, and throws an error. (eval_form): Similar error checks added. Also, hack: do not add file and line number to an exception which begins with a '(' character; just re-throw it. This suppresses duplicate line number addition when this throw occurs across some nestings. (match_files): Updated calls to dest_bind. * parser.l (yybadtoken): Handle new token kind, METAVAR and METAPAR. (grammar): Refactoring among patterns: TOK broken into SYM and NUM, NTOK introduced, unused NUM_END removed. Rule for @( producing METAPAR in nested state. * parser.y (METAVAR, METAPAR): New tokens. (meta_expr): New nonterminal. (expr): meta_expr and META_VAR productions handled.
* * LICENSE, Makefile, configure, filter.c, filter.h, gc.c, gc.h, hash.c,Kaz Kylheku2011-10-041-1/+1
| | | | | | hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated e-mail address.
* New directive: choose.Kaz Kylheku2011-10-011-0/+7
| | | | | | | | | | | | | | | | | | | | | | * match.c (choose_s, longest_k, shortest_k): New variables. (match_line, match_files): Introduced choose directive. (match_init): Initialize new variables. * match.h (choose_s): Declared. * parser.l (yybadtoken): Handle CHOOSE. (CHOOSE): Clause added for returning this token. * parser.y: Added #include "match.h". (CHOOSE): New token symbol. (choose_clause): New nonterminal symbol. (clause): choose_clause added. (all_clause, some_clause, none_clause, maybe_clause, cases_clause): Abstract syntax tree tweaked. (choose_clause): New syntax. (elem): Abstract syntax trees tweaked for many clauses. New CHOOSE clauses. (out_clause): New error case for choose_clause.
* * parser.l: Implemented backslash continuations in SPECIALKaz Kylheku2011-09-301-13/+27
| | | | | | state, regexes and string literals. * txr.1: Documented.
* * match.c (chars_k): New variable.Kaz Kylheku2011-09-291-2/+2
| | | | | | | | | | | | (match_line): Keyword arguments in coll implemented. (match_init): chars_k variable initialized. * parser.l (COLL): Lexical syntax changed to allow for argument material. * parser.y (elem): Coll syntax rewritten for arguments. * txr.1: Updated.
* * match.c (mingap_k, maxgap_k, gap_k, times_k, lines_k): NewKaz Kylheku2011-09-291-2/+2
| | | | | | | | | | | | | | | | | symbol variables. (match_lines): Keyword arguments in collect implemented. (match_init): New function. * match.h (match_init): Declared. * parser.l (COLLECT): Lexical syntax changed for COLLECT to allow for argument material. * parser.y (%union): obj renamed to val. (exprs_opt): New nonterminal. (collect_clause): Rewritten for arguments. * txr.c (main): Call to match_init introduced.
* Bugfixes: Consistent escaping in various literals. DoubleKaz Kylheku2011-09-261-6/+10
| | | | | | | | | | backslash codes for single backslash. Output clause can be empty. * parser.l (char_esc): Backslash handled. Use internal_error rather than abort. (REGCHAR, LITCHAR): Backslash added to lexical syntax. * parser.y (output_clause): Allow empty output clause.
* * LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c,Kaz Kylheku2011-09-231-1/+1
| | | | | | lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year.
* Bump copyrights to 2010.Kaz Kylheku2010-10-051-1/+1
|
* Resolving parser conflicts.Kaz Kylheku2010-01-191-1/+1
|
* Implemented non-greedy operator.Kaz Kylheku2010-01-151-1/+1
|
* Bugfix: allow unescaped / to be used in regex character classes.Kaz Kylheku2010-01-131-5/+9
|
* Implemented the regular expression ~ and & operators.Kaz Kylheku2010-01-051-4/+4
| | | | | | | | | | | | | | | This turns out to be easy to do in NFA land. The complement of an NFA has exactly the same number and configuration of states and transitions, except that the states have an inverted meaning; and furthermore, failed character transitions are routed to an extra state (which in this impelmentation is permanently allocated and shared by all regexes). The regex & is implemented trivially using DeMorgan's. Also, bugfix: regular expressions like A|B|C are allowed now by the syntax, rather than constituting syntax error. Previously, this would have been entered as (A|B)|C.
* Remove unnecessary cast.Kaz Kylheku2009-12-091-1/+1
|
* * parser.l (YYINPUT): Fix signed/unsigned comparison.Kaz Kylheku2009-12-091-2/+3
|
* * parser.l (YY_NO_UNPUT): Removed superfluous #define. This is notKaz Kylheku2009-12-031-2/+0
| | | | | | | needed because suppressing generation of unput is requested via the %option. In scanners generated by the legacy version of flex, 2.5.4, still widely in use. this redundancy leads to a multiple #define YY_NO_UNPUT and a compiler warning.
* * parser.l: Use flex options to suppress generation of theKaz Kylheku2009-11-281-0/+2
| | | | | unused functons yyunput and yyinput, thus getting rid of some compiler diagnostics.
* Fixed broken yyerrorf. It was still taking char *, and passingKaz Kylheku2009-11-241-1/+1
| | | | | that as an object to vformat, resulting in #<garbage: ...> output.
* Improving portability. It is no longer assumed that pointersKaz Kylheku2009-11-231-3/+7
| | | | | | | | can be converted to a type long and vice versa. The configure script tries to detect the appropriate type to use. Also, some run-time checking is performed in the streams module to detect which conversions specifier strings to use for printing numbers.
* Introducing symbol packages. Internal symbols are now inKaz Kylheku2009-11-211-4/+11
| | | | | | | | | | a system package instead of being hacked with the $ prefix. Keyword symbols are provided. In the matcher, evaluation is tightened up. Keywords, nil and t are not bindeable, and errors are thrown if attempts are made to bind them. Destructuring in dest_bind is strict in the number of items. String streams are exploited to print bindings to objects that are not strings or characters. Numerous bugfixes.
* Changing ``obj_t *'' occurences to a ``val'' typedef. (Ideally,Kaz Kylheku2009-11-201-4/+4
| | | | | we wouldn't have to declare object variables at all, so why use an obtuse syntax to do so?)
* Fix total breakage of yyerror and yyerrorf.Kaz Kylheku2009-11-181-2/+3
|
* More removal of C99 wide character I/O, and tightening upKaz Kylheku2009-11-171-3/+3
| | | | of standard conformance.
* Big round of changes to switch the code base to use the streamKaz Kylheku2009-11-161-45/+47
| | | | | | | | | | | | | | | | | abstraction instead of directly using C standard I/O, to eliminate most uses of C formatted I/O, and fix numerous bugs, such variadic argument lists which lack a terminating ``nao'' sentinel. Bug 28033 is addressed by this patch, since streams no longer provide printf-compatible formatting. The native formatter is extended with some additional capabilities to take over. The work on literal objects is expanded and they are now used throughout the code base. Fixed bad realloc in string output stream: reallocating by number of wide chars rather than bytes.
* Previous commit broke UTF-8 lexing, by changing the get_charKaz Kylheku2009-11-131-2/+2
| | | | | | | | semantics on the input stream to wide character input. Also, reading a query the command line (-c) must read bytes from a UTF-8 encoding of the string. We introduce a new get_byte function which can extract bytes from streams which provide it.
* Continuing wchar_t conversion. Making sure all stdio callsKaz Kylheku2009-11-121-41/+41
| | | | | use wide character functions so that there is no illicit mixing. (But the goal is to replace this usage with txr streams).
* Documenting extended characters in man page.Kaz Kylheku2009-11-121-0/+15
| | | | Cleaned up some more issues related to extended characters.
* Big conversion to wide characters and UTF-8 support.Kaz Kylheku2009-11-111-34/+50
| | | | | | | | | This is incomplete. There are too many dependencies on wide character support from the C stream I/O library, and implicit use of some encoding which may not be UTF-8. The regex code does not handle wide characters properly. Character type is still int in some places, rather than wchar_t. Test suite passes though.
* Kill tabs with spaces (how did they sneak in?).Kaz Kylheku2009-11-041-12/+12
| | | | Fix possible use of uninitialized ch.
* txr-015 2009-10-15txr-015Kaz Kylheku2017-07-311-0/+523