path: root/regex.h
Commit message | Author | Age | Files | Lines
* * parser.y (regtoken): New nonterminal symbol. | Kaz Kylheku | 2012-04-20 | 1 | -1/+0
  (regterm): REGTOKEN production factored out to regtoken.
  (regclass): Reverted prior commit's changes.
  (regclassterm): Reverted prior commit, removing REGTOKEN production for character classes, and introduced a regtoken production. So now the keyword symbols are part of the character class abstract syntax.
  (regtoken): New production rule.
  * regex.c (regex_space_chars): Converted to internal linkage.
  (char_set_compile): Handle token keywords in character class abstract syntax.
  * regex.h (regex_space_chars): External declaration removed.
* First cut at implementing \s, \d, \w, \S, \D and \W regex tokens. | Kaz Kylheku | 2012-04-19 | 1 | -0/+6
  * lib.c (init): Call regex_init.
  * parser.l: return new REGTOKEN kind.
  * parser.y (REGTOKEN): New token type.
  (REGTERM): Translate REGTERM to keyword.
  (regclass): Restructured to handle inherited nodes as lists.
  (regclassterm): Produce $$ as list. Add handling for REGTOKEN occurring inside character class by expanding it. This might not be the best approach.
  (yybadtoken): Handle REGTOKEN in switch.
  * regex.c (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): New bitfield member, stat.
  (char_set_create): New parameter for indicating static char set.
  (char_set_destroy): Do not free a static char set.
  (char_set_compile): Pass zero to new parameter of char_set_create.
  (spaces): New static array.
  (space_cs, digit_cs, word_cs, cspace_cs, cdigit_cs, cword_cs): New static pointers to char_set_t.
  (init_special_char_sets, nfa_compile_given_set): New static functions.
  (nfa_compile_regex, dv_compile_regex): Handle new character set token keywords.
  (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars): New variables.
  (regex_init): New function.
  * regex.h (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars, regex_init): Declared.
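The commit above marks the shared \s, \d and \w sets with a "stat" bit so that char_set_destroy leaves them alone. A minimal sketch of that idea in C follows. The names cset_t, cset_create and cset_destroy are hypothetical stand-ins, not the actual regex.c definitions, and the 256-bit bitmap is an assumption made only to keep the example small.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical character set: one flag bit plus a 256-bit bitmap. */
typedef struct {
    unsigned stat : 1;          /* 1 = shared/static set, must not be freed */
    unsigned char bits[32];     /* membership bitmap for characters 0..255 */
} cset_t;

/* Allocate an empty set; is_static marks it as a long-lived shared set. */
static cset_t *cset_create(int is_static)
{
    cset_t *cs = malloc(sizeof *cs);
    if (cs != NULL) {
        memset(cs, 0, sizeof *cs);
        cs->stat = is_static ? 1 : 0;
    }
    return cs;
}

/* Free the set unless it is flagged static.
   Returns 1 if the memory was released, 0 if the set was left alone. */
static int cset_destroy(cset_t *cs)
{
    if (cs == NULL || cs->stat)
        return 0;
    free(cs);
    return 1;
}
```

With a flag like this, a compiled regex can point at the shared sets and still run its normal teardown path, since destroying a flagged set is a no-op.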
* Bug #35718. Workaround good enough to get some code working. | Kaz Kylheku | 2012-03-04 | 1 | -1/+1
  * eval.c (cons_find): New function.
  (expand_op): Use cons_find rather than tree_find to look for rest_gensym.
  * regex.c (regsub): Rearranged arguments so that the string is last. This is better for partial evaluation via the op operator.
  * regex.h (regsub): Updated declaration.
* * eval.c (eval_init): New intrinsic function, regsub. | Kaz Kylheku | 2012-03-04 | 1 | -0/+1
  * regex.c (regsub): New function.
  * regex.h (regsub): Declared.
  * txr.1: Doc stub added.
* * arith.c: Updated copyright year. | Kaz Kylheku | 2012-02-25 | 1 | -1/+1
  * arith.h: Likewise.
  * debug.c: Added copyright header.
  * debug.h: Updated copyright year.
  * eval.c: Likewise.
  * eval.h: Likewise.
  * filter.c: Likewise.
  * filter.h: Likewise.
  * gc.c: Likewise.
  * gc.h: Likewise.
  * hash.c: Likewise.
  * hash.h: Likewise.
  * lib.c: Likewise.
  * lib.h: Likewise.
  * match.c: Likewise.
  * match.h: Likewise.
  * parser.h: Likewise.
  * regex.c: Likewise.
  * regex.h: Likewise.
  * stream.c: Likewise.
  * stream.h: Likewise.
  * txr.c: Likewise, and e-mail address.
  * txr.h: Updated copyright year.
  * unwind.c: Likewise.
  * unwind.h: Likewise.
* We don't include headers in headers in this project. | Kaz Kylheku | 2011-10-30 | 1 | -2/+0
  * parser.h: Do not include <stdio.h>
  * regex.c: Include <limits.h>
  * regex.h: Do not include <limits.h>
* * LICENSE, Makefile, configure, filter.c, filter.h, gc.c, gc.h, hash.c, | Kaz Kylheku | 2011-10-04 | 1 | -1/+1
  hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated e-mail address.
* * LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c, | Kaz Kylheku | 2011-09-23 | 1 | -1/+1
  lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year.
* Bump copyrights to 2010. | Kaz Kylheku | 2010-10-05 | 1 | -1/+1
* Implement derivative-based regular expressions. | Kaz Kylheku | 2010-01-13 | 1 | -22/+0
* Code cleanup. All private functions static. Private stuff | Kaz Kylheku | 2009-11-28 | 1 | -115/+1
  in regex module not exposed in header. Etc.
* Changes to make the code portable to C++ compilers, which | Kaz Kylheku | 2009-11-24 | 1 | -5/+5
  can be taken advantage of for better diagnostics.
* Improving portability. It is no longer assumed that pointers | Kaz Kylheku | 2009-11-23 | 1 | -4/+4
  can be converted to a type long and vice versa. The configure script tries to detect the appropriate type to use. Also, some run-time checking is performed in the streams module to detect which conversion specifier strings to use for printing numbers.
* Changing ``obj_t *'' occurrences to a ``val'' typedef. (Ideally, | Kaz Kylheku | 2009-11-20 | 1 | -7/+6
  we wouldn't have to declare object variables at all, so why use an obtuse syntax to do so?)
* Fixes for compliance to C89. | Kaz Kylheku | 2009-11-17 | 1 | -10/+10
* Regular expression module updated to do unicode character sets. | Kaz Kylheku | 2009-11-12 | 1 | -12/+57
  Most of the changes are in the area of representing sets. Also, a bug was found in the compilation of regex character sets: ranges straddling two adjacent blocks of 32 characters were not being added to the character set. However, ranges falling within a single 32 block, or spanning three or more such blocks, worked properly. This bug is not tickled by common ranges such as A-Z, or 0-9, which land within a 32 block.
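The range bug described above is easiest to see in a bitmap built from 32-bit words. The sketch below is an illustration only, not the regex.c code: bitset_t, bitset_add_range and the 256-character limit are assumptions. It treats the two-adjacent-blocks case as a first word plus a last word with zero full words in between, which is the case the commit says was being dropped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SET_BITS  256   /* hypothetical set size */
#define WORD_BITS 32
#define ALL_ONES  ((uint32_t) 0xFFFFFFFFu)

typedef struct { uint32_t w[SET_BITS / WORD_BITS]; } bitset_t;

static void bitset_clear(bitset_t *s) { memset(s->w, 0, sizeof s->w); }

/* Add the inclusive character range lo..hi to the set. */
static void bitset_add_range(bitset_t *s, unsigned lo, unsigned hi)
{
    unsigned first, last, i;
    uint32_t lo_mask, hi_mask;

    assert(lo <= hi && hi < SET_BITS);
    first = lo / WORD_BITS;
    last = hi / WORD_BITS;
    lo_mask = (uint32_t) (ALL_ONES << (lo % WORD_BITS));                 /* bits lo%32..31 */
    hi_mask = (uint32_t) (ALL_ONES >> (WORD_BITS - 1 - hi % WORD_BITS)); /* bits 0..hi%32 */

    if (first == last) {            /* range lies within one 32-bit block */
        s->w[first] |= lo_mask & hi_mask;
        return;
    }
    s->w[first] |= lo_mask;         /* tail of the first block */
    for (i = first + 1; i < last; i++)
        s->w[i] = ALL_ONES;         /* full middle blocks; none when adjacent */
    s->w[last] |= hi_mask;          /* head of the last block */
}

static int bitset_has(const bitset_t *s, unsigned ch)
{
    return ch < SET_BITS && ((s->w[ch / WORD_BITS] >> (ch % WORD_BITS)) & 1u);
}
```

A range such as 30..35 crosses from the first word into the second, so it exercises exactly the adjacent-block path, whereas A-Z (65..90) stays inside one word.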
* Big conversion to wide characters and UTF-8 support. | Kaz Kylheku | 2009-11-11 | 1 | -1/+1
  This is incomplete. There are too many dependencies on wide character support from the C stream I/O library, and implicit use of some encoding which may not be UTF-8. The regex code does not handle wide characters properly. Character type is still int in some places, rather than wchar_t. Test suite passes though.
* Version 019 (tag: txr-019) | Kaz Kylheku | 2009-11-03 | 1 | -3/+3
  Regexps can be bound to variables. New freeform directive.
* Got regex working over lazy strings from freeform. | Kaz Kylheku | 2009-11-02 | 1 | -2/+5
  Bugfixes.
* Start of implementation for freestyle matching. | Kaz Kylheku | 2009-11-02 | 1 | -4/+17
  Lazy strings implemented, incompletely. Changed string function to implicitly strdup; non-strdup version changed to string_own. Fixed wrong uses of strdup rather than chk_strdup. Functions added to regex module to provide regex matching as a state machine to which characters are fed.
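Matching "as a state machine to which characters are fed" means the caller pushes one character at a time and reads back a status, instead of handing over a complete string. The sketch below shows the shape of such an interface for a hard-coded pattern, a+b. The names machine_t, machine_init and machine_feed are hypothetical, and the real functions added to regex.c may look quite different.

```c
/* Result of feeding one character to the matcher. */
typedef enum { FEED_FAIL, FEED_INCOMPLETE, FEED_MATCH } feed_result_t;

/* Hypothetical push-style matcher for the fixed pattern a+b. */
typedef struct {
    int state;        /* 0 = start, 1 = seen one or more 'a', 2 = matched, -1 = dead */
    long fed;         /* characters accepted so far */
    long match_len;   /* length of the last match seen, or -1 if none */
} machine_t;

static void machine_init(machine_t *m)
{
    m->state = 0;
    m->fed = 0;
    m->match_len = -1;
}

/* Advance the machine by one character and report where it stands. */
static feed_result_t machine_feed(machine_t *m, int ch)
{
    switch (m->state) {
    case 0:
        m->state = (ch == 'a') ? 1 : -1;
        break;
    case 1:
        if (ch == 'a')
            m->state = 1;
        else if (ch == 'b')
            m->state = 2;
        else
            m->state = -1;
        break;
    default:            /* matched or dead: no further input is accepted */
        m->state = -1;
        break;
    }

    if (m->state < 0)
        return FEED_FAIL;
    m->fed++;
    if (m->state == 2) {
        m->match_len = m->fed;
        return FEED_MATCH;
    }
    return FEED_INCOMPLETE;
}
```

Because the caller controls when the next character is produced, an interface of this kind suits input that is only materialized on demand, such as the lazy strings mentioned in the same commit.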
* Trivial change allows regexps to be bound to variables, | Kaz Kylheku | 2009-10-30 | 1 | -0/+1
  and used for matching. This Just Works because of the way match_line treats variables.
* txr-011 2009-09-25 (tag: txr-011) | Kaz Kylheku | 2017-07-31 | 1 | -0/+107