txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
*	* parser.y (regtoken): New nonterminal symbol.	Kaz Kylheku	2012-04-20	1	-1/+0
	(regterm): REGTOKEN production factored out to regtoken. (regclass): Reverted prior commit's changes. (regclassterm): Reverted prior commit, removing REGTOKEN production for character classes, and introducing a regtoken production. So now the keyword symbols are part of the character class abstract syntax. (regtoken): New production rule. * regex.c (regex_space_chars): Converted to internal linkage. (char_set_compile): Handle token keywords in character class abstract syntax. * regex.h (regex_space_chars): External declaration removed.
*	First cut at implementing \s, \d, \w, \S, \D and \W regex tokens.	Kaz Kylheku	2012-04-19	1	-0/+6
	* lib.c (init): Call regex_init. * parser.l: Return new REGTOKEN token kind. * parser.y (REGTOKEN): New token type. (regterm): Translate REGTOKEN to keyword. (regclass): Restructured to handle inherited nodes as lists. (regclassterm): Produce $$ as list. Add handling for REGTOKEN occurring inside character class by expanding it. This might not be the best approach. (yybadtoken): Handle REGTOKEN in switch. * regex.c (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): New bitfield member, stat. (char_set_create): New parameter for indicating static char set. (char_set_destroy): Do not free a static char set. (char_set_compile): Pass zero to new parameter of char_set_create. (spaces): New static array. (space_cs, digit_cs, word_cs, cspace_cs, cdigit_cs, cword_cs): New static pointers to char_set_t. (init_special_char_sets, nfa_compile_given_set): New static functions. (nfa_compile_regex, dv_compile_regex): Handle new character set token keywords. (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars): New variables. (regex_init): New function. * regex.h (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars, regex_init): Declared.
*	Bug #35718. Workaround good enough to get some code working.	Kaz Kylheku	2012-03-04	1	-1/+1
	* eval.c (cons_find): New function. (expand_op): Use cons_find rather than tree_find to look for rest_gensym. * regex.c (regsub): Rearranged arguments so that the string is last. This is better for partial evaluation via the op operator. * regex.h (regsub): Updated declaration.
*	* eval.c (eval_init): New intrinsic function, regsub.	Kaz Kylheku	2012-03-04	1	-0/+1
	* regex.c (regsub): New function. * regex.h (regsub): Declared. * txr.1: Doc stub added.
*	* arith.c: Updated copyright year.	Kaz Kylheku	2012-02-25	1	-1/+1
	* arith.h: Likewise. * debug.c: Added copyright header. * debug.h: Updated copyright year. * eval.c: Likewise. * eval.h: Likewise. * filter.c: Likewise. * filter.h: Likewise. * gc.c: Likewise. * gc.h: Likewise. * hash.c: Likewise. * hash.h: Likewise. * lib.c: Likewise. * lib.h: Likewise. * match.c: Likewise. * match.h: Likewise. * parser.h: Likewise. * regex.c: Likewise. * regex.h: Likewise. * stream.c: Likewise. * stream.h: Likewise. * txr.c: Likewise, and e-mail address. * txr.h: Updated copyright year. * unwind.c: Likewise. * unwind.h: Likewise.
*	We don't include headers in headers in this project.	Kaz Kylheku	2011-10-30	1	-2/+0
	* parser.h: Do not include <stdio.h>. * regex.c: Include <limits.h>. * regex.h: Do not include <limits.h>.
*	* LICENSE, Makefile, configure, filter.c, filter.h, gc.c, gc.h, hash.c,	Kaz Kylheku	2011-10-04	1	-1/+1
	hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated e-mail address.
*	* LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c,	Kaz Kylheku	2011-09-23	1	-1/+1
	lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year.
*	Bump copyrights to 2010.	Kaz Kylheku	2010-10-05	1	-1/+1
*	Implement derivative-based regular expressions.	Kaz Kylheku	2010-01-13	1	-22/+0
*	Code cleanup. All private functions static. Private stuff	Kaz Kylheku	2009-11-28	1	-115/+1
	in regex module not exposed in header. Etc.
*	Changes to make the code portable to C++ compilers, which	Kaz Kylheku	2009-11-24	1	-5/+5
	can be taken advantage of for better diagnostics.
*	Improving portability. It is no longer assumed that pointers	Kaz Kylheku	2009-11-23	1	-4/+4
	can be converted to type long and vice versa. The configure script tries to detect the appropriate type to use. Also, some run-time checking is performed in the streams module to detect which conversion specifier strings to use for printing numbers.
*	Changing ``obj_t *'' occurrences to a ``val'' typedef. (Ideally,	Kaz Kylheku	2009-11-20	1	-7/+6
	we wouldn't have to declare object variables at all, so why use an obtuse syntax to do so?)
*	Fixes for compliance to C89.	Kaz Kylheku	2009-11-17	1	-10/+10
*	Regular expression module updated to do unicode character sets.	Kaz Kylheku	2009-11-12	1	-12/+57
	Most of the changes are in the area of representing sets. Also, a bug was found in the compilation of regex character sets: ranges straddling two adjacent blocks of 32 characters were not being added to the character set. However, ranges falling within a single 32-character block, or spanning three or more such blocks, worked properly. This bug is not tickled by common ranges such as A-Z or 0-9, each of which lies within a single 32-character block.
*	Big conversion to wide characters and UTF-8 support.	Kaz Kylheku	2009-11-11	1	-1/+1
	This is incomplete. There are too many dependencies on wide character support from the C stream I/O library, and implicit use of some encoding which may not be UTF-8. The regex code does not handle wide characters properly. Character type is still int in some places, rather than wchar_t. Test suite passes though.
*	Version 019 (tag: txr-019)	Kaz Kylheku	2009-11-03	1	-3/+3
	Regexps can be bound to variables. New freeform directive.
*	Got regex working over lazy strings from freeform.	Kaz Kylheku	2009-11-02	1	-2/+5
	Bugfixes.
*	Start of implementation for freestyle matching.	Kaz Kylheku	2009-11-02	1	-4/+17
	Lazy strings implemented, incompletely. Changed string function to implicitly strdup; non-strdup version changed to string_own. Fixed wrong uses of strdup rather than chk_strdup. Functions added to regex module to provide regex matching as a state machine to which characters are fed.
*	Trivial change allows regexps to be bound to variables,	Kaz Kylheku	2009-10-30	1	-0/+1
	and used for matching. This Just Works because of the way match_line treats variables.
*	txr-011 2009-09-25 (tag: txr-011)	Kaz Kylheku	2017-07-31	1	-0/+107
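The 2012-04-19 commit adds the \s, \d and \w regex tokens and their complements \S, \D and \W. As a rough illustration of what those classes mean, here is a minimal C sketch of membership predicates. It assumes ASCII-only definitions (\s as the six C whitespace characters, \w as [A-Za-z0-9_]), and the tok_* names are hypothetical; the log does not say exactly which characters the space_cs, digit_cs and word_cs sets in regex.c contain.

```c
/* Sketch: predicates for the \s, \d, \w tokens and their complements.
   ASSUMPTION: ASCII-only definitions chosen for illustration; regex.c
   builds char_set_t objects (space_cs, digit_cs, ...) whose exact
   membership is not stated in this log. Names tok_* are hypothetical. */

/* \s: assumed to be the six C whitespace characters. */
static int tok_space(int c)
{
  return c == ' ' || c == '\t' || c == '\n' ||
         c == '\v' || c == '\f' || c == '\r';
}

/* \d: decimal digits. */
static int tok_digit(int c)
{
  return c >= '0' && c <= '9';
}

/* \w: assumed to be [A-Za-z0-9_]. */
static int tok_word(int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         tok_digit(c) || c == '_';
}

/* \S, \D, \W: each defined as the negation of its base set. */
static int tok_cspace(int c) { return !tok_space(c); }
static int tok_cdigit(int c) { return !tok_digit(c); }
static int tok_cword(int c)  { return !tok_word(c); }
```

For example, tok_word('_') is nonzero and tok_cdigit('5') is zero. Defining each complement as the negation of its base set keeps \S, \D and \W consistent with \s, \d and \w by construction.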
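The 2009-11-23 portability commit has the configure script detect an integer type able to hold a pointer, plus run-time checks for the matching printf conversion specifier. On a C99 toolchain the same goal can be reached without detection. The sketch below swaps in uintptr_t from <stdint.h> and PRIuPTR from <inttypes.h>; this is not what the commit itself does, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Sketch, NOT the configure-based approach the commit describes:
   C99 <stdint.h> supplies uintptr_t, an unsigned integer type that can
   round-trip an object pointer, and <inttypes.h> supplies PRIuPTR, the
   matching printf specifier. Helper names are hypothetical. */

/* Convert an object pointer to an integer and back. */
static uintptr_t ptr_to_int(void *p) { return (uintptr_t) p; }
static void *int_to_ptr(uintptr_t n) { return (void *) n; }

/* Print the integer form of a pointer with the matching specifier.
   Returns the snprintf result (length the full output would have). */
static int format_ptr_int(char *buf, size_t size, void *p)
{
  return snprintf(buf, size, "%" PRIuPTR, ptr_to_int(p));
}
```

Because the specifier comes from the same header as the type, the two cannot drift apart the way a separately detected type and format string might.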
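The 2009-11-12 commit describes a bug in which a character range straddling two adjacent 32-character blocks was not added to the set. The sketch below shows a range insertion into a bitmap of 32-bit blocks that handles the same-block, adjacent-block and multi-block cases explicitly. It uses hypothetical names (cs_t, cs_add_range, CS_MAX) and a single flat bitmap, not the several set representations regex.c uses.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: a bitmap char set stored as 32-bit blocks for characters
   0..CS_MAX-1. ASSUMPTION: cs_t, cs_*, CS_MAX and the flat layout are
   hypothetical; regex.c uses several set representations. */

#define CS_MAX 256
#define CS_WORDS (CS_MAX / 32)

typedef struct { uint32_t bits[CS_WORDS]; } cs_t;

static void cs_clear(cs_t *cs) { memset(cs->bits, 0, sizeof cs->bits); }

static int cs_has(const cs_t *cs, unsigned c)
{
  return (cs->bits[c / 32] >> (c % 32)) & 1u;
}

/* Mask with bits lo..hi (inclusive) set; requires 0 <= lo <= hi <= 31. */
static uint32_t span_mask(unsigned lo, unsigned hi)
{
  uint32_t upto_hi = (hi == 31) ? 0xFFFFFFFFu
                                : ((uint32_t) 1 << (hi + 1)) - 1;
  uint32_t below_lo = ((uint32_t) 1 << lo) - 1;
  return upto_hi & ~below_lo;
}

/* Add [from, to] to the set; requires from <= to < CS_MAX. */
static void cs_add_range(cs_t *cs, unsigned from, unsigned to)
{
  unsigned bf = from / 32, bt = to / 32, b;

  if (bf == bt) {                               /* both ends in one block */
    cs->bits[bf] |= span_mask(from % 32, to % 32);
    return;
  }

  cs->bits[bf] |= span_mask(from % 32, 31);     /* tail of first block */
  for (b = bf + 1; b < bt; b++)                 /* zero or more full blocks */
    cs->bits[b] = 0xFFFFFFFFu;
  cs->bits[bt] |= span_mask(0, to % 32);        /* head of last block */
}
```

For example, after cs_add_range(&cs, 30, 33) characters 30 through 33 all test as members, even though the range spans blocks 0 and 1 with no full block in between. A-Z (65 to 90) stays inside block 2, which is why such ranges do not exercise the adjacent-block path at all.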
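The first 2009-11-02 commit adds functions that run a regex as a state machine to which characters are fed, which is what lets the later commit match over lazy strings. A minimal sketch of that interface style follows. The names are hypothetical, and the automaton is written by hand for the single pattern ab*c rather than compiled from a regex as regex.c does.

```c
/* Sketch of regex matching as a state machine fed one character at a
   time. ASSUMPTIONS: all names are hypothetical, and the automaton is
   hand-built for the whole-input pattern ab*c, not compiled. */

enum feed_result { FEED_FAIL, FEED_MORE, FEED_ACCEPT };

/* States: 0 = start, 1 = after 'a' and any 'b's,
   2 = "ab*c" seen (accepting), 3 = dead. */
struct matcher { int state; };

static void matcher_init(struct matcher *m) { m->state = 0; }

static enum feed_result matcher_feed(struct matcher *m, int ch)
{
  switch (m->state) {
  case 0:
    m->state = (ch == 'a') ? 1 : 3;
    break;
  case 1:
    if (ch == 'b')
      m->state = 1;
    else if (ch == 'c')
      m->state = 2;
    else
      m->state = 3;
    break;
  default:
    /* Nothing may follow a complete ab*c; further input is a failure. */
    m->state = 3;
    break;
  }
  if (m->state == 2)
    return FEED_ACCEPT;
  if (m->state == 3)
    return FEED_FAIL;
  return FEED_MORE;
}

/* Feed a whole NUL-terminated string; 1 if it matches ab*c exactly. */
static int matches_whole(const char *s)
{
  struct matcher m;
  enum feed_result r = FEED_MORE;

  matcher_init(&m);
  for (; *s && r != FEED_FAIL; s++)
    r = matcher_feed(&m, (unsigned char) *s);
  return r == FEED_ACCEPT;
}
```

Because the caller pushes one character per call and inspects the result, input can come from a lazily produced string or a stream without being buffered first, and FEED_FAIL lets the caller stop reading early.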