| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
* regex.c (match_regex, match_regex_right): Detect
a negative start or end position, respectively,
and add the string length to it. If it is still
negative, bail with nil.
* txr.1: Documented.
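
The position rule described above can be sketched in a few lines of C. This is only an illustration, not the code in regex.c: the helper name normalize_pos and the use of -1 to stand in for the nil result are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical helper: map a possibly negative position onto a string
   of length len. Returns -1 where the real functions would return nil. */
long normalize_pos(long pos, long len)
{
  assert(len >= 0);

  if (pos < 0)
    pos += len;   /* a negative position counts back from the end */

  if (pos < 0)
    return -1;    /* still negative after adjustment: bail */

  return pos;
}
```

With a string of length 5, a position of -1 maps to 4 and -5 maps to 0, while -6 is rejected.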
|
|
|
|
|
|
|
|
| |
* eval.c (eval_init): Remove all regex-related function
registrations from here.
* regex.c (regex_init): Move regex-related function
registrations here.
|
|
|
|
| |
* regex.c (reg_optimize): Implement ~~R -> R reduction.
|
|
|
|
|
|
|
|
| |
* regex.c (reg_optimize): Based on the reasoning in the
previous commit, we can also statically optimize a
complement whose argument is the t regex: match nothing.
We convert that to match everything: the .* regex.
Now (regex-compile "~[]") -> #/.*/.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The form (match-regex "xy" #/~ab/) should return 2 (full
match) because "xy" is in the complement of the set { "ab" }.
It wrongly returns 1.
* regex.c (reg_derivative): Handle the case when
the derivative of the complement's constituent expression
yields nil. This means that the complemented regex matches
the input. In this case, the complement must lapse to the .+
regex: match one or more characters. That is to say, if the
input has at least one more character, there is a match, which
covers all such characters. Otherwise there is no match: the
input matches the complemented regex. In the t case, the
return value is also wrong. If the complemented regex hits
a brick wall (matches nothing, not even the empty string),
the correct complement is "match everything": the .* regex.
Not the match empty string regex!
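
The expected result can be checked against a textbook Brzozowski-derivative matcher. The sketch below is not the reg_derivative code from regex.c: the node kinds, the helper names (mk, nullable, deriv, match_len, match_compl_ab) and the never-freed allocations are all assumptions chosen to keep the example short. Here the derivative of a complement is simply the complement of the derivative, so a constituent that has hit a brick wall (NONE) is complemented into "match everything".

```c
#include <assert.h>
#include <stdlib.h>

/* EMPTY matches only the empty string; NONE matches nothing at all. */
typedef enum { EMPTY, NONE, CHR, CAT, OR, STAR, COMPL } kind_t;

typedef struct re {
  kind_t kind;
  char ch;
  const struct re *a, *b;
} re_t;

/* Allocate a node. Nodes are never freed in this sketch. */
static const re_t *mk(kind_t kind, char ch, const re_t *a, const re_t *b)
{
  re_t *r = malloc(sizeof *r);
  assert(r != 0);
  r->kind = kind; r->ch = ch; r->a = a; r->b = b;
  return r;
}

/* Does r accept the empty string? */
static int nullable(const re_t *r)
{
  switch (r->kind) {
  case EMPTY: case STAR: return 1;
  case NONE: case CHR: return 0;
  case CAT: return nullable(r->a) && nullable(r->b);
  case OR: return nullable(r->a) || nullable(r->b);
  case COMPL: return !nullable(r->a);
  }
  return 0;
}

/* Derivative of r with respect to the character c. */
static const re_t *deriv(const re_t *r, char c)
{
  switch (r->kind) {
  case EMPTY: case NONE:
    return mk(NONE, 0, 0, 0);
  case CHR:
    return mk(r->ch == c ? EMPTY : NONE, 0, 0, 0);
  case CAT: {
    const re_t *left = mk(CAT, 0, deriv(r->a, c), r->b);
    return nullable(r->a) ? mk(OR, 0, left, deriv(r->b, c)) : left;
  }
  case OR:
    return mk(OR, 0, deriv(r->a, c), deriv(r->b, c));
  case STAR:
    return mk(CAT, 0, deriv(r->a, c), r);
  case COMPL:
    return mk(COMPL, 0, deriv(r->a, c), 0);
  }
  return mk(NONE, 0, 0, 0);
}

/* Length of the longest prefix of str matched by r, or -1 if none. */
long match_len(const re_t *r, const char *str)
{
  long best = nullable(r) ? 0 : -1;
  long i;

  for (i = 0; str[i] != 0; i++) {
    r = deriv(r, str[i]);
    if (nullable(r))
      best = i + 1;
  }
  return best;
}

/* Longest match of the regex ~ab against str. */
long match_compl_ab(const char *str)
{
  const re_t *ab = mk(CAT, 0, mk(CHR, 'a', 0, 0), mk(CHR, 'b', 0, 0));
  return match_len(mk(COMPL, 0, ab, 0), str);
}
```

Under these assumptions match_compl_ab("xy") yields 2, the full match the commit message asks for, while match_compl_ab("ab") yields 1 because only the prefix "a" lies in the complement.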
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have to flip between two arrays, since the
nfa_closure and nfa_move_closure can write the
output set into the same array.
* regex.c (struct nfa_machine): Replace flip and flop
members with a single set.
(nfa_closure, nfa_move_closure): out array parameter removed;
in renamed to set. References to in and out simply replaced
with set.
(nfa_run): Allocate one set instead of two, plus the stack.
Remove code to swap the two pointers on each iteration.
(regex_machine_reset): Prepare initial closure in the one
and only set array.
(regex_machine_init): Allocate set array, rather than flip and
flop.
(regex_machine_cleanup): Free set array and null out pointer
rather than flip and flop arrays.
(regex_machine_feed): Pass just the set to the
nfa_move_closure function. Remove flip/flop pointer swapping.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (struct nfa_machine_t): Remove move and clos
array pointers, replace with flip and flop. Remove
nmove member.
(nfa_move): Static function removed.
(nfa_move_closure): New static function, based on nfa_move and
logic from nfa_closure.
(nfa_run): Use nfa_move_closure and flip between two
arrays.
(regex_machine_reset): Remove reference to nmove member
in nfa_machine_t. Prepare initial closure in flip array.
(regex_machine_init): Allocate flip and flop arrays,
rather than removed move and clos.
(regex_machine_cleanup): Free flip and flop arrays and
zero out the pointers, rather than removed move and clos.
(regex_machine_feed): Replace nfa_move and nfa_closure
with combined nfa_move_closure from flip to flop,
and exchange of flip and flop arrays.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although we are garbage-collected, being able to clean up on
shutdown is nevertheless useful for uncovering leaks. Leaks
can occur, for instance, due to neglect to free out-of-heap
satellite data from objects that are reclaimed by gc.
This feature is long overdue.
* arith.c, arith.h (arith_free_all): New function.
* gc.c, gc.h (gc_free_all): New function.
* lib.c (init): Remove program name parameter and
redundant initialization of progname global variable.
* lib.h (progname): Superfluous declaration removed.
This is already declared in txr.h.
(init): Declaration updated.
* regex.c (char_set_destroy): Do not check the static
allocation flag here; just destroy the object.
Do check for a null pointer, though.
(char_set_cobj_destroy): This cobj destructor now
checks the static flag of the char set object and
avoids freeing it. Thus our char set singletons are
left alone by gc, but our global freeing function
takes care of them.
(wide_cs): New static variable moved out of
wide_display_char_p to static scope.
(regex_free_all): New function.
* regex.h (regex_free_all): Declared.
* txr.c (progname): const qualifier and initializer removed.
(main): Ensure progname is always dynamically allocated, even
in the argv[0] == 0 case. Do not pass progname to init;
it doesn't take that argument any more.
(free_all): New static function.
(txr_main): Implement --free-all option.
* txr.h (progname): Declaration updated.
|
|
|
|
|
|
|
|
|
|
|
| |
* lib.c (rcyc_pop): Just assume that *plist points to a cons
and access the fields directly.
(rcyc_cons): Don't bother with rplacd.
(rcyc_list): Don't bother with set macro.
* regex.c (read_until_match): Defensive coding: locally
ensure that rcyc_pop won't be called on a nil stack,
which will now segfault.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (read_until_match): Use rcyc_pop instead of pop
to move the conses to the recycle list. We know these
are not shared with anything. Adding additional logic
to completely recycle the stack.
* socket.c (dgram_get_char): Use rcyc_pop to
get the character from the push-back list.
* stream.c (stdio_get_char): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (read_until_match): New argument, include_match.
Three times repeated termination code refactored into block
reached by forward goto.
(regex_init): Registration of read-until-match updated.
* regex.h (read_until_match): Declaration updated.
* stream.c (struct record_adapter_base): New member,
include_match.
(record_adapter_get_line): Pass match to read_until_match
as new argument.
(record_adapter): New argument, include_match.
(stream_init): Update registration of record-adapter.
* stream.h (record_adapter): Declaration updated.
* txr.1: Updated.
|
|
|
|
|
| |
* regex.c (read_until_match): Completely rewrite broken,
unsalvageable, garbage logic.
|
|
|
|
|
|
|
| |
* arith.c, cadr.c, debug.c, eval.c, filter.c, gencadr.txr, glob.c,
hash.c, linenoise/linenoise.c, lisplib.c, match.c, parser.c, rand.c,
regex.c, signal.c, stream.c, struct.c, sysif.c, syslog.c, txr.c,
unwind.c, utf8.c: Remove unnecessary header files.
|
|
|
|
|
| |
* regex.c (print_rec): Handle '[' and ']' in backslash-adding
switch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* lib.c (out_str_char): Static function becomes extern.
* lib.h (out_str_char): Declared.
* regex.c (puts_clear_flag, putc_clear_flag): New static
functions.
(print_class_char): Take semicolon flag argument.
Use out_str_char to render characters not escaped locally.
Clear the semicolon flag.
(paren_print_rec): Take semicolon flag argument, and pass it
down. Clear it when printing parentheses.
(print_rec): Take semicolon flag argument, and pass
down to lower level functions. Use putc_clear_flag and
puts_clear_flag instead of put_string and put_char.
Use out_str_char for char object not escaped locally.
(regex_print): define semi_flag and pass it down
to print_rec.
|
|
|
|
| |
* regex.c (print_class_char): Add missing character cases.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (read_until_match): New function.
(regex_init): Registered read-until-match intrinsic.
* regex.h (read_until_match): Declared.
* stream.c (struct delegate_base): New struct type.
(delegate_base_mark, delegate_put_string, delegate_put_char,
delegate_put_byte, delegate_get_char, delegate_get_byte,
delegate_unget_char, delegate_unget_byte, delegate_close,
delegate_flush, delegate_seek, delegate_truncate,
delegate_get_prop, delegate_set_prop, delegate_get_error,
delegate_get_error_str, delegate_clear_error,
make_delegate_stream): New static functions.
(struct record_adapter_base): New struct type.
(record_adapter_base_mark, record_adapter_mark_op,
record_adapter_get_line): New static functions.
(record_adapter_ops): New static structure.
(record_adapter): New function.
(stream_init): Registered record-adapter intrinsic.
* stream.h (record_adapter): Declared.
* txr.1: Documented read-until-match and record-adapter.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* LICENSE, METALICENSE, Makefile, args.c, args.h, arith.c,
arith.h, cadr.c, cadr.h, combi.c, combi.h, configure,
debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, gc.c,
gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h,
lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h,
parser.l, parser.y, rand.c, rand.h, regex.c, regex.h,
share/txr/stdlib/cadr.tl, share/txr/stdlib/except.tl,
share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl,
share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl,
share/txr/stdlib/struct.tl, share/txr/stdlib/txr-case.tl,
share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl,
share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl,
signal.c, signal.h, stream.c, stream.h, struct.c, struct.h,
sysif.c, sysif.h, syslog.c, syslog.h, txr.1, txr.c, txr.h,
unwind.c, unwind.h, utf8.c, utf8.h: Add 2016 copyright.
* linenoise/LICENSE, linenoise/linenoise.c,
linenoise/linenoise.h: Bump one principal author's copyright
from 2014 to 2015. The code is based on a snapshot of 2015
upstream work.
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (range_regex): Return range.
(search_regst): Use appropriate accessors on
range returned by range_regex.
* lib.c (tok_where): Destructure range returned by
range_regex, using range_bind.
* txr.1: Documented changed behavior.
|
|
|
|
|
|
|
|
| |
* regex.c (search_regex): In the Sep 7 2015 commit
titled "Don't use prot1 for temporary gc protection",
a rel1 call was left behind, causing an assert whenever
the function is used for a successful "from end"
search.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TXR is moving to custom assembly-language routines.
This is mainly motivated by a very dubious thing done in the
GNU C Library setjmp and longjmp in the name of security.
Evidently, glibc's setjmp "mangles" certain pointer values
which are stored into the jmp_buf buffer. It's been that way
since 2005, evidently. This means that, firstly, all along,
the use of setjmp in gc.c to get registers into a buffer so
they can be scanned has not actually worked properly. More
importantly, this pointer mangling in setjmp and longjmp is
very hostile to a stack copying implementation of delimited
continuations. The reason is that continuations contain
jmp_buf buffers, which get relocated in the process of
capturing and reviving a continuation. Any pointers in a
jmp_buf which point into the captured stack segment have to be
fixed up to point into the relocated location. Mangled
pointers make this difficult, requiring hacks which are
specific to glibc and the machine architecture. We might as
well implement a clean, well-behaved setjmp and longjmp.
* Makefile (jmp.o): New object file.
(dbg/%.o, opt/%.o): New rules for .S prerequisites.
* args.c, arith.c, cadr.c, combi.c, debug.c,
eval.c, filter.c, glob.c, hash.c, lib.c, match.c, parser.c,
rand.c, regex.c, signal.c, stream.c, struct.c, sysif.c,
syslog.c, txr.c, unwind.c, utf8.c: Removed <setjmp.h>
include.
* gc.c: Switch to struct jmp and jmp_save, instead
of jmp_buf and setjmp.
* jmp.S: New source file.
* signal.h (struct jmp): New struct type.
(jmp_save, jmp_restore): New function declarations
denoting assembly language routines in jmp.S.
(extended_jmp_buf): Uses struct jmp instead of
setjmp.
(extended_setjmp): Use jmp_save instead of setjmp.
(extended_longjmp): Use jmp_restore instead of
longjmp.
|
|
|
|
|
|
|
| |
* regex.c (reg_optimize): If the empty regex is and-ed with
another regex, that other regex must be nullable, otherwise
the and matches nothing. This is captured in some new
reductions for the and operator.
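
The reasoning behind this reduction can be sketched as follows. The node type, the N_ constants and the function names are hypothetical and do not come from regex.c. The empty regex matches only the empty string, so its intersection with R is the empty string when R is nullable and nothing otherwise.

```c
#include <assert.h>

/* N_EMPTY matches only the empty string; N_NONE matches nothing. */
typedef enum { N_EMPTY, N_NONE, N_CHR, N_STAR, N_CAT } nkind_t;

typedef struct node {
  nkind_t kind;
  const struct node *a, *b;
} node_t;

/* Does the regex accept the empty string? */
int node_nullable(const node_t *r)
{
  assert(r != 0);

  switch (r->kind) {
  case N_EMPTY: case N_STAR: return 1;
  case N_NONE: case N_CHR: return 0;
  case N_CAT: return node_nullable(r->a) && node_nullable(r->b);
  }
  return 0;
}

/* Kind of regex that (and <empty> r) reduces to. */
nkind_t and_with_empty(const node_t *r)
{
  return node_nullable(r) ? N_EMPTY : N_NONE;
}

/* Two sample operands: the single character a, and a*. */
const node_t sample_chr = { N_CHR, 0, 0 };
const node_t sample_star = { N_STAR, &sample_chr, 0 };
```

In this sketch and_with_empty(&sample_star) gives N_EMPTY because a* is nullable, and and_with_empty(&sample_chr) gives N_NONE.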
|
|
|
|
|
|
| |
* regex.c (reg_optimize): No need to check reg_matches_all in
and optimization case because the argument object has already
been reduced that way by reg_optimize recursion.
|
|
|
|
|
|
| |
* regex.c (reg_compl_char_p): New static function.
(reg_optimize): Optimize various cases of the
or operator: (R|) -> R?, (a|b) -> [ab] and others.
|
|
|
|
|
|
| |
* regex.c (regex_optimize): Simplify compounded
uses of repetition operators: RR* -> R, R+? -> R*
and so on.
|
|
|
|
|
| |
* regex.c (print_rec): Bugfix: handle symbols in character
class syntax.
|
|
|
|
|
| |
* regex.c (reg_optimize): Transform ~.*c to (.*[^c])?
and ~c.* to ([^c].*)? where c is a single-character match.
|
|
|
|
|
|
|
| |
* regex.c (reg_single_char_p, invert_single): New static
functions.
(reg_optimize): Simplify complement operator optimizations
using new functions.
|
|
|
|
|
| |
* regex.c (reg_optimize): [a] -> a. Also take advantage
of this where the complement case generates [a].
|
|
|
|
|
| |
* regex.c (reg_optimize): Recognize and transform several
cases: ~c -> ([^c]?|..+); ~[^c] -> ([c]?|..+); and ~.*c.* -> [^c]*.
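
The first of these rewrites can be sanity-checked outside TXR. The sketch below uses the POSIX regcomp and regexec functions, with the anchors ^ and $ added to force a whole-string match; the two function names are made up for the example. It compares the rewritten pattern with the direct meaning of ~c, namely every string other than "c" itself.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Returns 1 if str, taken as a whole, matches ([^c]?|..+). */
int matches_rewrite(const char *str)
{
  regex_t re;
  int rc = regcomp(&re, "^([^c]?|..+)$", REG_EXTENDED | REG_NOSUB);
  int hit;

  assert(rc == 0);
  (void) rc;
  hit = (regexec(&re, str, 0, 0, 0) == 0);
  regfree(&re);
  return hit;
}

/* Direct meaning of ~c: every string except "c" itself. */
int in_complement_of_c(const char *str)
{
  return strcmp(str, "c") != 0;
}
```

The two predicates should agree on any input: the empty string and any other single character are covered by [^c]?, and every string of two or more characters is covered by ..+.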
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (dv_compile_regex): Replaced by two functions,
reg_expand_nongreedy and reg_compile_csets.
(reg_expand_nongreedy, reg_compile_csets): New static
functions.
(reg_optimize): New static function.
(regex_compile): Expand nongreedy syntax in incoming regex,
and then optimize it before deciding whether to use NFA or
derivatives. If derivatives are used, compile the
character sets in the regex to character set objects.
(regex_init): Register some intrinsic functions for debugging,
sys:reg-expand-nongreedy and sys:reg-optimize.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The t regex means "match nothing". This patch allows the NFA
compiler to handle it. This will be necessary for an upcoming
regex optimizer which can put out such an object. Also, the
recursive regex printer can print the object now.
* regex.c (nfa_kind_t): New enum member, nfa_reject.
(nfa_state_reject): New static function.
(nfa_compile_regex): Compile t regex into a reject
state which cannot reach its corresponding acceptance
state.
(nfa_map_states): Handle nfa_reject case in switch, similarly
to nfa_accept: nothing to transition into.
(print_rec): Render the t regex as the empty character class [].
|
|
|
|
|
|
| |
* regex.c (nfa_compile_regex, dv_compile_regex, reg_nullable,
reg_matches_all, reg_derivative, regex_requires_dv): Throw an
exception for the bad operator case.
|
|
|
|
|
|
| |
* regex.c (reg_matches_all): A complement matches all if
its argument matches nothing, not if its argument
is anything but the empty match nil.
|
|
|
|
|
|
|
|
|
|
|
| |
This change is a huge improvement for expressions that use
complement, directly or via the non-greedy % operator.
* regex.c (reg_matches_all): New static function.
(reg_derivative): When the derivative is applied
to a complement expression, identify situations when
the remaining expression cannot possibly match
anything, and convert them to the t expression.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a NFA regex goes through more than 4.29 billion state
transitions, the state coloring "visited" marker wraps around.
There could still exist states with old values at or near
zero, which destroys the correctness of the closure
calculations.
* regex.c (nfa_handle_wraparound): New static function.
The wraparound situation is handled by detecting when
the next marker value is UINT_MAX. When this happens,
we visit all states, marking them to UINT_MAX.
Then we visit them again, marking them to zero, and
set the next marker value to 1.
(nfa_free): Added comment about why we don't have a
wraparound check, in case it isn't obvious.
(nfa_run): Check for wraparound before every nfa_closure call.
(regex_machine_reset): Check for wraparound before nfa_closure
call. Fix: store the counter back in the start state's visited
field.
(regex_machine_init): Initialize the n.visited field of the
regex machine structure to zero. Not strictly necessary, since
it's initialized moments later in regex_machine_reset, but
good form.
(regex_machine_feed): Check for wraparound before nfa_closure
call.
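
The wraparound check can be sketched as below. This is a simplification, not the nfa_handle_wraparound code: it assumes the states sit in a flat array, so a single reset pass is enough, whereas the real function reaches the states by a graph traversal that itself relies on the marker (hence the two passes described above). The type and function names are illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct {
  unsigned visited;   /* marker value from the last pass that saw this state */
} state_t;

/* Given the candidate next marker value, return the value to use.
   If the candidate is UINT_MAX, reset every state's marker to zero
   and restart the sequence at 1. */
unsigned handle_wraparound(state_t *states, size_t nstates, unsigned next)
{
  size_t i;

  assert(states != 0 || nstates == 0);

  if (next != UINT_MAX)
    return next;

  for (i = 0; i < nstates; i++)
    states[i].visited = 0;

  return 1;
}
```

After a reset no state carries a stale marker that could collide with the new sequence of values.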
|
|
|
|
|
| |
* regex.c (nfa_free): Use alloca for array of all states.
(nfa_run): Use alloca for move, closure and stack arrays.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (struct regex): New member, nstates.
(NFA_SET_SIZE): Preprocessor symbol removed.
(struct nfa_machine): New member, nstates.
(nfa_all_states): Function removed.
(nfa_map_states): New static function.
(nfa_count_one, nfa_count_states, nfa_collect_one): New static
functions.
(nfa_free): Takes nstates argument. Calculate array of all
states using nfa_map_states over nfa_collect_one rather than
nfa_all_states. The array is tightly allocated. Also the
spanning tree traversal needs just one root, nfa.start.
It's not clear why nfa_all_states used nfa.start and
nfa.accept as roots.
(nfa_closure): Takes nstates parameter; array bounds checking
performed tightly against nstates rather than NFA_SET_SIZE.
(nfa_move): Check against NFA_SET_SIZE removed.
(nfa_run): Take nstates argument. Allocate arrays tightly. Pass nstates
to nfa_closure.
(regex_destroy): Pass regex->nstates to nfa_free.
(regex_compile): Initialize regex->nstates.
(regex_run): Pass regex->nstates to nfa_run.
(regex_machine_reset): Pass nstates to nfa_closure.
(regex_machine_init): Initialize n.nstates member of regex
machine. Allocate arrays tightly.
(regex_machine_feed): Pass nstates to nfa_closure.
|
|
|
|
|
|
| |
* regex.c (nfa_free): The visited marker must be incremented,
otherwise nfa_all_states will only collect start and
accept.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* lib.c (split_str, split_str_set, list_str, int_str): Use
gc_hint rather than prot1/rel1. More efficient, doesn't
use space in the prot_stack array.
* regex.c (search_regex): Likewise.
* stream.c (vformat_str, formatv, run): Likewise.
In formatv, rel1 wasn't being called in the uw_unwind
block, so this fixes a bug.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (create_wide_cs): New static function.
(wide_display_char_p): New function.
* regex.h (wide_display_char_p): Declared.
* stream.c (put_string, put_char): Use wide_display_char_p
to determine whether an extra column need be counted. Also bugfix:
iswprint evidently cannot be relied on to work over the entire Unicode
range, at least not in the C locale. Glibc's version is reporting
valid Japanese characters as unprintable on Ubuntu. As a hack we
instead check for control characters and invert the result: control
chars are unprintable.
* tests/009/json.expected: Updated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* hash.c (hash_print_op): Take third argument,
and call cobj_print_impl rather than cobj_print.
* lib.c (cobj_print_op): Take third argument. The object class is
printed with obj_print_impl.
(obj_print_impl): Static function becomes extern. Passes its pretty
flag argument to cobj print virtual function.
* lib.h (cobj_ops): print takes third argument.
(cobj_print_op): Declaration updated.
(obj_print_impl): Declared.
* regex.c (regex_print): Takes third argument, and ignores it.
* stream.c (stream_print_op, stdio_stream_print, cat_stream_print):
Take third argument, and ignore it.
* stream.h (stream_print_op): Declaration updated.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In fact, the previously documented process is not correct and still
leaves a corruption problem under generational GC (which has been the
default for some time).
* HACKING: Document flaw in the initialization pattern previously
thought to be correct, and show fix.
* hash.c (copy_hash): Fix instance of incorrect pattern.
* regex.c (regex_compile): Likewise.
|
|
|
|
|
|
| |
* regex.c (print_rec): Only diagnose "bad object in regex syntax"
for some atom other than nil, which denotes an empty (sub)expression,
like what results from #// or #/a|/ and such.
|
|
|
|
|
| |
should return zero length, rather than nil. This is achieved by trying
the match at one past the last character.
|
|
|
|
|
|
|
|
|
|
|
| |
* eval.c (eval_init): Register search-regst, match-regst
and match-regst-right intrinsics.
* regex.c (search_regst, match_regst, match_regst_right): New functions.
* regex.h (search_regst, match_regst, match_regst_right): Declared.
* txr.1: Documented new variants.
|
|
|
|
| |
elements which have a higher precedence than catenation.
|
|
|
|
|
|
|
|
|
|
|
| |
* arith.c, arith.h, combi.c, combi.h, debug.c, debug.h, eval.c, eval.h,
filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h,
match.c, match.h, parser.h, rand.c, rand.h, regex.c, regex.h,
signal.c, signal.h, stream.c, stream.h, sysif.c, sysif.h, syslog.c,
syslog.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h:
Update.
* LICENSE, METALICENSE: Likewise.
|
|
|
|
|
|
|
|
|
|
| |
* lib.h (cobj_ops_init): New macro.
* hash.c (hash_ops, hash_iter_ops): Initialize with cobj_ops_init.
* rand.c (random_state_ops): Likewise.
* regex.c (char_set_obj_ops, regex_obj_ops): Likewise.
|
|
|
|
|
|
|
|
|
|
| |
(GREP_CHECK): New macro.
(enforce): Rewritten using GREP_CHECK, with new checks.
* arith.c, combi.c, debug.c, eval.c, filter.c, gc.c, hash.c, lib.c,
lib.h, match.c, parser.l, parser.y, rand.c, regex.c, signal.c,
signal.h, stream.c, syslog.c, txr.c, unwind.c, utf8.c: Remove
trailing spaces.
|