| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, the nfa_state_free function doesn't check the
static flag on a character set and just calls chr_set_destroy.
So when one of the static character sets is planted into the
NFA graph, when that graph is garbage-collected, it blows away
the static character set. Then when that happens twice for the
same set, boom!
We make an alteration to make the destruction more defensive.
Callers of char_set_destroy are no longer saddled with the
responsibility of honoring the static flag buried in the
object. Instead, that function itself check the static flag.
An argument is provided to force the deletion in spite of the
static flag; that is needed for the global cleanup of the
static states. (Only occurs if txr is run with --free-all
and cleanly exited.)
* regex.c (char_set_destroy): Take extra argument, force.
If the set is marked static, then do nothing, unless
force is nonzero.
(char_set_cobj_destroy): Don't check the static flag, just
call char_set_destroy, force zero.
(nfa_state_free): Add force zero argument to char_set_destroy
call. The double free bug is thereby fixed here; static sets
are protected.
(regex_free_all): Force all the char_set_destroy calls here.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because the regex printer wrongly uses out_str_char (for the
sake of borrowing its semicolon-notation processing) when
a regex prints, all characters that require escaping in a
string literal get escaped, which includes the " character.
Unfortunately the \" sequence which results is rejected
by the regex parser.
* lib.c (out_str_char): Kludge: add extra argument to
distinguish regex use versus string use, and treat the double
quote accordingly.
(out_str_readable): Give 0 arg to new param of out_str_char.
* lib.h (out_str_char): Declaration updated.
* regex.c (print_class_char, print_rec): Pass 1 to new param
of out_str_char.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h,
arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, combi.c,
combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c,
ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c,
glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c,
lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c,
parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h,
regex.c, regex.h, share/txr/stdlib/awk.tl,
share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl,
share/txr/stdlib/conv.tl, share/txr/stdlib/doloop.tl,
share/txr/stdlib/error.tl, share/txr/stdlib/except.tl,
share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl,
share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl,
share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl,
share/txr/stdlib/op.tl, share/txr/stdlib/package.tl,
share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl,
share/txr/stdlib/pmac.tl, share/txr/stdlib/socket.tl,
share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl,
share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl,
share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl,
share/txr/stdlib/with-resources.tl,
share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl,
signal.c, signal.h, socket.c, socket.h, stream.c, stream.h,
struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h,
syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h,
unwind.c, unwind.h, utf8.c, utf8.h, win/cleansvg.txr:
Extended Copyright line to 2018.
|
|
|
|
|
|
|
|
| |
* regex.c (nfa_move_closure): The move set calculation
is wrongly assuming that all of the states are new and not
testing their visited color. This could result in the
same state being added twice. Though harmless, it wastefully
inflates the set size.
|
|
|
|
|
|
|
| |
* regex.c (nfa_test_set_visited): New inline function.
(nfa_map_states, nfa_thread_epsilons, nfa_closure,
nfa_move_closure): Use function instead of coding pattern
which tests the state and sets the visited member.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The nfa_accept state label is re-introduced. This state type
has the same representation as nfa_empty; essentially, this
replaces the flag. This makes the state type smaller, and we
don't have to access the flag to tell if a state is an
acceptance state.
* regex.c (nfa_kind_t): New enum label, nfa_accept.
(struct nfa_state_empty): Member accept removed.
(nfa_accept_state_p): Macro tests only for nfa_accept type.
(nfa_empty_state_p): New macro.
(nfa_state_accept): Set type of new state to nfa_accept;
do not set accept flag.
(nfa_state_empty): Do not set accept flag.
(nfa_state_empty_convert): Do not clear accept flag.
(nfa_map_states): Handle nfa_accept in switch, in the
same case as nfa_empty.
(nfa_thread_epsilons): Don't test for accept state in
nfa_empty case; it would be always false now. Add nfa_accept
case to switch which only arranges for a traversal of the two
transitions. (Though these are expected to be null at the
stage of the graph when this function is applied).
(nfa_fold_accept): Switch type to nfa_accept rather than
setting accept flag.
(nfa_closure, nfa_move_closure): Use new macro for testing
whether a state is empty.
|
|
|
|
|
|
|
| |
regex.c (regex_compile): Take the source code to be the
original code, rather than the version with AST-level
optimizations and expansions related to the nongreedy
operator.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was broken by the July 16 commit
"regex: don't print superfluous parens around classes",
2411f779f47c441659720ad0ddcabf91df1d2529.
* regex.c (print_rec): If an (or ...) appears as
a compound element, it must be rendered in parentheses;
or_s must be handled here just like and_s. Prior to the
faulty commit, this was implicitly true because the logic was
inverted and wasn't ruling out or_s.
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this optimization, we identify places in the NFA graph
where empty states transition to accept states.
We eliminate these transitions and turn the empty states
into accept states. Accept states which thereby become
unreachable from the start state are pruned away.
* regex.c (nfa_fold_accept): New static function.
(nfa_optimize): Add a pass which applies nfa_fold_accept
to all states.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Acceptance states are instead represented by adding an accept
flag to nfa_empty states. This will support an optimization
which eliminates accept states that are referenced only by
empty states.
* regex.c (nfa_kind_t): Enum member nfa_accept removed.
(struct nfa_state_accept): Renamed to nfa_state_any.
(struct nfa_state_empty): New member, accept.
(union nfa_state): Member a changes from struct
nfa_state_accept to struct nfa_state_any.
(nfa_accept_state_p): New macro.
(nfa_state_accept): Now makes an nfe_empty type state, with no
transitions out and the accept flag set.
(nfa_state_empty): Initialize accept flag to zero.
(nfa_state_empty_convert): Set the accept flag to zero.
(nfa_state_merge): Use new macro in assertion.
(nfa_map_states): Remove nfa_accept switch case.
(nfa_thread_epsilons): We must not eliminate an empty state which
is an acceptance state, even if it meets the other conditions.
(nfa_closure): Use a local variable to eliminate repetition of
set[i] expression. Test for accept state using new macro.
(nfa_move_closure): Test for accept state using new macro.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is something done for the sake of faster NFA simulation.
I think it's glossed over in regex literature because it
makes no difference in a NFA to DFA transformation.
Fewer states in the graph means less work in the
nfa_move_closure function at regex run time.
In the NFA graph, there can occur empty states which are
useless. These are states which hold only one epsilon
transition. Those states can be replaced by whatever state
their epsilon transition points to. The terminology is
inspired by "jump threading" in code optimization, whereby if
a branch takes place to an instruction which holds an
unconditional branch, the original branch can be retargetted
to go to the ultimate target.
I.e. we turn:
x e
S0 ---> S1 ----> S2
where e denotes an epsilon transition, and x represents
any kind of transition to:
x
S0 ---> S2
We can also eliminate empty states which have no transition;
those can be replaced by a null pointer. I suspect these do
not actually occur.
* regex.c (nfa_thread_epsilons, nfa_noop, nfa_optimize): New
static functions.
(regex_compile): Invoke nfa_optimize on the results of
nfa_compile_regex.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (nfa_run): Start with the existing value of the
visited counter and pre-increment it when calling nfa_closure
and nfa_move_closure. Thus it is incremented only as many
times as those functions are called: one fewer than before.
(regex_machine_reset): regm->n.visited is already incremented,
since 1 was added to the prior value; there is no need to
increment it again.
(nfa_handle_wraparound): More prudent approach here.
If we get within 8 increments of wraparound, we
fast forward to UINT_MAX and then 0.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This goes back to November 2, 2009 commit,
"Start of implementation for freestyle matching",
6191fbb2ca7a9ac339dd3994bdea8273ceb0a24d
It is exposed if we perform an epsilon threading
optimization on the NFA graph.
* regex.c (regex_machine_feed): We must pass the new,
incremented value of the visited counter, to nfa_move_closure.
States have already been tagged with the old value before
the call to regex_machine_feed, so we risk failing to
visit some states and include them in the closure.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new function allows a program to determine whether a
given string is the prefix of any of the strings denoted by a
regular expression; or, in alternative words, whether a given
string is the prefix of a possibly longer string which matches
a regular expression.
* regex.c (regex_machine_infer_init_state): New static
function.
(regex_prefix_match): New function.
(regex_init): regex-prefix-match intrinsic registered.
regex.h (regex_prefix_match): Declared.
* txr.1: Documented.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reject states are no longer explicitly represented in the NFA
graph. Any transition that previously would have led to a
reject state is just a null pointer now: no transition.
This just means we have to check one place for a null pointer.
This change fixes a flaw in nfa_closure: the function was
propagating nfa_reject states to the closure set. When
the input set consisted only of reject states (or rather
exactly one reject state in that special case), it would
add it to the output set and return 1, making it seem as
if the regex is one that can potentially match something.
* regex.c (nfa_kind_t): Enum member nfa_reject removed.
(nfa_state_reject): Static function removed.
(nfa_compile_regex): Compile a t regex (match nothing)
to a NFA graph with no transition at all into any start
state; effectively an empty graph with no state nodes.
(nfa_map_states): Remove nfa_reject case from switch. Also,
stylistic change; state pointers are tested as Booleans
elsewhere rather than by comparison to null pointer.
(nfa_count_states, nfa_handle_wraparound): Handle null state
pointer.
(nfa_move_closure): Test the mov transition for null;
anything can potentially now have a null transition,
not only epsilon nodes.
(nfa_run, regex_machine_reset): Handle nfa.start being null
indicating an empty NFA graph. In that case we don't have to
calculate the closure; we just have an empty set of states and
set nfa.nclos to zero. The visited disambiguation counter is
irrelevant since there are no states to visit.
(regex_machine_feed): Don't try to propagate visited counter
stored in regex machine to empty NFA graph which has no states.
|
|
|
|
|
|
|
|
| |
* regex.c (print_rec): Switching from negative tests to
positive: print parens around elements of a compound which
have a lower precedence. The previously incorrectly omitted
set and cset remain unmentioned, so under this inversion of
logic, they print without parentheses.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Renaming cobj_hash_op to cobj_eq_hash_op. This function is
only appropriate to use with COBJ objects which use
eq as their equal funtion. I've spotted one instance of an
inappropriate use which have to be addressed by a different
commit: the equal function is other than eq, but cobj_hash_op
is used for the equal hash.
* lib.h (cobj_hash_op): Declaration renamed to
cobj_eq_hash_op.
* hash.c (cobj_hash_op): Renamed to cobj_eq_hash_op.
(hash_iter_ops): Refer to renamed cobj_hash_eq_op.
* ffi.c (ffi_type_builtin_ops, ffi_type_struct_ops,
ffi_type_ptr_ops, ffi-closure_ops, ffi_call_desc_ops):
Likewise.
* lib.c (cptr_ops): Likewise.
* parser.c (parser_ops): Likewise.
* rand.c (random_state_ops): Likewise.
* regex.c (char_set_ops, regex_obj_ops): Likewise.
* socket.c (dgram_strm_ops): Likewise.
* stream.c (null_ops, stdio_ops, tail_ops, pipe_ops, dir_ops,
string_in_ops, byte_in_ops, strlist_in_ops, string_out_ops,
strlist_out_ops, cat_stream_ops, record_adapter_ops):
Likewise.
* struct.c (struct_type_ops): Likewise.
* sysif.c (cptr_dl_ops): Likewise.
* syslog.c (syslog_strm_ops): Likewise.
* unwind.c (cont_ops): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* lib.h (default_bool_arg): Inline function renamed to
default_null_arg.
* eval.c (if_fun, pad, ginterate, giterate, range_star, range,
constantp, macroexpand_1, macro_form_p, expand_with_free_refs,
do_expand, eval_intrinsic, func_get_name, make_env_intrinsic):
Follow rename.
* arith.c (lognot): Likewise.
* gc.c (gc_finalize): Likewise.
* glob.c (glob_wrap): Likewise.
* hash.c (group_reduce, gethash_n): Likewise.
* lib.c (print, multi_sort, lazy_str, vector, iff, tok_str,
split_str_keep, search_str, remove_if, val): Likewise.
* match.c (match_fun): Likewise.
* parser.c (lisp_parse_impl, regex_parse): Likewise.
* rand.c (make_random_state): Likewise.
* regex.c (read_until_match, search_regex, regex_compile):
Likewise.
* socket.c (sock_accept, sock_connect): Likewise.
* stream.c (open_files_star, open_files, run, open_process,
open_tail, get_string, record_adapter): Likewise.
* struct.c (static_slot_ensure, static_slot_ens_rec,
clear_struct, make_struct_type): Likewise.
* sysif.c (exec_wrap, errno_wrap, cobj_ops_init): Likewise.
* unwind.c (uw_capture_cont, uw_find_frames_impl): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* arith.c (trunc1, trunc, floorf, ceili):
Add missing nao terminator to uw_throwf calls.
* debug.c (debug): Missing nao terminator in format call.
* eval.c (expand_opt_params_rec, me_equot): Missing nao
terminator in eval_error call.
* lib.c (use_sym): Missing nao in uw_throw call.
* regex.c (reg_derivative): Missing nao in uw_throwf.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h,
arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure,
debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c,
ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S,
lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h,
parser.c, parser.h, parser.l, parser.y, rand.c, rand.h,
regex.c, regex.h, signal.c, signal.h, stream.c, stream.h,
struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h,
termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h,
utf8.c, utf8.h, share/txr/stdlib/awk.tl,
share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl,
share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl,
share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl,
share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl,
share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl,
share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl,
share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl,
share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl,
share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl,
share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl:
Add 2017 to all copyright headers and strings.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we compile the regex expression (compound "str*"), calling
regex-source on the compiled regex object yields "str*".
That, of course, is treated as regex character syntax if
fed back to regex-compile, and the * becomes an operator.
We want the source to be (compound "str*").
This happens because the AST optimizer reduces
(compound X) -> X.
* regex.c (regex_compile): If the optimized expression is just
a character string atom S, then for the purposes of
maintaining the source code, convert it to (compound S).
|
|
|
|
|
|
|
|
| |
* regex.c (regex_range_full_fun, regex_range_left_fun,
regex_range_right_fun, regex_range_search_fun): New functions.
(regex_init): Register fr^$, fr^, fr$ and frr intrinsics.
* txr.1: Documented.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is some infrastructure which will support *print-circle*.
* lib.h (struct strm_ctx): Forward declared.
(struct cobj_ops): Add context parameter to print function
pointer.
(cobj_print_op, obj_print_impl): Add context parameter to
declarations.
* hash.c (hash_print_op): Take context argument and
pass it down in obj_print_impl calls.
* lib.c (cobj_print_op, out_quasi_str): Likewise
(obj_print_impl): Likewise, and also pass to
COBJ print method.
(obj_print, obj_pprint): Pass null pointer
as context argument to obj_print_impl.
* regex.c (regex_print): Take context parameter and ignore it.
* socket.c (dgram_print): Likewise.
* stream.h (struct strm_ctx): New struct type.
(struct strm_base): New ctx member, pointer to struct
strm_ctx.
(stream_print_op): Add context parameter to declaration.
(get_set_ctx, get_ctx): Declared.
* stream.c (strm_base_init): Add null pointer to initializer.
(strm_base_cleanup): Add assertion against context pointer
being non-null: that indicates that some stream operation
installed a context pointer and neglected to restore it to
null before returning, which is bad because context will be
stack allocated.
(stream_print_op, stdio_stream_print, cat_stream_print): Take
context parameter and ignore it.
(get_set_ctx, get_ctx): New functions.
* struct.c (struct_type_print): Take context parameter and
ignore it.
(struct_inst_print): Take context parameter and pass
down to obj_print_impl.
|
|
|
|
|
|
|
|
|
|
|
| |
Since much regex code assumes these are binary, the easiest
and briefest approach is to implement a code transformation
pass which rewrites n-ary forms into binary.
* regex.c (reg_nary_unfold, reg_nary_to_bin): New
functions.
(regex_compile): Put raw sexp through reg_nary_to_bin
to expand the nary syntax.
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (reg_expand_nongreedy, reg_compile_csets):
Generalize the compound_s case slightly by referring
to sym rather than hard-coded compound_s. Then handle
most of the regex operators under this same case.
Their semantics are not relevant to the expansions
being performed in these functions: all their arguments
are regexes to be recursed over.
|
|
|
|
|
|
|
|
|
| |
* regex.c (range_regex_all, regex_range_all): New functions.
(regex_init): Register rra intrinsic function.
* regex.c (range_regex_all, regex_range_all): Declared.
* txr.1: Documented rra.
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (regex_range_search): New function.
(regex_init): Register regex_range_search as rr intrinsic.
* regex.h (regex_range_search): Declared.
* txr.1: Documented rr, and added reference to it
in description of regex-range.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (search_regex): Handle negative starting positions
according to the convention elsewhere and fail excessively
negative ones. Consistently fail on starting positions
exceeding the length of the string. Handle zero length
matches by reporting them against the start position
or position one past the last character, based on the
value of from-end.
* txr.1: search-regex documentation updated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c,
combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c,
filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h,
jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c,
parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h,
share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl,
share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl,
share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl,
share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl,
share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl,
share/txr/stdlib/struct.tl, share/txr/stdlib/termios.tl,
share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl,
share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl,
share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h,
stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c,
syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c,
unwind.h, utf8.c, utf8.h: Revert to verbatim 2-Clause BSD.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (match_regex): Bail if pos is too positive,
beyond length of string.
(match_regex_right): Include the pos == end case in the
iteration, so we can match an empty suffix of the string. The
inner loop guard takes care of not feeding any characters from
the string into the regex machine in this case; we just feed
the terminating zero to get the final state.
(match_regst): Normalize a negative pos, otherwise the sub_str
calculation will be junk, since match_regex returns a
normalized position. After normalizing, check that if the
position is still negative, the match must fail.
(match_regst_right_old, match_regst_right): Use zero rather
than t as the range end in sub_str. That way if len is
zero and neg(len) produces zero, an empty string will
be sliced out. For negative values, the zero serves
as one position beyond the last char, just like t.
(do_match_full_offs, regex_match_full, regex_range_full,
regex_range_left): Fail match if normalized starting
pos is negative.
(regex_range_right): Fix completely bogus calculation of the
returne range in the case when the end position defaults to
the string length.
|
|
|
|
| |
* regex.c (puts_clear_flag): Fix bad indentation.
|
|
|
|
|
|
|
|
|
| |
* regex.c (regex_source): New function.
(regex_init): regex-source intrinsic registered.
* regex.h (regex_source): Declared.
* txr.1: Documented.
|
|
|
|
|
| |
* regex.c (print_rec): Fix checking arg1
for consp but accessing arg2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (do_match_full, do_match_full_offs, do_match_left,
do_match_left_offs, do_match_right, do_match_right_offs):
New static functions.
(regex_match_full_fun, regex_match_right_fun,
regex_match_full, regex_match_left, regex_match_right,
regex_range_full, regex_range_left, regex_range_right):
New functions.
(regex_init): Register f^$, f^, f$, m^$, m^, m$, r^$,
r^ and r$ intrinsics.
* regex.h (regex_match_full_fun, regex_match_right_fun,
regex_match_full, regex_match_left, regex_match_right,
regex_range_full, regex_range_left, regex_range_right):
Declared.
* txr.1: Documented new functions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way the end-position argument works in match-regex-right
and match-regst-right is poorly considered. It basically
enforces a constraint that there is a match which ends
at that position and does not go beyond. This patch changes
it work right: the functions test that the regex matches
up to that position, as if the string ended there.
* regex.c (match_regex_right_old): New static function,
identical to the previous match_regex_right.
Since we won't ever be using this inside TXR from any
other module, we don't make it external.
(match_regex_right): Rewritten to new semantics.
(match_regst_right_old): New static function;
provides the semantics of the old match_regst_right
based on match_regex_right_old.
(regex_init): Register match-regex-right and match-regst-right
intrinsics to the match_regex_right_old and
match_regst_right_old functions if compatibility <= 150 is
requested. Otherwise they go to the rewritten new
functions.
* txr.1: Documentation updated, and compat notes
added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The documentation says that match-regex returns the length.
Actually, it returns the position after the last character
matched. This makes a difference when the match doesn't begin
at character zero.
The actual behavior is that of the match_regex C function
which has behaved that way since the dawn of TXR, and
internals depend on it behaving that way.
So the internal function is being retained, and a new function
is being registered as the match-regex intrinsic. The choice
of binding for match-regex is subject to the compatibility
option.
The behavior of match-regst is also being fixed since its
return value is incorrect due to this issue. Since its
return value makes no sense at all (does not represent
the matched text), it is not subject to the compatibility
option; it is just fixed to conform with the documentation.
* regex.c (match_regex_len): New function.
(match_regst): Keep using match_regex, but use its return
value properly. This simplifies the range extraction code,
which is why match_regex works that way in the first place.
(regex_init): Register match-regex to match_regex_len,
unless compatibility <= 150 is requested; then register
to match_regex.
* regex.h (match_regex_len): Declared.
* txr.1: Compatibility notes added.
|
|
|
|
|
|
|
|
|
| |
* regex.c (regsub): Allow the second argument to be
a function, which is called with str as an argument,
and returns a range which indicates what part of
the string is to be replaced, or else nil.
* txr.1: Documented functional argument of regsub.
|
|
|
|
|
|
|
|
|
| |
* regex.c (match_regex, match_regex_right): Detect
a negative start or end position, respectively,
and add the string length to it. If it is still
negative, bail with nil.
* txr.1: Documented.
|
|
|
|
|
|
|
|
| |
* eval.c (eval_init): Remove all regex-related function
registrations from here.
* regex.c (regex_init): Move regex-related function
registrations here.
|
|
|
|
| |
* regex.c (reg_optimize): Implement ~~R -> R reduction.
|
|
|
|
|
|
|
|
| |
* regex.c (reg_optimize): Based on the reasoning in the
previous commit, we can also statically optimize a
complement whose argument is the t regex: match nothing.
We convert that to match everything: the .* regex.
Now (regex-compile "~[]") -> #/.*/.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The form (match-regex "xy" #/~ab/) should return 2 (full
match) because "xy" is in the complement of the set { "ab" }.
It wrongly returns 1.
* regex.c (reg_derivative): Handle the case when
the derivative of the complement's constituent expression
yields nil. This means that the complemented regex matches
the input. In this case, the complement must lapse to the .+
regex: match one or more characters. That is to say, if the
input has at least one more character, there is a match, which
covers all such characters. Otherwise there is no match: the
input matches the complemented regex. In the t case, the
return value is also wrong. If the complemented regex hits
a brick wall (matches nothing, not even the empty string),
the correct complement is "match everything": the .* regex.
Not the match empty string regex!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have to flip between two arrays, since the
nfa_closure and and nfa_move_closure can write the
output set into the same array.
* regex.c (struct nfa_machine): Replace flip and flop
members with a single set.
(nfa_closure, nfa_move_closure): out array parameter removed;
in renamed to set. References to in and out simply replaced
with set.
(nfa_run): Allocate one set instead of two, plus the stack.
Remove code to swap the two pointers on each iteration.
(regex_machine_reset): Prepare initial closure in the one
and only set array.
(regex_machine_init): Allocate set array, rather than flip an
flop.
(regex_machine_cleanup): Free set array and null out pointer
rather than flip and flop arrays.
(regex_machine_feed): Pass just the set ot the
nfa_move_closure function. Remove flip/flop pointer swapping
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (struct nfa_machine_t): Remove move and clos
array pointers, replace with flip and flop. Remove
nmove member.
(nfa_move): Static function removed.
(nfa_move_closure): New static function, based on nfa_move and
logic from nfa_closure.
(nfa_run): Use nfa_move_closure and flip between two
arrays.
(regex_machine_reset): Remove reference to nmove member
in nfa_machine_t. Prepare initial closure in flip array.
(regex_machine_init): Allocate flip and flop arrays,
rather than removed move and clos.
(regex_machine_cleanup): Free flip and flop arrays and
zero out the pointers, rather than removed move and clos.
(regex_machine_feed): Replace nfa_move and nfa_closure
with combined nfa_move_closure from flip to flop,
and exchange of flip and flop arrays.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although we are garbage-collected, being able to clean up on
shutdown is nevertheless useful for uncovering leaks. Leaks
can occur, for instance, due to neglect to free out-of-heap
satellite data from objects that are reclaimed by gc.
This feature is long overdue.
* arith.c, arith.h (arith_free_all): New function.
* gc.c, gc.h (gc_free_all): New function.
* lib.c (init): Remove program name parameter and
redundant initialization of progname globl variable.
* lib.h (progname): Superfluous declaration removed.
This is already declared in txr.h.
(init): Declaration updated.
* regex.c (char_set_destroy): Do not check the static
allocation flag here; just destroy the object.
Do check for a null pointer, though.
(char_set_cobj_destroy): This cobj destructor now
checks the static flag of the char set object and
avoids freeing it. Thus our char set singletons are
left alone by gc, but our global freeing function
takes care of them.
(wide_cs): New static variable moved out of
wide_display_char_p to static scope.
(regex_free_all): New function.
* regex.h (regex_free_all): Declared.
* txr.c (progname): const qualifier and initializer removed.
(main): Ensure progname is always dynamically allocated, even
in the argv[0] == 0 case. Do not pass progname to init;
it doesn't take that argument any more.
(free_all): New static function.
(txr_main): Implement --free-all option.
* txr.h (progname): Declaration updated.
|
|
|
|
|
|
|
|
|
|
|
| |
* lib.c (rcyc_pop): Just assume that *plist points to a cons
and access the fields directly.
(rcyc_cons): Don't bother with rplacd.
(rcyc_list): Don't bother with set macro.
* regex.c (read_until_match): Defensive coding: locally
ensure that rcyc_pop won't be called on a nil stack,
which will now segfault.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (ead_until_match): Use rcyc_pop instead of pop
to move the conses to the recycle list. We know these
are not shared with anything. Adding additional logic
to completely recycle the stack.
* socket.c (dgram_get_char): Use rcyc_pop to
get the character from the push-back list.
* stream.c (stdio_get_char): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* regex.c (read_until_match): New argument, include_match.
Three times repeated termination code refactored into block
reached by forward goto.
(regex_init): Registration of read-until-match updated.
* regex.h (read_until_match): Declaration updated.
* stream.c (struct record_adapter_base): New member,
include_match.
(record_adapter_get_line): Pass match to read_until_match
as new argument.
(record_adapater): New argument, include_match.
(stream_init): Update registration of record-adapter.
* stream.h (record_adapter): Declaration updated.
* txr.1: Updated.
|
|
|
|
|
| |
* regex.c (read_until_match): Completely rewrite broken,
unsalvageable, garbage logic.
|
|
|
|
|
|
|
| |
* arith.c, cadr.c, debug.c, eval.c, filter.c, gencadr.txr, glob.c,
hash.c, linenoise/linenoise.c, lisplib.c, match.c, parser.c, rand.c,
regex.c, signal.c, stream.c, struct.c, sysif.c, syslog.c, txr.c,
unwind.c, utf8.c: Remove unncessary header files.
|