diff options
author | Kaz Kylheku <kaz@kylheku.com> | 2017-09-12 06:54:20 -0700 |
---|---|---|
committer | Kaz Kylheku <kaz@kylheku.com> | 2017-09-12 06:54:20 -0700 |
commit | 7345dcd449cfdcf7d4d5ab438ea755f5252f50a1 (patch) | |
tree | c973ec0e374c5c2e352f44edd0c1ad7d1990f5d8 /ftw.c | |
parent | a4130c6db053fb2f02138ec64652d2a321ef497f (diff) | |
download | txr-7345dcd449cfdcf7d4d5ab438ea755f5252f50a1.tar.gz txr-7345dcd449cfdcf7d4d5ab438ea755f5252f50a1.tar.bz2 txr-7345dcd449cfdcf7d4d5ab438ea755f5252f50a1.zip |
regex: epsilon-threading optimization on NFA graph.
This is something done for the sake of faster NFA simulation.
I think it's glossed over in regex literature because it
makes no difference in a NFA to DFA transformation.
Fewer states in the graph means less work in the
nfa_move_closure function at regex run time.
In the NFA graph, there can occur empty states which are
useless. These are states which hold only one epsilon
transition. Those states can be replaced by whatever state
their epsilon transition points to. The terminology is
inspired by "jump threading" in code optimization, whereby if
a branch takes place to an instruction which holds an
unconditional branch, the original branch can be retargetted
to go to the ultimate target.
I.e. we turn:
x e
S0 ---> S1 ----> S2
where e denotes an epsilon transition, and x represents
any kind of transition to:
x
S0 ---> S2
We can also eliminate empty states which have no transition;
those can be replaced by a null pointer. I suspect these do
not actually occur.
* regex.c (nfa_thread_epsilons, nfa_noop, nfa_optimize): New
static functions.
(regex_compile): Invoke nfa_optimize on the results of
nfa_compile_regex.
Diffstat (limited to 'ftw.c')
0 files changed, 0 insertions, 0 deletions