| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-blocks merge-jump-thunks): For each
group of candidate jump-blocks, search the entire basic block
list for one more jump block which is identical to the
others, except that it doesn't end in a jmp, but rather
falls through to the same target that the group jumps to.
That block is then included in the group, and also becomes the
default leader since it is pushed to the front.
(basic-blocks late-peephole): Remove the peephole pattern
which tried to attack the same problem. The new approach is
much more effective: when compiling stdlib, 77 instances occur in which
such a block is identified and added! The peephole pattern only
matched six times.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are six hits for this in stdlib, two of them in
optimize.tl itself. The situation is like:
label1
(instruction ...)
(jmp label3)
label2
(instruction ...)
label3
where (instruction ...) is identical in both places.
label1 and label2 are functionally identical blocks, which
means that the pattern can be rewritten as:
label1
label2
(instruction ...)
label3
When the label1 path is taken it's faster due to the
elimination of the jmp, and code size is reduced by two
instructions.
This pattern may possibly the result of an imperfection in the
design of the basic-blocks method merge-jump-thunks.
The label1 and label2 blocks are functionally identical.
But merge-jump-thunks looks strictly for blocks that end in a
jmp instruction. It's possible that there was a jmp
instruction and the end of the label2 block, which got
eliminated before merge-jump-thunks, which is done late, just
before late-peephole.
* stdlib/optimize.tl (basic-blocks late-peephole): New rule
for the above pattern.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've noticed a wasteful instruction pattern in the compiled
code for the sys:awk-code-move-check function:
7: 2C020007 movsr t2 t7
8: 3800000E if t7 14
9: 00000007
10: 20050002 gcall t2 1 t9 d1 t8 t6 t7
11: 00090001
12: 00080401
13: 00070006
14: 10000002 end t2
Here, the t2 register can be replaced with t7 in the gcall
and end instructions, and the movsr t2 t7 instruction can
be eliminated.
It looks like something that could somehow be targeted more generally
with a clever peephole pattern assisted by data-flow information,
but for now I'm sticking in a dumb late-peephole pattern which just
looks for this very specific pattern.
* stdlib/optimize.tl (basic-blocks late-peephole): Add new
pattern for eliminating the move, as described above.
There are several hits for this in the standard library in addition to
the awk module: in the path-test, each-prod and getopts files.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* stdlib/compiler.tl (compiler optimize): After the dataflow-driven
peephole optimization, call elim-dead-code again.
* stdlib/optimize.tl (basic-blocks check-bypass-empty): New method.
(basic-bocks elim-dead-code): After eliminating unreachable blocks
from the list, we use check-bypass-empty to squeeze out any
empty blocks: blocks that have no instructions in their list,
other than the leading label. This helps elim-next-jmp
to find more opportunities to eliminate a wasteful jump, because
sometimes these jumps straddle over empty blocks.
Furthermore, elim-next-jmp can generate more empty blocks itself;
so we check for this situation, delete the blocks and iterate.
|
|
|
|
|
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-blocks elim-dead-code): When
clearing the links before recalculating the graph, also clear
the next field of every block, because link-graph only sets
this if necessary, assuming that the value is already nil.
Thus by not resetting it, we risk leaving stale values in
these .next fields. The code reachability calculation relies
on next fields, so if they falsely point to dead blocks, those
blocks could be falsely retained.
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-block peephole-block): In a few more
cases, we should be setting the recalc flag to recalculate liveness,
and adding some block to the rescan list.
|
|
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-blocks peephole-block): Rearrange the code a
bit so we don't calculate the xbl, which potentially performs the
cut-block, if there is no ybl. We set the bb.recalc flag since we may
have cut a block into two and have redirected a jump, and also
update the links for that reason.
|
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-blocks thread-jumps-block,
basic-blocks peephole-block): Streamline various cases of [bb.hash
jlabel] being wastefully called twice to look up the same block
referenced by the same label.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The next-block method performs a linear search through the basic
block list, which is physically ordered, to find the physically
next block. This is actually not needed in several places that use the
method; they want the logically next block, which is nil if the last
instruction of the current doesn't potentially fall through to the next
block. In the one place where we need the physical next block, in the
elim-next-jump method, the caller can dynamically provide this, since it
walks the list.
* stdlib/optimize.tl (basic-block next-block): Method removed.
(basic-block link-graph): We revise the logic here a little bit. All of
the cases now consistently use the mechanism of setting link-next to
nil to indicate that they don't fall through to the next block.
The special case handling of the close instruction is clearer.
(basic-block (thread-jumps-block, peephole-block)): Several cases here
referred to the physically next block via the next-block method. This
can be replaced by just using the next pointer, which will be the same.
(basic-blocks elim-next-jump): This method now takes the next block as
an argument, since there is no next-block method it can call to get
the physcally next block. The argument is guaranteed non-null, so we
don't need the .? null-safe slot access syntax.
(basic-blocks elim-dead-code): Iterate over the next blocks
simultaneously, and pass the next block into elim-next-jump.
We no longer iterate over the last block, which has no physical next
block.
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-blocks join-block): Merge set
forms into one.
(basic-blocks elim-dead-code): Likewise.
|
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-blocks link-graph): Do not search
the entire list for a block's successor. Iterate over the cdr
of the list in parallel, so that the next block is directly
available at each iteration.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-blocks thread-jumps-block): There
can't be any instructions in a basic block after an if or ifq,
so in these cases, jrest is always nil. Let's ignore that nil
efficiently with @nil, and get rid of the cut-block branches
of the code. There is a similar case in peephole-block, but
the target of the jump is an (end ...) which doesn't
necessarily end a basic block. I temporarily put in an (assert
(null jrest)), and, surprisingly, it never went off during a
rebuild of the library or running of the test case. Still,
only a jend ends a basic block; it would not be correct to
simplify it like these two cases in thread-jumps-block.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a jmp instruction is removed from (necessarily) the end of a basic
block, that basic block can be merged with the next one, and marked for
re-scanning.
A test case where this eliminates wasteful register-register move
instruction is (match #(@a) #(3) a).
* stdlib/optimize.tl (basic-blocks): New slot, tryjoin.
(basic-blocks join-block): Null out the instruction list of the joined
block. This helps if we do this during peephole processing, because it
happens in the middle of an iteration over a list of blocks which can
still visit the next block that has been merged into its predecesor; we
don't want to be processing instructions that are no longer relevant.
(basic-blocks peephole-block): In the one case where a conditional
instruction is deleted from the end of the basic block, we add the block
to the rescan list, and also to the tryjoin list. If the block can
be merged with the next one, that can create more opportunities for
peephole optimization.
(basic-blocks peephole): Use zap in a few places to condense the logic
of sampling a state variable that needs to be nulled out. Add the
processing of the tryjoin list: pop basic blocks from the list, and try
to merge them with their successor, if possible. We handle cases here
where the next block could itself be in tryjoin. Also, if we join any
blocks, we set the recalc flag to recalculate the liveness info.
|
|
|
|
|
|
|
|
|
|
|
| |
* stdlib/optimize.tl (basic-blocks peephole-block): When we match a
branching instruction, including jend, we know that's the end of the
basic block. So there is no need to splice the (rest insns) into
the output; let's get rid of that. On the other hand, there is also no
need to have a specific pattern match for the end of the list such
as ((jmp @label)). This costs extra cycles to validate. Let's
consistently match these basic-block terminating instructions using
prefix patterns like ((jmp @label) . @nil)).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* stdlib/defset.tl (set-mask, clear-mask): New update macros.
* stdlib/optimize.tl (calc-liveness): Use the new macros.
* stdlib/socket.tl (sys:str-inaddr-net-impl, str-in6addr-net):
Same.
* stdlib/termios.tl (set-iflags, set-oflags, set-cflags,
set-lflags, clear-iflags, clear-oflags, clear-cflags,
clear-lflags): Same.
* lisplib.c (defset_set_entries): Add set-mask and clear-mask
to autoload symbols for defset.
* txr.1: Documented.
* stdlib/doc-syms.tl: Updated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c,
buf.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h,
combi.c, combi.h, debug.c, debug.h, eval.c, eval.h, ffi.c,
ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c,
glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c,
lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c,
parser.h, parser.l, parser.y, rand.c, rand.h, regex.c,
regex.h, signal.c, signal.h, socket.c, socket.h,
stdlib/asm.tl, stdlib/awk.tl, stdlib/build.tl,
stdlib/compiler.tl, stdlib/constfun.tl, stdlib/conv.tl,
stdlib/copy-file.tl, stdlib/debugger.tl, stdlib/defset.tl,
stdlib/doloop.tl, stdlib/each-prod.tl, stdlib/error.tl,
stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl,
stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl,
stdlib/keyparams.tl, stdlib/match.tl, stdlib/op.tl,
stdlib/optimize.tl, stdlib/package.tl, stdlib/param.tl,
stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl,
stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl,
stdlib/socket.tl, stdlib/stream-wrap.tl, stdlib/struct.tl,
stdlib/tagbody.tl, stdlib/termios.tl, stdlib/trace.tl,
stdlib/txr-case.tl, stdlib/type.tl, stdlib/vm-param.tl,
stdlib/with-resources.tl, stdlib/with-stream.tl,
stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h,
strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h,
termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.c,
txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h:
License reformatted.
* lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
|
|
This affects run-time also. Txr installations where the
executable is not in directory ending in ${bindir}
will look for stdlib rather than share/txr/stdlib,
relative to the determined installation directory.
* txr.c (sysroot_init): If we detect relative to the short
name, or fall back on the program directory, use stdlib
rather than share/txr/stdlib as the stdlib_path.
* INSTALL: Update some installation notes not to refer to
share/txr/stdlib but stdlib.
* Makefile (STDLIB_SRCS): Refer to stdlib, not
share/txr/stdlib.
(clean): In unconfigured mode, remove the old share/txr/stdlib
entirely. Remove .tlo files from stdlib.
(install): Install lib materials from stdlib.
* txr.1: Updated documentation under Deployment Directory Structure.
* share/txr/stdlib/{asm,awk,build,cadr}.tl:
Renamed to stdlib/{asm,awk,build,cadr}.tl.
* share/txr/stdlib/{compiler,conv,copy-file,debugger}.tl:
Renamed to stdlib/{compiler,conv,copy-file,debugger}.tl.
* share/txr/stdlib/{defset,doc-lookup,doc-syms,doloop}.tl:
Renamed to stdlib/{defset,doc-lookup,doc-syms,doloop}.tl.
* share/txr/stdlib/{each-prod,error,except,ffi}.tl:
Renamed to stdlib/{each-prod,error,except,ffi}.tl.
* share/txr/stdlib/{getopts,getput,hash,ifa}.tl:
Renamed to stdlib/{getopts,getput,hash,ifa}.tl.
* share/txr/stdlib/{keyparams,match,op,optimize}.tl:
Renamed to stdlib/{keyparams,match,op,optimize}.tl.
* share/txr/stdlib/{package,param,path-test,pic}.tl:
Renamed to stdlib/{package,param,path-test,pic}.tl.
* share/txr/stdlib/{place,pmac,quips,save-exe}.tl:
Renamed to stdlib/{place,pmac,quips,save-exe}.tl.
* share/txr/stdlib/{socket,stream-wrap,struct,tagbody}.tl:
Renamed to stdlib/{socket,stream-wrap,struct,tagbody}.tl.
* share/txr/stdlib/{termios,trace,txr-case,type}.tl:
Renamed to stdlib/{termios,trace,txr-case,type}.tl.
* share/txr/stdlib/{ver,vm-param,with-resources,with-stream}.tl:
Renamed to stdlib/{ver,vm-param,with-resources,with-stream}.tl.
* share/txr/stdlib/yield.tl: Renamed to stdlib/yield.tl.
* share/txr/stdlib/{txr-case,ver}.txr:
Renamed to stdlib/{txr-case,ver}.txr.
* gencadr.txr: Update to stdlib/place.tl.
* genman.txr: Update to stdlib/cadr.tl.
|