* eval.c (eval_init): Register new intrinsics.
* lib.c, lib.h (subq, subql, subqual, subst): New functions
(usage sketch below).
* tests/012/seq.tl: New test cases.
* stdlib/optimize.tl (subst): Function removed. The new subst
is a drop-in replacement for it.
* txr.1: Documented.
* stdlib/doc-syms.tl: Updated.
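The following sketch illustrates the intended behavior. The
argument order shown, (subq old new sequence), is my assumption,
not something this log confirms; subst presumably generalizes
these three, taking the equality test as a parameter:

  (subq 'b 'x '(a b c b))       -> (a x c x)   ; compares with eq
  (subql 1.0 2.0 '(1.0 3.0))    -> (2.0 3.0)   ; compares with eql
  (subqual "a" "z" #("a" "b"))  -> #("z" "b")  ; compares with equal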
|
* stdlib/optimize.tl (basic-blocks late-peephole): In one
pattern, an instruction that is recognized as (jend ...)
is inadvertently rewritten to (end ...). Since this is
currently the last optimization stage, and end and jend are
synonyms for the same opcode, it doesn't matter; but it
could turn into a bug, so let's fix it.
|
* stdlib/optimize.tl (basic-block print): Print the label of
the next block, rather than the block itself. This reduces
verbosity during debugging.
|
This work addresses the following issues, which cause compiled
functions to use more stack space than necessary.
Firstly, the compiler doesn't allocate registers tightly.
Every closure's registers could start at t2, but this isn't
done. Secondly, data flow optimizations eliminate registers,
leaving gaps in the register allocation. Code ends up strange:
you may see register t63 used in a function where the next
highest register below that is t39: evidence that a large
number of temporary variables got mapped to registers and
eliminated.
In this change, an optimization is introduced, active at
*opt-level* 6, which compacts the t registers for every
closure: every closure's registers are renumbered starting
from t2. Then, each closure's generating close instruction
is also updated to indicate the accurate number of registers,
ensuring no space is wasted on the stack when the closure
is prepared for execution. (A schematic sketch follows this
entry.)
* stdlib/compiler.tl (compiler optimize): At optimization
level 6, insert a call to basic-blocks compact-tregs
just before the instructions are pulled out and put through
the late peephole pass.
* stdlib/optimize.tl (basic-block): New slot, closer.
This is pronounced "clozer", as in one who closes.
For any basic block that is the head of a closure (the entry
point into the closure code), this slot is set to point
to the previous block: the one which ends in the close
instruction which creates this closure: the closer. This is
important because the close instruction can use t registers
for arguments, and those registers belong to the closure.
Those argument registers must be included in the renaming.
(basic-blocks): New slots, closures and cl-hash. The former
lists the closure head basic blocks: all the basic blocks
which are the entry blocks of a closure. The cl-hash
associates each head block with a list of all the blocks
(including the head block).
(basic-blocks identify-closures): New method. This scans the
list of blocks, identifying the closure heads, and associating
them with their closers, to establish the bb.closures list.
Then for each closure head in the list, the graph is searched
to find all the blocks of a closure, and these lists are put
into the bb.cl-hash.
(basic-block fill-treg-compacting-map): This method scans
a basic block, ferreting out all of its t registers, and
adds renaming entries for them into a hash that is passed in.
If the block is the head of a closure, then the close
instruction of the block's closer is also scanned for t
registers: but only the arguments, not the destination
register.
(basic-block apply-treg-compacting-map): This method
renames the t registers of a block using the renaming map.
It follows the registers in the same way as
fill-treg-compacting-map, and consequently also goes into
the close instruction to rename the argument registers, as
necessary. When tweaking the close instruction, it also
updates the instruction's number-of-tregs field with the
newly calculated number, which comes from the map size.
(basic-blocks compact-tregs): This method ties it together,
using the above methods to identify the basic blocks belonging
to closures, build register renaming maps for them, and then
apply the renaming.
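A schematic sketch of the effect; the close operand layout is
simplified here, showing only the register-count field:

  close t5 ... 64 ...     ; closure reserves 64 t-registers
    mov t39 d3
    gcall t63 0 t39
    end t63
  ==>
  close t5 ... 4 ...      ; compacted: only t2 and t3 in use
    mov t2 d3
    gcall t3 0 t2
    end t3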
|
This is related to the pattern in the previous commit. When we have a
situation like this:
lab1
mov tn nil
lab2
ifq tn nil lab4
lab3
gcall tn ...
We know that if lab1 is entered, then lab2 will necessarily
fall through: the lab4 branch is not taken because tn is nil.
But then, tn is clobbered immediately in lab3 by the gcall tn.
In other words, the value stored into tn by lab1 is never used.
Therefore, we can remove the "mov tn nil" instruction and
move the lab1 label:
lab2
ifq tn nil lab4
lab1
lab3
gcall tn ...
There are 74 hits for this pattern in stdlib.
* stdlib/optimize.tl (basic-blocks late-peephole): Implement the
above pattern.
|
* stdlib/optimize.tl (basic-blocks late-peephole): This pattern doesn't
match any more because of code removed by the previous commit.
If we shorten it by removing the lab1 block, then it matches.
Because the pattern is shorter, the reduction being performed
by the replacement is no longer needed; it has already been done.
The remaining value is that it threads the jump from lab3 to
lab4. That missing threading is what I noticed when evaluating
the effects of the previous patch; this change restores it.
|
* stdlib/optimize.tl (basic-blocks merge-jump-thunks): For each
group of candidate jump-blocks, search the entire basic block
list for one more jump block which is identical to the
others, except that it doesn't end in a jmp, but rather
falls through to the same target that the group jumps to.
That block is then included in the group, and also becomes the
default leader since it is pushed to the front.
(basic-blocks late-peephole): Remove the peephole pattern
which tried to attack the same problem. The new approach is
much more effective: when compiling stdlib, 77 instances occur
in which such a block is identified and added, whereas the
peephole pattern matched only six times.
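Schematically, with hypothetical labels and instructions:

  A:  (mov t3 d0)
      (jmp T)
  B:  (mov t3 d0)
      (jmp T)
  C:  (mov t3 d0)   ; no jmp: falls through to T
  T:  ...

Previously, only blocks like A and B, which end in jmp, formed
a group; now C is also found and, being pushed to the front,
becomes the default leader of the group.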
|
There are six hits for this in stdlib, two of them in
optimize.tl itself. The situation is like:
label1
(instruction ...)
(jmp label3)
label2
(instruction ...)
label3
where (instruction ...) is identical in both places.
label1 and label2 are functionally identical blocks, which
means that the pattern can be rewritten as:
label1
label2
(instruction ...)
label3
When the label1 path is taken it's faster due to the
elimination of the jmp, and code size is reduced by two
instructions.
This pattern may possibly be the result of an imperfection
in the design of the basic-blocks method merge-jump-thunks,
which looks strictly for blocks that end in a jmp
instruction. It's possible that there was a jmp instruction
at the end of the label2 block, which got eliminated before
merge-jump-thunks, which runs late, just before
late-peephole.
* stdlib/optimize.tl (basic-blocks late-peephole): New rule
for the above pattern.
|
I've noticed a wasteful instruction pattern in the compiled
code for the sys:awk-code-move-check function:
7: 2C020007 movsr t2 t7
8: 3800000E if t7 14
9: 00000007
10: 20050002 gcall t2 1 t9 d1 t8 t6 t7
11: 00090001
12: 00080401
13: 00070006
14: 10000002 end t2
Here, the t2 register can be replaced with t7 in the gcall
and end instructions, and the movsr t2 t7 instruction can
be eliminated.
It looks like something that could be targeted more generally
with a clever peephole pattern assisted by data-flow
information, but for now I'm sticking in a dumb late-peephole
rule which just looks for this very specific sequence.
* stdlib/optimize.tl (basic-blocks late-peephole): Add new
pattern for eliminating the move, as described above.
There are several hits for this in the standard library in addition to
the awk module: in the path-test, each-prod and getopts files.
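For reference, the intended result (original offsets kept for
readability; they would shift after the rewrite):

  8: if t7 14
  10: gcall t7 1 t9 d1 t8 t6 t7
  14: end t7

The movsr is gone, and t7 replaces t2 in the gcall and end
instructions.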
|
* stdlib/compiler.tl (compiler optimize): After the dataflow-driven
peephole optimization, call elim-dead-code again.
* stdlib/optimize.tl (basic-blocks check-bypass-empty): New method.
(basic-blocks elim-dead-code): After eliminating unreachable blocks
from the list, we use check-bypass-empty to squeeze out any
empty blocks: blocks that have no instructions in their list
other than the leading label. This helps elim-next-jump
find more opportunities to eliminate a wasteful jump, because
sometimes these jumps straddle empty blocks.
Furthermore, elim-next-jump can itself generate more empty
blocks; so we check for this situation, delete the blocks
and iterate.
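Schematically, with hypothetical labels:

  A:  ...
      (jmp C)
  B:              ; empty: nothing but its label
  C:  ...

With B squeezed out, A is immediately followed by C, and
elim-next-jump can delete the (jmp C).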
|
* stdlib/optimize.tl (basic-blocks elim-dead-code): When
clearing the links before recalculating the graph, also clear
the next field of every block, because link-graph only sets
this if necessary, assuming that the value is already nil.
Thus by not resetting it, we risk leaving stale values in
these .next fields. The code reachability calculation relies
on next fields, so if they falsely point to dead blocks,
those blocks could be wrongly retained.
|
* stdlib/optimize.tl (basic-blocks peephole-block): In a few
more cases, we should be setting the recalc flag to
recalculate liveness, and adding the affected block to the
rescan list.
|
* stdlib/optimize.tl (basic-blocks peephole-block): Rearrange the
code a bit so that we don't calculate xbl, which potentially
performs the cut-block, if there is no ybl. We set the bb.recalc
flag, since we may have cut a block in two and redirected a jump,
and also update the links for that reason.
|
* stdlib/optimize.tl (basic-blocks thread-jumps-block,
basic-blocks peephole-block): Streamline various cases of
[bb.hash jlabel] being wastefully evaluated twice to look up
the same block referenced by the same label.
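The kind of change involved, sketched with hypothetical code
(process stands in for whatever is done with the block):

  ;; before: two lookups of the same block
  (if [bb.hash jlabel]
    (process [bb.hash jlabel]))
  ;; after: one lookup, bound to a variable
  (whenlet ((jblock [bb.hash jlabel]))
    (process jblock))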
|
The next-block method performs a linear search through the basic
block list, which is physically ordered, to find the physically
next block. This is actually not needed in several places that use the
method; they want the logically next block, which is nil if the last
instruction of the current block doesn't potentially fall through to the next
block. In the one place where we need the physical next block, in the
elim-next-jump method, the caller can dynamically provide this, since it
walks the list.
* stdlib/optimize.tl (basic-block next-block): Method removed.
(basic-blocks link-graph): We revise the logic here a little bit. All of
the cases now consistently use the mechanism of setting link-next to
nil to indicate that they don't fall through to the next block.
The special case handling of the close instruction is clearer.
(basic-blocks thread-jumps-block, peephole-block): Several cases here
referred to the physically next block via the next-block method. This
can be replaced by just using the next pointer, which will be the same.
(basic-blocks elim-next-jump): This method now takes the next block as
an argument, since there is no next-block method it can call to get
the physically next block. The argument is guaranteed non-null, so we
don't need the .? null-safe slot access syntax.
(basic-blocks elim-dead-code): Iterate over the next blocks
simultaneously, and pass the next block into elim-next-jump.
We no longer iterate over the last block, which has no physical next
block.
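The pairwise iteration might look like this; a sketch only,
which assumes the block list lives in a slot called list. The
each form iterates the lists in parallel and stops at the
shorter one, so the last block, which has no physical
successor, is skipped:

  (each ((bl bb.list)
         (nxbl (cdr bb.list)))
    bb.(elim-next-jump bl nxbl))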
|
* stdlib/optimize.tl (basic-blocks join-block): Merge set
forms into one.
(basic-blocks elim-dead-code): Likewise.
|
* stdlib/optimize.tl (basic-blocks link-graph): Do not search
the entire list for a block's successor. Iterate over the cdr
of the list in parallel, so that the next block is directly
available at each iteration.
|
* stdlib/optimize.tl (basic-blocks thread-jumps-block): There
can't be any instructions in a basic block after an if or ifq,
so in these cases, jrest is always nil. Let's ignore that nil
efficiently with @nil, and get rid of the cut-block branches
of the code. There is a similar case in peephole-block, but
the target of the jump is an (end ...) which doesn't
necessarily end a basic block. I temporarily put in an
(assert (null jrest)) and, surprisingly, it never went off
during a rebuild of the library or a run of the test cases.
Still, only a jend ends a basic block; it would not be
correct to simplify that case like these two in
thread-jumps-block.
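The shape of the change, roughly (hypothetical pattern, not
the actual rule):

  ;; before: captures a tail which can only ever be nil
  (((if @reg @jlabel) . @jrest) ...)
  ;; after: match the always-nil tail directly
  (((if @reg @jlabel) . @nil) ...)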
|
When a jmp instruction is removed from (necessarily) the end of a basic
block, that basic block can be merged with the next one, and marked for
re-scanning.
A test case where this eliminates a wasteful register-to-register
move instruction is (match #(@a) #(3) a).
* stdlib/optimize.tl (basic-blocks): New slot, tryjoin.
(basic-blocks join-block): Null out the instruction list of the joined
block. This helps if we do this during peephole processing, because it
happens in the middle of an iteration over a list of blocks which can
still visit the next block that has been merged into its predecessor; we
don't want to be processing instructions that are no longer relevant.
(basic-blocks peephole-block): In the one case where a conditional
instruction is deleted from the end of the basic block, we add the block
to the rescan list, and also to the tryjoin list. If the block can
be merged with the next one, that can create more opportunities for
peephole optimization.
(basic-blocks peephole): Use zap in a few places to condense the logic
of sampling a state variable that needs to be nulled out. Add the
processing of the tryjoin list: pop basic blocks from the list, and try
to merge them with their successor, if possible. We handle cases here
where the next block could itself be in tryjoin. Also, if we join any
blocks, we set the recalc flag to recalculate the liveness info.
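Schematically, when the conditional at the end of A is
deleted:

  A:  ...
      (if t3 C)   ; deleted by the peephole pass
  B:  ...

A now falls through unconditionally to B, so A and B can be
merged into one block (when B has no other predecessors), and
the merged block is rescanned for further opportunities.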
|
* stdlib/optimize.tl (basic-blocks peephole-block): When we match a
branching instruction, including jend, we know that's the end of the
basic block. So there is no need to splice the (rest insns) into
the output; let's get rid of that. On the other hand, there is also no
need to have a specific pattern match for the end of the list,
such as ((jmp @label)); this costs extra cycles to validate.
Let's consistently match these basic-block terminating
instructions using prefix patterns like ((jmp @label) . @nil).
|
* stdlib/defset.tl (set-mask, clear-mask): New update macros
(usage sketch below).
* stdlib/optimize.tl (calc-liveness): Use the new macros.
* stdlib/socket.tl (sys:str-inaddr-net-impl, str-in6addr-net):
Same.
* stdlib/termios.tl (set-iflags, set-oflags, set-cflags,
set-lflags, clear-iflags, clear-oflags, clear-cflags,
clear-lflags): Same.
* lisplib.c (defset_set_entries): Add set-mask and clear-mask
to autoload symbols for defset.
* txr.1: Documented.
* stdlib/doc-syms.tl: Updated.
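A sketch of how I read the semantics of these update macros:
set-mask ors the given bits into a place, clear-mask removes
them:

  (let ((flags #b0001))
    (set-mask flags #b0110)    ; flags is now #b0111
    (clear-mask flags #b0011)  ; flags is now #b0100
    flags)                     ; yields 4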
|
* Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c,
buf.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h,
combi.c, combi.h, debug.c, debug.h, eval.c, eval.h, ffi.c,
ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c,
glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c,
lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c,
parser.h, parser.l, parser.y, rand.c, rand.h, regex.c,
regex.h, signal.c, signal.h, socket.c, socket.h,
stdlib/asm.tl, stdlib/awk.tl, stdlib/build.tl,
stdlib/compiler.tl, stdlib/constfun.tl, stdlib/conv.tl,
stdlib/copy-file.tl, stdlib/debugger.tl, stdlib/defset.tl,
stdlib/doloop.tl, stdlib/each-prod.tl, stdlib/error.tl,
stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl,
stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl,
stdlib/keyparams.tl, stdlib/match.tl, stdlib/op.tl,
stdlib/optimize.tl, stdlib/package.tl, stdlib/param.tl,
stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl,
stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl,
stdlib/socket.tl, stdlib/stream-wrap.tl, stdlib/struct.tl,
stdlib/tagbody.tl, stdlib/termios.tl, stdlib/trace.tl,
stdlib/txr-case.tl, stdlib/type.tl, stdlib/vm-param.tl,
stdlib/with-resources.tl, stdlib/with-stream.tl,
stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h,
strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h,
termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.c,
txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h:
License reformatted.
* lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
|
This affects run-time also. TXR installations where the
executable is not in a directory ending in ${bindir}
will look for stdlib rather than share/txr/stdlib,
relative to the determined installation directory.
* txr.c (sysroot_init): If we detect the installation directory
relative to the short name, or fall back on the program
directory, use stdlib rather than share/txr/stdlib as the
stdlib_path.
* INSTALL: Update some installation notes to refer to
stdlib rather than share/txr/stdlib.
* Makefile (STDLIB_SRCS): Refer to stdlib, not
share/txr/stdlib.
(clean): In unconfigured mode, remove the old share/txr/stdlib
entirely. Remove .tlo files from stdlib.
(install): Install lib materials from stdlib.
* txr.1: Updated documentation under Deployment Directory Structure.
* share/txr/stdlib/{asm,awk,build,cadr}.tl:
Renamed to stdlib/{asm,awk,build,cadr}.tl.
* share/txr/stdlib/{compiler,conv,copy-file,debugger}.tl:
Renamed to stdlib/{compiler,conv,copy-file,debugger}.tl.
* share/txr/stdlib/{defset,doc-lookup,doc-syms,doloop}.tl:
Renamed to stdlib/{defset,doc-lookup,doc-syms,doloop}.tl.
* share/txr/stdlib/{each-prod,error,except,ffi}.tl:
Renamed to stdlib/{each-prod,error,except,ffi}.tl.
* share/txr/stdlib/{getopts,getput,hash,ifa}.tl:
Renamed to stdlib/{getopts,getput,hash,ifa}.tl.
* share/txr/stdlib/{keyparams,match,op,optimize}.tl:
Renamed to stdlib/{keyparams,match,op,optimize}.tl.
* share/txr/stdlib/{package,param,path-test,pic}.tl:
Renamed to stdlib/{package,param,path-test,pic}.tl.
* share/txr/stdlib/{place,pmac,quips,save-exe}.tl:
Renamed to stdlib/{place,pmac,quips,save-exe}.tl.
* share/txr/stdlib/{socket,stream-wrap,struct,tagbody}.tl:
Renamed to stdlib/{socket,stream-wrap,struct,tagbody}.tl.
* share/txr/stdlib/{termios,trace,txr-case,type}.tl:
Renamed to stdlib/{termios,trace,txr-case,type}.tl.
* share/txr/stdlib/{ver,vm-param,with-resources,with-stream}.tl:
Renamed to stdlib/{ver,vm-param,with-resources,with-stream}.tl.
* share/txr/stdlib/yield.tl: Renamed to stdlib/yield.tl.
* share/txr/stdlib/{txr-case,ver}.txr:
Renamed to stdlib/{txr-case,ver}.txr.
* gencadr.txr: Update to stdlib/place.tl.
* genman.txr: Update to stdlib/cadr.tl.
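The resulting deployment layout, relative to the determined
installation directory (schematic):

  <instdir>/txr            ; or bin/txr, per configuration
  <instdir>/stdlib/        ; formerly <instdir>/share/txr/stdlib/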