path: root/stdlib/optimize.tl
...
* compiler: replace late-peephole pattern with real approach.  (Kaz Kylheku, 2021-11-26; 1 file, -12/+8)

  * stdlib/optimize.tl (basic-blocks merge-jump-thunks): For each group of
  candidate jump-blocks, search the entire basic block list for one more
  jump block which is identical to the others, except that it doesn't end
  in a jmp, but rather falls through to the same target that the group
  jumps to. That block is then included in the group, and also becomes the
  default leader, since it is pushed to the front.
  (basic-blocks late-peephole): Remove the peephole pattern which tried to
  attack the same problem. The new approach is much more effective: when
  compiling stdlib, 77 instances occur in which such a block is identified
  and added! The peephole pattern only matched six times.
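For illustration, here is a minimal Python sketch of the idea, not the TXR Lisp code: Block, jump_target and merge_jump_thunks are hypothetical names, and blocks are simplified to a label plus a list of instruction tuples.

  from dataclasses import dataclass, field

  @dataclass
  class Block:
      label: str
      insns: list = field(default_factory=list)   # e.g. [("movsr", "t2", "t7"), ("jmp", "L3")]

  def jump_target(blk):
      """Return the jmp target if the block ends in a jmp, else None."""
      if blk.insns and blk.insns[-1][0] == "jmp":
          return blk.insns[-1][1]
      return None

  def merge_jump_thunks(blocks):
      """Group identical blocks ending in (jmp L); for each group, also admit
      one block with the same body that merely falls through to L, pushing it
      to the front so it becomes the leader. Returns {leader: member labels}."""
      groups = {}
      for blk in blocks:
          tgt = jump_target(blk)
          if tgt is not None:
              key = (tuple(blk.insns[:-1]), tgt)    # body without the jmp, plus target
              groups.setdefault(key, []).append(blk)
      merged = {}
      for (body, tgt), members in groups.items():
          # Search the whole list for a fall-through twin: same body, no jmp,
          # physically followed by the block labelled tgt.
          for i, blk in enumerate(blocks):
              if (jump_target(blk) is None and tuple(blk.insns) == body
                      and i + 1 < len(blocks) and blocks[i + 1].label == tgt):
                  members.insert(0, blk)            # becomes the default leader
                  break
          if len(members) > 1:
              merged[members[0].label] = [m.label for m in members]
      return merged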
* compiler: another late peephole pattern.  (Kaz Kylheku, 2021-11-26; 1 file, -0/+12)

  There are six hits for this in stdlib, two of them in optimize.tl
  itself. The situation is like:

    label1
      (instruction ...)
      (jmp label3)
    label2
      (instruction ...)
    label3

  where (instruction ...) is identical in both places. label1 and label2
  are functionally identical blocks, which means that the pattern can be
  rewritten as:

    label1
    label2
      (instruction ...)
    label3

  When the label1 path is taken it's faster due to the elimination of the
  jmp, and code size is reduced by two instructions.

  This pattern may possibly be the result of an imperfection in the design
  of the basic-blocks method merge-jump-thunks. The label1 and label2
  blocks are functionally identical. But merge-jump-thunks looks strictly
  for blocks that end in a jmp instruction. It's possible that there was a
  jmp instruction at the end of the label2 block, which got eliminated
  before merge-jump-thunks, which is done late, just before late-peephole.

  * stdlib/optimize.tl (basic-blocks late-peephole): New rule for the
  above pattern.
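Viewed as a rewrite over the flat label/instruction sequence, the rule could be sketched in Python roughly as follows. This is not the TXR pattern-matching code; labels are represented as strings and instructions as tuples, and rewrite_duplicate_tail is a hypothetical name.

  def rewrite_duplicate_tail(code):
      """Rewrite  label1, insn, (jmp label3), label2, insn, label3
      into       label1, label2, insn, label3
      when both copies of insn are identical."""
      out = list(code)
      i = 0
      while i + 5 < len(out):
          w = out[i:i + 6]
          if (isinstance(w[0], str) and isinstance(w[3], str) and isinstance(w[5], str)
                  and isinstance(w[1], tuple) and w[1] == w[4]
                  and w[2] == ("jmp", w[5])):
              # Keep label1, label2, one copy of the instruction, then label3.
              out[i:i + 6] = [w[0], w[3], w[1], w[5]]
          i += 1
      return out

  example = ["L1", ("inc", "t3"), ("jmp", "L3"),
             "L2", ("inc", "t3"),
             "L3", ("end", "t3")]
  print(rewrite_duplicate_tail(example))
  # -> ['L1', 'L2', ('inc', 't3'), 'L3', ('end', 't3')]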
* compiler: late-peephole match for a wasteful register move.  (Kaz Kylheku, 2021-11-10; 1 file, -0/+13)

  I've noticed a wasteful instruction pattern in the compiled code for the
  sys:awk-code-move-check function:

    7:  2C020007 movsr t2 t7
    8:  3800000E if t7 14
    9:  00000007
    10: 20050002 gcall t2 1 t9 d1 t8 t6 t7
    11: 00090001
    12: 00080401
    13: 00070006
    14: 10000002 end t2

  Here, the t2 register can be replaced with t7 in the gcall and end
  instructions, and the movsr t2 t7 instruction can be eliminated.

  It looks like something that could somehow be targeted more generally
  with a clever peephole pattern assisted by data-flow information, but
  for now I'm sticking in a dumb late-peephole pattern which just looks
  for this very specific pattern.

  * stdlib/optimize.tl (basic-blocks late-peephole): Add new pattern for
  eliminating the move, as described above. There are several hits for
  this in the standard library in addition to the awk module: in the
  path-test, each-prod and getopts files.
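As a rough Python sketch of the textual rewrite only (not the actual late-peephole pattern; elim_wasteful_move is a hypothetical name, instructions are tuples, and the data-flow conditions that make the renaming legal are assumed rather than checked):

  def elim_wasteful_move(insns):
      """Drop a leading ("movsr", DST, SRC) and rename DST to SRC in the rest
      of the block.  Whether this is valid depends on data-flow facts (DST is
      dead afterwards, SRC is not clobbered before its remaining uses) that
      the real pattern encodes in its very specific shape; not checked here."""
      if not insns or insns[0][0] != "movsr":
          return insns
      _, dst, src = insns[0]
      return [tuple(src if arg == dst else arg for arg in insn) for insn in insns[1:]]

  block = [("movsr", "t2", "t7"),
           ("if", "t7", "L14"),
           ("gcall", "t2", "d1", "t8", "t6", "t7"),
           ("end", "t2")]
  print(elim_wasteful_move(block))
  # -> [('if', 't7', 'L14'), ('gcall', 't7', 'd1', 't8', 't6', 't7'), ('end', 't7')]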
* compiler: improvement in wasteful jmp elimination.  (Kaz Kylheku, 2021-10-23; 1 file, -4/+29)

  * stdlib/compiler.tl (compiler optimize): After the dataflow-driven
  peephole optimization, call elim-dead-code again.

  * stdlib/optimize.tl (basic-blocks check-bypass-empty): New method.
  (basic-blocks elim-dead-code): After eliminating unreachable blocks from
  the list, we use check-bypass-empty to squeeze out any empty blocks:
  blocks that have no instructions in their list, other than the leading
  label. This helps elim-next-jmp to find more opportunities to eliminate
  a wasteful jump, because sometimes these jumps straddle over empty
  blocks. Furthermore, elim-next-jmp can generate more empty blocks
  itself; so we check for this situation, delete the blocks and iterate.
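A minimal Python sketch of that iterate-until-stable structure, under simplifying assumptions: blocks are (label, insns) pairs, squeeze and retarget are hypothetical names, and unreachable-block removal is left out.

  def retarget(insn, fwd):
      """Replace a jump target that names a squeezed-out empty block."""
      if insn[0] in ("jmp", "if") and insn[-1] in fwd:
          return insn[:-1] + (fwd[insn[-1]],)
      return insn

  def squeeze(blocks):
      """Redirect jumps around blocks that contain no instructions, delete
      those blocks, and drop a trailing ("jmp", L) that merely reaches the
      physically next block; the last step can create new empty blocks, so
      we iterate until nothing changes."""
      changed = True
      while changed:
          changed = False
          forward = {}                      # empty label -> next non-empty label
          for i, (label, insns) in enumerate(blocks):
              if not insns:
                  j = i + 1
                  while j < len(blocks) and not blocks[j][1]:
                      j += 1
                  if j < len(blocks):
                      forward[label] = blocks[j][0]
          if forward:
              changed = True
              blocks = [(label, [retarget(insn, forward) for insn in insns])
                        for label, insns in blocks
                        if insns or label not in forward]
          for i, (label, insns) in enumerate(blocks[:-1]):
              if insns and insns[-1] == ("jmp", blocks[i + 1][0]):
                  insns.pop()               # jmp now just reaches the next block
                  changed = True
      return blocks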
* compiler: also clear .next before re-linking graph.  (Kaz Kylheku, 2021-10-23; 1 file, -0/+1)

  * stdlib/optimize.tl (basic-blocks elim-dead-code): When clearing the
  links before recalculating the graph, also clear the next field of every
  block, because link-graph only sets this if necessary, assuming that the
  value is already nil. Thus by not resetting it, we risk leaving stale
  values in these .next fields. The code reachability calculation relies
  on next fields, so if they falsely point to dead blocks, those blocks
  could be falsely retained.
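A small self-contained Python sketch of the invariant being restored here; Blk, link_graph and relink are hypothetical stand-ins, and fall-through is modelled as "does not end in jmp/jend".

  from dataclasses import dataclass, field
  from typing import Optional

  @dataclass
  class Blk:
      label: str
      insns: list
      next: Optional["Blk"] = None            # fall-through successor, if any
      links: list = field(default_factory=list)

  def link_graph(blocks):
      """Sets .next only for blocks that can fall through; assumes it is None."""
      for blk, succ in zip(blocks, blocks[1:]):
          if not blk.insns or blk.insns[-1][0] not in ("jmp", "jend"):
              blk.next = succ

  def relink(blocks):
      for b in blocks:
          b.links = []
          b.next = None   # the fix: without this reset, a block that now ends
                          # in a jmp would keep a stale .next, and reachability
                          # through .next could falsely retain a dead block
      link_graph(blocks)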
* compiler: peephole: recalc and rescan in a few more cases.  (Kaz Kylheku, 2021-09-30; 1 file, -0/+9)

  * stdlib/optimize.tl (basic-block peephole-block): In a few more cases,
  we should be setting the recalc flag to recalculate liveness, and adding
  some block to the rescan list.
* compiler: fix up linkage and recalc liveness in one peephole case.  (Kaz Kylheku, 2021-09-30; 1 file, -8/+11)

  * stdlib/optimize.tl (basic-blocks peephole-block): Rearrange the code a
  bit so we don't calculate the xbl, which potentially performs the
  cut-block, if there is no ybl. We set the bb.recalc flag since we may
  have cut a block into two and have redirected a jump, and also update
  the links for that reason.
* compiler: eliminate some redundant hash lookups.  (Kaz Kylheku, 2021-09-30; 1 file, -11/+12)

  * stdlib/optimize.tl (basic-blocks thread-jumps-block, basic-blocks
  peephole-block): Streamline various cases of [bb.hash jlabel] being
  wastefully called twice to look up the same block referenced by the same
  label.
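The change amounts to binding the looked-up block once and reusing it. A trivial Python analogue, with a plain dict standing in for the label-to-block hash (names hypothetical):

  def target_starts_with_jmp(bb_hash, jlabel):
      """Look the label up once and reuse the result, rather than evaluating
      the hash lookup in two places."""
      tblock = bb_hash.get(jlabel)        # single lookup
      return bool(tblock) and tblock[0][0] == "jmp"

  blocks_by_label = {"L1": [("jmp", "L2")], "L2": [("end", "t0")]}
  print(target_starts_with_jmp(blocks_by_label, "L1"))   # True
  print(target_starts_with_jmp(blocks_by_label, "L2"))   # False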
* compiler: eliminate basic-block next-block method.  (Kaz Kylheku, 2021-09-30; 1 file, -23/+19)

  The next-block method performs a linear search through the basic block
  list, which is physically ordered, to find the physically next block.
  This is actually not needed in several places that use the method; they
  want the logically next block, which is nil if the last instruction of
  the current block doesn't potentially fall through to the next block. In
  the one place where we need the physical next block, in the
  elim-next-jump method, the caller can dynamically provide this, since it
  walks the list.

  * stdlib/optimize.tl (basic-block next-block): Method removed.
  (basic-block link-graph): We revise the logic here a little bit. All of
  the cases now consistently use the mechanism of setting link-next to nil
  to indicate that they don't fall through to the next block. The special
  case handling of the close instruction is clearer.
  (basic-block (thread-jumps-block, peephole-block)): Several cases here
  referred to the physically next block via the next-block method. This
  can be replaced by just using the next pointer, which will be the same.
  (basic-blocks elim-next-jump): This method now takes the next block as
  an argument, since there is no next-block method it can call to get the
  physically next block. The argument is guaranteed non-null, so we don't
  need the .? null-safe slot access syntax.
  (basic-blocks elim-dead-code): Iterate over the next blocks
  simultaneously, and pass the next block into elim-next-jump. We no
  longer iterate over the last block, which has no physical next block.
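A minimal Python sketch of the calling pattern this leaves behind: the driver walks the list pairwise and hands each block its physical successor, so no per-block linear search is needed. Blk, elim_next_jmp and the driver loop are hypothetical stand-ins, not the TXR methods.

  from dataclasses import dataclass

  @dataclass
  class Blk:
      label: str
      insns: list

  def elim_next_jmp(block, next_block):
      """Drop a trailing jmp that merely jumps to the physically next block."""
      if block.insns and block.insns[-1] == ("jmp", next_block.label):
          block.insns.pop()
          return True
      return False

  def elim_next_jmp_pass(blocks):
      hit = False
      # zip pairs each block with its physical successor; the last block has
      # no successor and is simply not visited.
      for blk, nxt in zip(blocks, blocks[1:]):
          hit |= elim_next_jmp(blk, nxt)
      return hit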
* compiler: cosmetic: merge set assignments.  (Kaz Kylheku, 2021-09-30; 1 file, -7/+7)

  * stdlib/optimize.tl (basic-blocks join-block): Merge set forms into
  one.
  (basic-blocks elim-dead-code): Likewise.
* compiler: improvement in next-block linking.  (Kaz Kylheku, 2021-09-29; 1 file, -3/+3)

  * stdlib/optimize.tl (basic-blocks link-graph): Do not search the entire
  list for a block's successor. Iterate over the cdr of the list in
  parallel, so that the next block is directly available at each
  iteration.
* compiler: remove impossible cases in jump threading.  (Kaz Kylheku, 2021-09-29; 1 file, -8/+4)

  * stdlib/optimize.tl (basic-blocks thread-jumps-block): There can't be
  any instructions in a basic block after an if or ifq, so in these cases,
  jrest is always nil. Let's ignore that nil efficiently with @nil, and
  get rid of the cut-block branches of the code. There is a similar case
  in peephole-block, but the target of the jump is an (end ...) which
  doesn't necessarily end a basic block. I temporarily put in an
  (assert (null jrest)), and, surprisingly, it never went off during a
  rebuild of the library or running of the test case. Still, only a jend
  ends a basic block; it would not be correct to simplify it like these
  two cases in thread-jumps-block.
* compiler: peephole: merge basic blocks when jmp removed.  (Kaz Kylheku, 2021-09-29; 1 file, -5/+16)

  When a jmp instruction is removed from (necessarily) the end of a basic
  block, that basic block can be merged with the next one, and marked for
  re-scanning. A test case where this eliminates a wasteful
  register-register move instruction is (match #(@a) #(3) a).

  * stdlib/optimize.tl (basic-blocks): New slot, tryjoin.
  (basic-blocks join-block): Null out the instruction list of the joined
  block. This helps if we do this during peephole processing, because it
  happens in the middle of an iteration over a list of blocks which can
  still visit the next block that has been merged into its predecessor;
  we don't want to be processing instructions that are no longer relevant.
  (basic-blocks peephole-block): In the one case where a conditional
  instruction is deleted from the end of the basic block, we add the block
  to the rescan list, and also to the tryjoin list. If the block can be
  merged with the next one, that can create more opportunities for
  peephole optimization.
  (basic-blocks peephole): Use zap in a few places to condense the logic
  of sampling a state variable that needs to be nulled out. Add the
  processing of the tryjoin list: pop basic blocks from the list, and try
  to merge them with their successor, if possible. We handle cases here
  where the next block could itself be in tryjoin. Also, if we join any
  blocks, we set the recalc flag to recalculate the liveness info.
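A rough Python sketch of how such a tryjoin worklist might be drained. The names are hypothetical: join_block is assumed to return the merged block or None, .next is the fall-through successor, and the handling of a successor that is itself queued is one plausible reading of the description above, not the actual TXR logic.

  def process_tryjoin(tryjoin, join_block):
      """Drain the tryjoin worklist: each block on it had a trailing jump
      deleted, so try to merge it with its fall-through successor."""
      joined_any = False
      while tryjoin:
          blk = tryjoin.pop()
          succ = getattr(blk, "next", None)
          if succ is None:
              continue
          merged = join_block(blk, succ)
          if merged is None:
              continue
          joined_any = True
          if succ in tryjoin:
              # The successor was itself awaiting a join; the merged block
              # inherits that opportunity, so queue it in its place.
              tryjoin.remove(succ)
              tryjoin.append(merged)
      return joined_any   # when True, the caller recalculates liveness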
* compiler: code clean-up in peephole optimizer.  (Kaz Kylheku, 2021-09-28; 1 file, -5/+5)

  * stdlib/optimize.tl (basic-blocks peephole-block): When we match a
  branching instruction, including jend, we know that's the end of the
  basic block. So there is no need to splice the (rest insns) into the
  output; let's get rid of that. On the other hand, there is also no need
  to have a specific pattern match for the end of the list such as
  ((jmp @label)). This costs extra cycles to validate. Let's consistently
  match these basic-block terminating instructions using prefix patterns
  like ((jmp @label) . @nil).
* defset: add set-mask and clear-mask.  (Paul A. Patience, 2021-09-14; 1 file, -4/+5)

  * stdlib/defset.tl (set-mask, clear-mask): New update macros.
  * stdlib/optimize.tl (calc-liveness): Use the new macros.
  * stdlib/socket.tl (sys:str-inaddr-net-impl, str-in6addr-net): Same.
  * stdlib/termios.tl (set-iflags, set-oflags, set-cflags, set-lflags,
  clear-iflags, clear-oflags, clear-cflags, clear-lflags): Same.
  * lisplib.c (defset_set_entries): Add set-mask and clear-mask to
  autoload symbols for defset.
  * txr.1: Documented.
  * stdlib/doc-syms.tl: Updated.
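The new macros update a place by setting or clearing bit-mask bits. A rough Python analogue of just the bitwise semantics (the place-update machinery is TXR-specific and not shown; set_mask and clear_mask here are illustrative functions, not the macros themselves):

  def set_mask(value, mask):
      """Rough analogue of setting mask bits in a place: OR the mask in."""
      return value | mask

  def clear_mask(value, mask):
      """Rough analogue of clearing mask bits in a place: AND with the complement."""
      return value & ~mask

  flags = 0b0101
  flags = set_mask(flags, 0b0010)     # 0b0111
  flags = clear_mask(flags, 0b0101)   # 0b0010
  print(bin(flags))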
* license: reformat to fit 80 columns.  (Kaz Kylheku, 2021-08-16; 1 file, -13/+13)

  * Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h,
  chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h,
  debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h,
  ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c,
  itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h,
  parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c,
  regex.h, signal.c, signal.h, socket.c, socket.h, stdlib/asm.tl,
  stdlib/awk.tl, stdlib/build.tl, stdlib/compiler.tl, stdlib/constfun.tl,
  stdlib/conv.tl, stdlib/copy-file.tl, stdlib/debugger.tl,
  stdlib/defset.tl, stdlib/doloop.tl, stdlib/each-prod.tl,
  stdlib/error.tl, stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl,
  stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl, stdlib/keyparams.tl,
  stdlib/match.tl, stdlib/op.tl, stdlib/optimize.tl, stdlib/package.tl,
  stdlib/param.tl, stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl,
  stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl, stdlib/socket.tl,
  stdlib/stream-wrap.tl, stdlib/struct.tl, stdlib/tagbody.tl,
  stdlib/termios.tl, stdlib/trace.tl, stdlib/txr-case.tl, stdlib/type.tl,
  stdlib/vm-param.tl, stdlib/with-resources.tl, stdlib/with-stream.tl,
  stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h, strudel.c,
  strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h,
  time.c, time.h, tree.c, tree.h, txr.c, txr.h, unwind.c, unwind.h,
  utf8.c, utf8.h, vm.c, vm.h, vmop.h: License reformatted.
  * lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
* file layout: moving share/txr/stdlib to stdlib.  (Kaz Kylheku, 2021-06-24; 1 file, -0/+606)

  This affects run-time also. Txr installations where the executable is
  not in a directory ending in ${bindir} will look for stdlib rather than
  share/txr/stdlib, relative to the determined installation directory.

  * txr.c (sysroot_init): If we detect relative to the short name, or fall
  back on the program directory, use stdlib rather than share/txr/stdlib
  as the stdlib_path.
  * INSTALL: Update some installation notes not to refer to
  share/txr/stdlib but stdlib.
  * Makefile (STDLIB_SRCS): Refer to stdlib, not share/txr/stdlib.
  (clean): In unconfigured mode, remove the old share/txr/stdlib entirely.
  Remove .tlo files from stdlib.
  (install): Install lib materials from stdlib.
  * txr.1: Updated documentation under Deployment Directory Structure.
  * share/txr/stdlib/{asm,awk,build,cadr}.tl: Renamed to
  stdlib/{asm,awk,build,cadr}.tl.
  * share/txr/stdlib/{compiler,conv,copy-file,debugger}.tl: Renamed to
  stdlib/{compiler,conv,copy-file,debugger}.tl.
  * share/txr/stdlib/{defset,doc-lookup,doc-syms,doloop}.tl: Renamed to
  stdlib/{defset,doc-lookup,doc-syms,doloop}.tl.
  * share/txr/stdlib/{each-prod,error,except,ffi}.tl: Renamed to
  stdlib/{each-prod,error,except,ffi}.tl.
  * share/txr/stdlib/{getopts,getput,hash,ifa}.tl: Renamed to
  stdlib/{getopts,getput,hash,ifa}.tl.
  * share/txr/stdlib/{keyparams,match,op,optimize}.tl: Renamed to
  stdlib/{keyparams,match,op,optimize}.tl.
  * share/txr/stdlib/{package,param,path-test,pic}.tl: Renamed to
  stdlib/{package,param,path-test,pic}.tl.
  * share/txr/stdlib/{place,pmac,quips,save-exe}.tl: Renamed to
  stdlib/{place,pmac,quips,save-exe}.tl.
  * share/txr/stdlib/{socket,stream-wrap,struct,tagbody}.tl: Renamed to
  stdlib/{socket,stream-wrap,struct,tagbody}.tl.
  * share/txr/stdlib/{termios,trace,txr-case,type}.tl: Renamed to
  stdlib/{termios,trace,txr-case,type}.tl.
  * share/txr/stdlib/{ver,vm-param,with-resources,with-stream}.tl:
  Renamed to stdlib/{ver,vm-param,with-resources,with-stream}.tl.
  * share/txr/stdlib/yield.tl: Renamed to stdlib/yield.tl.
  * share/txr/stdlib/{txr-case,ver}.txr: Renamed to
  stdlib/{txr-case,ver}.txr.
  * gencadr.txr: Update to stdlib/place.tl.
  * genman.txr: Update to stdlib/cadr.tl.