summaryrefslogtreecommitdiffstats
path: root/stdlib/optimize.tl
Commit message (Collapse)AuthorAgeFilesLines
* New functions: subq, subql, subqual and subst.Kaz Kylheku2021-12-221-5/+0
| | | | | | | | | | | | | | | * eval.c (eval_init): Register new intrinsics. * lib.c, lib.h (subq, subql, subqual, subst): New functions. * tests/012/seq.tl: New test cases. * stdlib/optimize.tl (subst): Function removed. The new subst drop-in replaces this one. * txr.1: Documented. * stdlib/doc-syms.tl: Updated.
* compiler: small end/jend issue in late-peephole.Kaz Kylheku2021-12-111-1/+1
| | | | | | | | | * stdlib/optimize.tl (basic-blocks late-peephole): In one pattern, an instruction that is recognized as (jend ...) is inadvertently rewritten to (end ...). Since this is the last optimization stage, currently, and end and jend are synonyms for the same opcode, it doesn't matter. But it could turn into a bug; let's fix it.
* compiler: tweak in basic block debug print.Kaz Kylheku2021-12-111-1/+1
| | | | | | * stdlib/optimize (basic-block print): Print the label of the next block, rather than the block itself. This reduces the verbosity during debugging.
* compiler: register-compacting optimization.Kaz Kylheku2021-12-101-0/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This work addresses the following issues, which cause compiled functions to use more stack space than necessary. Firstly, the compiler doesn't allocate registers tightly. Every closure's registers can start at t2, but this isn't done. Secondly, data flow optimizations eliminate registers, leaving gaps in the register allocation. Code ends up strange: you may see register t63 used in a function where the next highest register below that is t39: evidence that a large number of temporary variables got mapped to registers and eliminated. In this change, an optimization is introduced, active at *opt-level* 6 which compacts the t registers for every closure: every closure's registers are renumbered starting from t2. Then, each closure' generating close instruction is also updated to indicate the accurate number of registers ensuring no space is wasted on the stack when the closure is prepared for execution. * stdlib/compiler.tl (compiler optimize): At optimization level 6, insert a call to basic-blocks compact-tregs just before the instruction are pulled out and put through the late peephole pass. * stdlib/optimize.tl (basic-block): New slot, closer. This is pronounced "clozer", as in one who closes. For any basic block that is the head of a closure (the entry point ito the closure code), this slot is set to point to the previous block: the one which ends in the close instruction which creates this closure: the closer. This is important because the close instruction can use t registers for arguments, and those registers belong to the closure. Those argument registers must be included in the renaming. (basic-blocks): New slots closures and cl-hash. The former lists the closure head basic blocks; all the basic blocks which are the entry blocks of a closure. The cl-hash associates each head block with a list of all the blocks (including the head block). (basic-blocks identify-closures): New method. This scans the list of blocks, identifying the closure heads, and associating them with their closers, to establish the bb.closure list. Then for each closure head in the list, the graph is searched to find all the blocks of a closure, and these lists are put into the bb.cl-hash. (basic-block fill-treg-compacting-map): This method scans a basic block, ferreting out all of it t registers, and adds renaming entries for them into a hash that is passed in. If the block is the head of a closure, then the close instruction of the block's closer is also scanned for t registers: but only the arguments, not the destination register. (basic-block apply-treg-compacting-map): This method renames the t register of a block using the renaming map. It follows the registers in the same way as fill-treg-compacting-map, and consquently also goes into the close instruction to rename the argument registers, as necessary. When tweaking the close instruction, it also updates the instruction's number-of-tregs field with the newly calculated number, which comes from the map size. (basic-blocks compact-tregs): This method ties it together, using the above methods to identify the basic blocks belonging to closures, build register renaming maps for them, and hen apply the renaming.
* compiler: new late-peephole case.Kaz Kylheku2021-11-291-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is related to the pattern in the previous commit. When we have a situation like this: lab1 mov tn nil lab2 ifq tn nil lab4 lab3 gcall tn ... We know that if lab1 is entered, then lab2 will necessarily fall through: the lab4 branch is not taken because tn is nil. But then, tn is clobbered immediately in lab3 by the gcall tn. In other words, the value stored into tn by lab1 is never used. Therefore, we can remove the "mov tn nil" instruction and move the l1 label. lab2 ifq tn nil lab4 lab1 lab3 gcall tn ... There are 74 hits for this pattern in stdlib. * stdlib/optimize.tl (basic-blocks late-peephole): Implement the above pattern.
* compiler: revise no-longer-matching late peephole case.Kaz Kylheku2021-11-291-14/+3
| | | | | | | | | | | * stdlib/optimize.tl (basic-blocks late-peephole): This pattern doesn't match any more because of code removed by the previous commit. If we shorten it by removing the lab1 block, then it matches. Because the pattern is shorter, the reduction being performed by the replacement is no longer needed; it has already been done. The remaining value is that threads the jump from lab3 to lab4. This missing threading is what I noticed when evaluating the effects of the previous patch; this restores it.
* compiler: replace late-peephole pattern with real approach.Kaz Kylheku2021-11-261-12/+8
| | | | | | | | | | | | | | | * stdlib/optimize.tl (basic-blocks merge-jump-thunks): For each group of candidate jump-blocks, search the entire basic block list for one more jump block which is identical to the others, except that it doesn't end in a jmp, but rather falls through to the same target that the group jumps to. That block is then included in the group, and also becomes the default leader since it is pushed to the front. (basic-blocks late-peephole): Remove the peephole pattern which tried to attack the same problem. The new approach is much more effective: when compiling stdlib, 77 instances occur in which such a block is identified and added! The peephole pattern only matched six times.
* compiler: another late peephole pattern.Kaz Kylheku2021-11-261-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are six hits for this in stdlib, two of them in optimize.tl itself. The situation is like: label1 (instruction ...) (jmp label3) label2 (instruction ...) label3 where (instruction ...) is identical in both places. label1 and label2 are functionally identical blocks, which means that the pattern can be rewritten as: label1 label2 (instruction ...) label3 When the label1 path is taken it's faster due to the elimination of the jmp, and code size is reduced by two instructions. This pattern may possibly the result of an imperfection in the design of the basic-blocks method merge-jump-thunks. The label1 and label2 blocks are functionally identical. But merge-jump-thunks looks strictly for blocks that end in a jmp instruction. It's possible that there was a jmp instruction and the end of the label2 block, which got eliminated before merge-jump-thunks, which is done late, just before late-peephole. * stdlib/optimize.tl (basic-blocks late-peephole): New rule for the above pattern.
* compiler: late-peephole match for a wasteful register move.Kaz Kylheku2021-11-101-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | I've noticed a wasteful instruction pattern in the compiled code for the sys:awk-code-move-check function: 7: 2C020007 movsr t2 t7 8: 3800000E if t7 14 9: 00000007 10: 20050002 gcall t2 1 t9 d1 t8 t6 t7 11: 00090001 12: 00080401 13: 00070006 14: 10000002 end t2 Here, the t2 register can be replaced with t7 in the gcall and end instructions, and the movsr t2 t7 instruction can be eliminated. It looks like something that could somehow be targeted more generally with a clever peephole pattern assisted by data-flow information, but for now I'm sticking in a dumb late-peephole pattern which just looks for this very specific pattern. * stdlib/optimize.tl (basic-blocks late-peephole): Add new pattern for eliminating the move, as described above. There are several hits for this in the standard library in addition to the awk module: in the path-test, each-prod and getopts files.
* compiler: improvement in wasteful jmp elimination.Kaz Kylheku2021-10-231-4/+29
| | | | | | | | | | | | | | | * stdlib/compiler.tl (compiler optimize): After the dataflow-driven peephole optimization, call elim-dead-code again. * stdlib/optimize.tl (basic-blocks check-bypass-empty): New method. (basic-bocks elim-dead-code): After eliminating unreachable blocks from the list, we use check-bypass-empty to squeeze out any empty blocks: blocks that have no instructions in their list, other than the leading label. This helps elim-next-jmp to find more opportunities to eliminate a wasteful jump, because sometimes these jumps straddle over empty blocks. Furthermore, elim-next-jmp can generate more empty blocks itself; so we check for this situation, delete the blocks and iterate.
* compiler: also clear .next before re-linking graph.Kaz Kylheku2021-10-231-0/+1
| | | | | | | | | | | * stdlib/optimize.tl (basic-blocks elim-dead-code): When clearing the links before recalculating the graph, also clear the next field of every block, because link-graph only sets this if necessary, assuming that the value is already nil. Thus by not resetting it, we risk leaving stale values in these .next fields. The code reachability calculation relies on next fields, so if they falsely point to dead blocks, those blocks could be falsely retained.
* compiler: peephole: recalc and rescan in a few more cases.Kaz Kylheku2021-09-301-0/+9
| | | | | | * stdlib/optimize.tl (basic-block peephole-block): In a few more cases, we should be setting the recalc flag to recalculate liveness, and adding some block to the rescan list.
* compiler: fix up linkage and recalc liveness in one peephole case.Kaz Kylheku2021-09-301-8/+11
| | | | | | | | * stdlib/optimize.tl (basic-blocks peephole-block): Rearrange the code a bit so we don't calculate the xbl, which potentially performs the cut-block, if there is no ybl. We set the bb.recalc flag since we may have cut a block into two and have redirected a jump, and also update the links for that reason.
* compiler: eliminate some redundant hash lookups.Kaz Kylheku2021-09-301-11/+12
| | | | | | | * stdlib/optimize.tl (basic-blocks thread-jumps-block, basic-blocks peephole-block): Streamline various cases of [bb.hash jlabel] being wastefully called twice to look up the same block referenced by the same label.
* compiler: eliminate basic-block next-block method.Kaz Kylheku2021-09-301-23/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The next-block method performs a linear search through the basic block list, which is physically ordered, to find the physically next block. This is actually not needed in several places that use the method; they want the logically next block, which is nil if the last instruction of the current doesn't potentially fall through to the next block. In the one place where we need the physical next block, in the elim-next-jump method, the caller can dynamically provide this, since it walks the list. * stdlib/optimize.tl (basic-block next-block): Method removed. (basic-block link-graph): We revise the logic here a little bit. All of the cases now consistently use the mechanism of setting link-next to nil to indicate that they don't fall through to the next block. The special case handling of the close instruction is clearer. (basic-block (thread-jumps-block, peephole-block)): Several cases here referred to the physically next block via the next-block method. This can be replaced by just using the next pointer, which will be the same. (basic-blocks elim-next-jump): This method now takes the next block as an argument, since there is no next-block method it can call to get the physcally next block. The argument is guaranteed non-null, so we don't need the .? null-safe slot access syntax. (basic-blocks elim-dead-code): Iterate over the next blocks simultaneously, and pass the next block into elim-next-jump. We no longer iterate over the last block, which has no physical next block.
* compiler: cosmetic: merge set assignments.Kaz Kylheku2021-09-301-7/+7
| | | | | | * stdlib/optimize.tl (basic-blocks join-block): Merge set forms into one. (basic-blocks elim-dead-code): Likewise.
* compiler: improvement in next-block linking.Kaz Kylheku2021-09-291-3/+3
| | | | | | | * stdlib/optimize.tl (basic-blocks link-graph): Do not search the entire list for a block's successor. Iterate over the cdr of the list in parallel, so that the next block is directly available at each iteration.
* compiler: remove impossible cases in jump threading.Kaz Kylheku2021-09-291-8/+4
| | | | | | | | | | | | | | * stdlib/optimize.tl (basic-blocks thread-jumps-block): There can't be any instructions in a basic block after an if or ifq, so in these cases, jrest is always nil. Let's ignore that nil efficiently with @nil, and get rid of the cut-block branches of the code. There is a similar case in peephole-block, but the target of the jump is an (end ...) which doesn't necessarily end a basic block. I temporarily put in an (assert (null jrest)), and, surprisingly, it never went off during a rebuild of the library or running of the test case. Still, only a jend ends a basic block; it would not be correct to simplify it like these two cases in thread-jumps-block.
* compiler: peephole: merge basic blocks when jmp removed.Kaz Kylheku2021-09-291-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | When a jmp instruction is removed from (necessarily) the end of a basic block, that basic block can be merged with the next one, and marked for re-scanning. A test case where this eliminates wasteful register-register move instruction is (match #(@a) #(3) a). * stdlib/optimize.tl (basic-blocks): New slot, tryjoin. (basic-blocks join-block): Null out the instruction list of the joined block. This helps if we do this during peephole processing, because it happens in the middle of an iteration over a list of blocks which can still visit the next block that has been merged into its predecesor; we don't want to be processing instructions that are no longer relevant. (basic-blocks peephole-block): In the one case where a conditional instruction is deleted from the end of the basic block, we add the block to the rescan list, and also to the tryjoin list. If the block can be merged with the next one, that can create more opportunities for peephole optimization. (basic-blocks peephole): Use zap in a few places to condense the logic of sampling a state variable that needs to be nulled out. Add the processing of the tryjoin list: pop basic blocks from the list, and try to merge them with their successor, if possible. We handle cases here where the next block could itself be in tryjoin. Also, if we join any blocks, we set the recalc flag to recalculate the liveness info.
* compiler: code clean-up in peephole optimizer.Kaz Kylheku2021-09-281-5/+5
| | | | | | | | | | | * stdlib/optimize.tl (basic-blocks peephole-block): When we match a branching instruction, including jend, we know that's the end of the basic block. So there is no need to splice the (rest insns) into the output; let's get rid of that. On the other hand, there is also no need to have a specific pattern match for the end of the list such as ((jmp @label)). This costs extra cycles to validate. Let's consistently match these basic-block terminating instructions using prefix patterns like ((jmp @label) . @nil)).
* defset: add set-mask and clear-mask.Paul A. Patience2021-09-141-4/+5
| | | | | | | | | | | | | | | | | | | | * stdlib/defset.tl (set-mask, clear-mask): New update macros. * stdlib/optimize.tl (calc-liveness): Use the new macros. * stdlib/socket.tl (sys:str-inaddr-net-impl, str-in6addr-net): Same. * stdlib/termios.tl (set-iflags, set-oflags, set-cflags, set-lflags, clear-iflags, clear-oflags, clear-cflags, clear-lflags): Same. * lisplib.c (defset_set_entries): Add set-mask and clear-mask to autoload symbols for defset. * txr.1: Documented. * stdlib/doc-syms.tl: Updated.
* license: reformat to fit 80 columns.Kaz Kylheku2021-08-161-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, socket.c, socket.h, stdlib/asm.tl, stdlib/awk.tl, stdlib/build.tl, stdlib/compiler.tl, stdlib/constfun.tl, stdlib/conv.tl, stdlib/copy-file.tl, stdlib/debugger.tl, stdlib/defset.tl, stdlib/doloop.tl, stdlib/each-prod.tl, stdlib/error.tl, stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl, stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl, stdlib/keyparams.tl, stdlib/match.tl, stdlib/op.tl, stdlib/optimize.tl, stdlib/package.tl, stdlib/param.tl, stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl, stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl, stdlib/socket.tl, stdlib/stream-wrap.tl, stdlib/struct.tl, stdlib/tagbody.tl, stdlib/termios.tl, stdlib/trace.tl, stdlib/txr-case.tl, stdlib/type.tl, stdlib/vm-param.tl, stdlib/with-resources.tl, stdlib/with-stream.tl, stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h: License reformatted. * lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
* file layout: moving share/txr/stdlib to stdlib.Kaz Kylheku2021-06-241-0/+606
This affects run-time also. Txr installations where the executable is not in directory ending in ${bindir} will look for stdlib rather than share/txr/stdlib, relative to the determined installation directory. * txr.c (sysroot_init): If we detect relative to the short name, or fall back on the program directory, use stdlib rather than share/txr/stdlib as the stdlib_path. * INSTALL: Update some installation notes not to refer to share/txr/stdlib but stdlib. * Makefile (STDLIB_SRCS): Refer to stdlib, not share/txr/stdlib. (clean): In unconfigured mode, remove the old share/txr/stdlib entirely. Remove .tlo files from stdlib. (install): Install lib materials from stdlib. * txr.1: Updated documentation under Deployment Directory Structure. * share/txr/stdlib/{asm,awk,build,cadr}.tl: Renamed to stdlib/{asm,awk,build,cadr}.tl. * share/txr/stdlib/{compiler,conv,copy-file,debugger}.tl: Renamed to stdlib/{compiler,conv,copy-file,debugger}.tl. * share/txr/stdlib/{defset,doc-lookup,doc-syms,doloop}.tl: Renamed to stdlib/{defset,doc-lookup,doc-syms,doloop}.tl. * share/txr/stdlib/{each-prod,error,except,ffi}.tl: Renamed to stdlib/{each-prod,error,except,ffi}.tl. * share/txr/stdlib/{getopts,getput,hash,ifa}.tl: Renamed to stdlib/{getopts,getput,hash,ifa}.tl. * share/txr/stdlib/{keyparams,match,op,optimize}.tl: Renamed to stdlib/{keyparams,match,op,optimize}.tl. * share/txr/stdlib/{package,param,path-test,pic}.tl: Renamed to stdlib/{package,param,path-test,pic}.tl. * share/txr/stdlib/{place,pmac,quips,save-exe}.tl: Renamed to stdlib/{place,pmac,quips,save-exe}.tl. * share/txr/stdlib/{socket,stream-wrap,struct,tagbody}.tl: Renamed to stdlib/{socket,stream-wrap,struct,tagbody}.tl. * share/txr/stdlib/{termios,trace,txr-case,type}.tl: Renamed to stdlib/{termios,trace,txr-case,type}.tl. * share/txr/stdlib/{ver,vm-param,with-resources,with-stream}.tl: Renamed to stdlib/{ver,vm-param,with-resources,with-stream}.tl. * share/txr/stdlib/yield.tl: Renamed to stdlib/yield.tl. * share/txr/stdlib/{txr-case,ver}.txr: Renamed to stdlib/{txr-case,ver}.txr. * gencadr.txr: Update to stdlib/place.tl. * genman.txr: Update to stdlib/cadr.tl.