...
* build: use yacc pattern rule with "y" as stem.  (Kaz Kylheku, 2021-03-22; 1 file, -9/+1)

    Thanks to a recent discussion in the GNU Make mailing list about
    some issue related to grouped patterns, I hit upon the realization
    that the rule

        y.tab.h y.tab.c: parser.y

    has a common stem, y, which can be exploited to write a pattern
    rule. This is only active in maintainer mode; in user mode,
    y.tab.h and y.tab.c are separately produced from .shipped files.

    * Makefile (y.tab.h): Eliminate rule which detects a removed
    y.tab.h.
    (y.tab.c): Delete rule, replacing with new rule.
    (%.tab.c %.tab.h): New pattern rule, which groups the targets
    together.
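    For illustration, a minimal sketch of the idea (hypothetical
    recipe; the actual rule and recipe in TXR's Makefile differ):

        # A grouped pattern rule: GNU Make treats one invocation of
        # the recipe as producing all targets sharing the stem. Here
        # the stem "y" matches y.tab.c and y.tab.h against parser.y.
        # (Recipe lines must be tab-indented in a real Makefile.)
        %.tab.c %.tab.h: parser.%
                $(YACC) -d $<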
* ffi: support float type as variadic argument.  (Kaz Kylheku, 2021-03-22; 2 files, -8/+69)

    The float type promotes to double when passed as a variadic
    argument. This patch adds an internal FFI type which models that
    promotion. It uses double for its C type, while still performing
    the range checks for float. Also, the types be-float and le-float
    are rejected as variadic arguments.

    * share/txr/stdlib/ffi.tl (analyze-argtypes): Rewrite function,
    adding validation and substitution for the variadic part of the
    argument type list. Map the float type to double, and reject
    be-float and le-float.

    * txr.1: Documented.
* ffi: fix missing support for misaligned ushort.  (Kaz Kylheku, 2021-03-22; 1 file, -0/+2)

    * ffi.c (ffi_ushort_get): Add the missing align_sw_get macro that
    checks for and handles misalignment. Commit
    e539fbc2af3bd4a5bd4ca3d9b567862005509aa1 made on 2017-06-05 added
    these macros and fixes to numerous functions. The commit message
    claims that ffi_ushort_get was among the functions fixed, but no
    change was made to the function.
* ffi: ffi_uchar_put: statement after declaration.  (Kaz Kylheku, 2021-03-22; 1 file, -1/+1)

    * ffi.c (ffi_uchar_put): Fix for C90 compatibility and
    consistency.
* compiler: improve end-propagating optimization.  (Kaz Kylheku, 2021-03-17; 1 file, -7/+13)

    When a jmp instruction is replaced by the end instruction that it
    jumps to, or possibly by a two-instruction sequence ending in end,
    there can be more opportunities to optimize. For instance, the
    second reduction in a sequence like this:

           mov t3, t5          mov t3, t5
           jmp label     -->   end t3       -->   end t5
           ...
        label:
           end t3

    * share/txr/stdlib/optimize.tl (basic-blocks peephole-block): If
    the end-propagation is done, we change the linkage of the current
    block to indicate that it has no next blocks. We add it to the
    rescan list and set the recalc flag so the liveness information
    is updated.
* compiler: use registers for function parameters.  (Kaz Kylheku, 2021-03-17; 1 file, -67/+74)

    If a function has nothing but parameters that are not captured in
    lexical closures, they can be converted to registers. The function
    then doesn't need a variable frame for its parameters. This is
    similar to the eliminate-frame optimization, and borrows the same
    code and logic.

    * share/txr/stdlib/compiler.tl (compiler eliminate-frame): We no
    longer assume that the code coming in starts with a frame
    instruction we can eliminate using (cdr code) and an end
    instruction we can eliminate with a trailing pattern. This is
    because when this function is used for a lambda, this is not the
    case; a lambda's variable frame is implicit, created by the VM for
    any lambda with a nonzero frame size, rather than by a frame
    instruction.
    (compiler comp-let): In the call to eliminate-frame, we now trim
    away the first and last instruction, to get rid of the (frame ...)
    and (end ...).
    (compiler comp-lambda-impl): Install a closure spy against the
    variable frame to detect which variables are captured in closures,
    similarly to comp-let. Under the right conditions, pass the code
    through eliminate-frame to turn the variables into registers. The
    close instruction has to be rewritten, because the frame size is
    now zero, and the number of t registers has changed.
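    For illustration, a hypothetical pair of functions showing when
    the optimization can apply:

        ;; Neither a nor b is captured by a closure, so both can live
        ;; in t registers; no variable frame is needed.
        (defun hyp (a b)
          (sqrt (+ (* a a) (* b b))))

        ;; Here n is captured by the lambda, so it must remain in a
        ;; variable frame that the closure can reference.
        (defun make-adder (n)
          (lambda (x) (+ x n)))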
* poll: iterate sequences efficiently.  (Kaz Kylheku, 2021-03-16; 2 files, -7/+11)

    * sysif.c (poll_wrap): Use seq_iter for efficiency when poll_list
    is a vector or other generalized sequence.

    * txr.1: Change wording to say that poll-list is a sequence.
* compiler: split variable spies into two types.  (Kaz Kylheku, 2021-03-16; 1 file, -33/+44)

    We have the situation that there are effectively two kinds of
    spies: let constructs plant spies only in order to learn about
    what variables are being captured, whereas lambdas plant spies in
    order to intercept variable accesses (and to inform the spies that
    are interested in what is captured). Let us split these up into
    two separate types, with different methods, in different stacks.

    * share/txr/stdlib/compiler.tl (struct var-spy): Renamed to
    closure-spy.
    (var-spy accessed, var-spy assigned): Methods removed: the
    closure-spy type has only the captured method.
    (struct capture-var-spy): Renamed to access-spy.
    (capture-var-spy var-spies): Renamed to access-spy closure-spies.
    (capture-var-spy captured): Method removed; the access spy doesn't
    receive captured calls.
    (struct compiler): Slot var-spies removed, replaced with
    closure-spies and access-spies.
    (with-var-spy): Macro removed.
    (with-spy): New function.
    (with-closure-spy, with-access-spy): New macros.
    (compiler push-var-spy, compiler pop-var-spy): Methods removed.
    (compiler push-closure-spy, compiler pop-closure-spy, compiler
    push-access-spy, compiler pop-access-spy): New methods.
    (compiler comp-var, compiler comp-setq, compiler comp-lisp1-setq,
    compiler comp-lisp1-value): Walk the new access-spies list rather
    than var-spies to report about accesses and assignments.
    (compiler comp-let): Use the with-closure-spy macro rather than
    with-var-spy. The spy object is now a closure-spy type, and the
    variable is cspy rather than vspy.
    (compiler comp-lambda-impl): Use with-access-spy instead of
    with-var-spy. The spy object is now of type access-spy. It refers
    to the current me.closure-spies from the compiler.
* tests: fix retest logic for parallel operation.  (Kaz Kylheku, 2021-03-16; 1 file, -7/+5)

    I'm giving up and just using a recursive make invocation for the
    "retest" target.

    * Makefile (retest): Depend on nothing. Remove the tst directory
    and invoke make tests recursively.
    (tests.clean): Target removed.
* compiler: trim unused accumulation from var-spy.  (Kaz Kylheku, 2021-03-16; 1 file, -8/+2)

    The var-spy structure is only being used for detecting captured
    variables, not assigned or accessed variables. So the information
    about accessed and assigned variables is being wastefully
    accumulated, never used.

    * share/txr/stdlib/compiler.tl (struct var-spy): Remove slots
    acc-vars and set-vars.
    (var-spy accessed, var-spy assigned): Methods become no-ops.
* compiler: eliminate unused list in optimizer.  (Kaz Kylheku, 2021-03-16; 1 file, -12/+9)

    * share/txr/stdlib/optimize.tl (basic-blocks elim-dead-code):
    Ironically, we have a dead variable named unreachable in this
    function. Let's remove it and the build form which calculates it.
* doc: improve predicate pattern description.  (Kaz Kylheku, 2021-03-15; 1 file, -5/+10)

    * txr.1: Fix a wrong word; clarify what the first form and the
    second form are.
* doc: pattern matching: revise, take out of beta.  (Kaz Kylheku, 2021-03-15; 1 file, -67/+140)

    * txr.1: The structural pattern matching intro is divided into
    sections, rearranged and reworded, with additional detail,
    especially in regard to variables. Also, an "a the" typo is fixed
    in the list pattern notation section. Pattern matching is now
    taken out of beta; the paragraph about the beta status is removed.
* doc: formally document meta-syntax.  (Kaz Kylheku, 2021-03-15; 1 file, -7/+84)

    * txr.1: Add a missing section defining meta-symbols, meta-numbers
    and meta-expressions, which explains the sys:var and sys:expr
    shorthand. Also documented are corner cases involving deceptive
    floating-point syntax. Everywhere in the document, we normalize
    "meta-variable" to "meta-symbol".
* cat-str: seq_iter conversion.  (Kaz Kylheku, 2021-03-14; 2 files, -13/+30)

    * lib.c (cat_str): Traverse sequences of strings efficiently using
    the seq_iter framework.

    * txr.1: Documented.
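    A usage sketch (hypothetical calls, assuming the generalized
    sequence support described above):

        ;; cat-str takes a sequence of pieces plus an optional
        ;; separator; a vector now iterates as efficiently as a list.
        (cat-str '("a" "b" "c") "-")  ;; -> "a-b-c"
        (cat-str #("a" "b" "c") "-")  ;; -> "a-b-c"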
* lib: fix neglect to use self variable.  (Kaz Kylheku, 2021-03-14; 5 files, -31/+33)

    * arith.c (toint): Use self instead of repeating the function name
    in the diagnostic.

    * ftw.c (ftw_wrap): Likewise.

    * glob.c (glob_wrap): Likewise.

    * rand.c (make_random_state, random): Likewise.

    * lib.c (nreverse, reverse, remove_if, int_str, less, chr_str,
    chr_str_set, unintern, rehome_sym, in): Likewise.
    (vector, obj_print_impl): Pass self to the helper function instead
    of repeating an identical literal.
* doc: rewrite macro intro.  (Kaz Kylheku, 2021-03-12; 1 file, -20/+31)

    * txr.1: Rewrote the Macros section intro paragraphs.
* doc: mistaken reference to command-get.  (Kaz Kylheku, 2021-03-12; 1 file, -1/+1)

    * txr.1: The description of command-get-buf mistakenly refers to
    command-get.
* lib: fix hard-coded cat-str in diagnostic.  (Kaz Kylheku, 2021-03-12; 1 file, -2/+2)

    * lib.c (cat_str_measure): Use the self parameter in diagnostics
    rather than cat-str, so errors are reported against the correct
    function.
* Version 254 (tag: txr-254)  (Kaz Kylheku, 2021-03-10; 6 files, -1231/+1262)

    * RELNOTES: Updated.

    * configure, txr.1: Bumped version and date.

    * share/txr/stdlib/ver.tl: Bumped.

    * txr.vim, tl.vim: Regenerated.
* compiler: eliminate unused global symbol accesses.  (Kaz Kylheku, 2021-03-10; 1 file, -1/+1)

    * share/txr/stdlib/optimize.tl (basic-blocks peephole-blocks):
    Extend the dead reg mov pattern to also handle the getlx, getv,
    getf and getfb instructions.
* compiler: use effect-free criterion for elimination.  (Kaz Kylheku, 2021-03-10; 2 files, -1/+25)

    When the result of a function call is not used, the call can be
    eliminated if the function has no effects. Effect-free functions
    are a superset of constant-foldable functions. Not all effect-free
    functions are const-foldable, because in some cases creating an
    object at run time is the documented semantics, which cannot be
    constant-folded. For instance (list 1) cannot be constant-folded,
    because it may be relied upon to generate a fresh object each time
    it is called. However, bringing a new list to life is not an
    effect. If the value is not used, we can safely eliminate the
    call. The same reasoning applies to (gensym "abc"). It must
    generate a unique symbol each time, and so cannot be
    constant-folded. But a call to gensym whose value is not used can
    be eliminated.

    * share/txr/stdlib/compiler.tl (%effect-free-funs%): List of
    side-effect-free functions registered in eval.c.
    (%effect-free%): Hash of effect-free functions, incorporating
    %const-foldable% also.

    * share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
    Refer to %effect-free% rather than %const-foldable%.
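    For illustration, a hypothetical case:

        ;; (list x) is effect-free but not const-foldable: it must
        ;; produce a fresh object on each call. Since its value is
        ;; ignored here, the call can be eliminated outright.
        (defun ignore-result (x)
          (list x)    ;; dead: value unused, no side effect
          (* x x))    ;; only this form matters

        ;; (gensym "abc") likewise cannot be folded, since each call
        ;; must yield a unique symbol; but an unused call to it can
        ;; simply be dropped.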
* compiler: more const-foldable functions.  (Kaz Kylheku, 2021-03-10; 1 file, -6/+7)

    * share/txr/stdlib/compiler.tl (%const-foldable-funs%): Add rest,
    nilf, tf, join, join-with, empty.
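    Illustrative consequence (hypothetical constant-argument calls
    that become foldable with this change):

        (rest '(1 2 3))          ;; may compile as the literal (2 3)
        (join "foo" "bar")       ;; may compile as the literal "foobar"
        (join-with ":" "a" "b")  ;; may compile as the literal "a:b"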
* compiler: eliminate unused closures.  (Kaz Kylheku, 2021-03-10; 1 file, -1/+10)

    In this patch, we optimize away closures that are not used. Unused
    closures that have been hoisted to the top level are not affected.
    We look for close instructions which produce a dead treg, and
    rewrite these to jmp instructions to the same label. When this
    happens, we set a flag so that a dead code elimination pass is
    done again, to actually remove the now unreachable closure code.

    * share/txr/stdlib/optimize.tl (struct basic-blocks): New slot,
    reelim, indicating that dead code elimination is to be done again
    after peephole, since the control flow graph has changed.
    (basic-blocks peephole-block): New pattern for eliminating a dead
    register, targeting the close instruction.
    (basic-blocks peephole): After peephole, check whether the reelim
    flag has been set and do elim-dead-code again.
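    A hypothetical function containing such a dead closure:

        (defun quirk (x)
          (lambda () x)  ;; unused: the close instruction's target
                         ;; register is dead, so it becomes a jmp
                         ;; past the closure's now-unreachable body
          (+ x 1))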
* compiler: eliminate dead calls.  (Kaz Kylheku, 2021-03-10; 2 files, -74/+98)

    The main idea in this patch is to identify calls to functions
    whose values are not used and which have no side effects. For this
    purpose, we borrow the same set of functions that we use as
    targets for constant folding: those in the compiler's
    %const-foldable% hash. A call is dead if it is a gcall or gapply
    instruction which calls one of these functions, and the
    destination register is dead. To maximize the opportunities for
    this elimination, whenever we eliminate such an instruction, we
    mark the block for re-scanning, and we re-calculate the liveness
    info for that block and then globally.

    * share/txr/stdlib/compiler.tl (struct compiler): New slots,
    datavec and symvec.
    (compiler get-datavec): Cache the calculated datavec in the new
    datavec slot.
    (compiler get-symvec): Cache the calculated symvec in the new
    symvec slot.
    (compiler optimize): Pass the symvec to the basic-blocks BOA
    constructor; it is now needed for identifying functions that are
    referenced by symvec index number in the gcall and gapply
    instructions.

    * share/txr/stdlib/optimize.tl (struct basic-blocks): New symvec
    slot, added to the BOA parameter list also. New slot recalc,
    indicating that re-calculation of liveness is needed.
    (basic-blocks cut-block): Use pushnew to put bl onto the rescan
    list, to prevent duplicates. Also push the new block onto the
    rescan list.
    (basic-blocks local-liveness): This method can now be called again
    to re-calculate local liveness, so we must reset the live slot to
    nil.
    (basic-blocks calc-liveness): Take an optional list of blocks to
    scan for local liveness, defaulting to all of them.
    (basic-blocks peephole-block): Factor out some register liveness
    tests to local functions. Add a pattern targeting gcall and gapply
    instructions, testing for a dead register from a call to a
    %const-foldable% function. Use pushnew everywhere to add to the
    rescan list. Set the recalc flag when the liveness-based
    reductions are applied.
    (basic-blocks peephole): If there are blocks to be scanned again,
    then if the recalc flag is set, recalculate local liveness for all
    the blocks to be re-scanned and re-do global liveness.
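    A sketch of the kind of call this eliminates (hypothetical code,
    assuming append is among the %const-foldable% functions):

        (defun waste (xs ys)
          (append xs ys)  ;; dead gcall: value unused, no side effect
          (len xs))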
* sort: bugfix: broken for vectors/strings.  (Kaz Kylheku, 2021-03-10; 1 file, -1/+2)

    * lib.c (sort): Return the copied and sorted object, not the
    original.
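    Illustration of the intended non-destructive behavior (hedged
    example; sort copies its input and must return that copy):

        (sort #(3 1 2))  ;; -> #(1 2 3), not the original #(3 1 2)
        (sort "cba")     ;; -> "abc"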
* lib: new functions join, join-with.  (Kaz Kylheku, 2021-03-09; 5 files, -17/+85)

    That old cat-str function is often a pain, requiring the pieces as
    a list. We have a sys:fmt-join that is undocumented. That function
    is now exposed as usr:join, and documented. Also introducing
    join-with, which takes a separator as the leftmost argument. Thus
    (op join-with "::") gives us a function that joins pieces with ::
    in between.

    * eval.c (eval_init): Register the fmt_join function under the
    join symbol also. Register join-with.

    * lib.c (join_with): New function.
    (fmt_join): Now a wrapper for join_with that passes a nil
    separator.

    * lib.h (join_with): Declared.

    * share/txr/stdlib/optimize.tl (basic-blocks join-blocks): Rename
    the local function join, which now triggers a warning about a
    standard function being redefined.

    * txr.1: Redocumented cat-str, and documented join-with and join.
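    A usage sketch, following the description above:

        (join "a" "b" "c")             ;; -> "abc"
        (join-with "::" "usr" "join")  ;; -> "usr::join"

        ;; (op join-with "::") yields a function that inserts ::
        ;; between the pieces it is applied to:
        [(op join-with "::") "x" "y"]  ;; -> "x::y"

        ;; contrast with cat-str, which wants the pieces in a list:
        (cat-str '("a" "b" "c") "::")  ;; -> "a::b::c"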
* compiler: optimization control.  (Kaz Kylheku, 2021-03-08; 4 files, -107/+188)

    * lisplib.c (compiler_set_entries): Register the *opt-level*
    symbol for auto-loading.

    * share/txr/stdlib/compiler.tl (*opt-level*): New special
    variable.
    (compiler comp-let): Eliminate frames only at level 3.
    (compiler comp-lambda-impl): Lift load time at level 3.
    (compiler comp-arith-form): Constant-folding only at level 1.
    (compiler comp-fun-form): Algebraic substitutions and reductions
    and constant-folding only at level 1.
    (compiler comp-apply-call): Constant folding at level 1.
    (compiler optimize): Optimizations off if the level is zero.
    Thread jumps and eliminate dead code at level 2. Flow-analysis
    based optimizations at level 3. Additional optimizations at
    level 4.
    (compiler comp-block): Block elimination at level 3.
    (compile-toplevel): Rebind *opt-level*, giving it the value zero
    if it was previously nil.

    * share/txr/stdlib/optimize.tl (basic-blocks get-insns): Just
    retrieve the instructions, letting the caller decide whether to
    call late-peephole or not.

    * txr.1: Documented *opt-level*.
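    Hypothetical usage of the new variable (compile-toplevel is the
    documented entry point; the compiled forms are made up):

        ;; compile with optimizations disabled, e.g. for debugging:
        (let ((*opt-level* 0))
          (compile-toplevel '(defun f (x) (+ x 1))))

        ;; or at the maximum level:
        (let ((*opt-level* 4))
          (compile-toplevel '(defun g (x) (* x x))))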
* Version 253 (tag: txr-253)  (Kaz Kylheku, 2021-03-06; 6 files, -297/+337)

    * RELNOTES: Updated.

    * configure, txr.1: Bumped version and date.

    * share/txr/stdlib/ver.tl: Bumped.

    * txr.vim, tl.vim: Regenerated.
* hashing: bug: hash-equal zero: floats and bignums.  (Kaz Kylheku, 2021-03-05; 1 file, -2/+2)

    * hash.c (equal_hash): Do not multiply by the seed if it is zero;
    substitute 1. Otherwise, under the default seed of zero, the hash
    becomes zero.
* compiler: streamline load-time hoisting of calls.  (Kaz Kylheku, 2021-03-04; 1 file, -17/+23)

    * share/txr/stdlib/compiler.tl (compiler comp-fun-form): Rearrange
    the logic so that we only try the speculative compilation when the
    three main conditions are right, not before. This drastically
    reduces the number of times we need to take the compiler snapshot.
* compiler: bug: duplicate code in load-time lifting.  (Kaz Kylheku, 2021-03-04; 1 file, -5/+22)

    This issue affects the original code which lifts lambdas to
    load-time, as well as the new, recently added code for similarly
    lifting functional combinator expressions. The problem is that the
    trick works by compiling an expression twice. The result of the
    first compile is thrown away in the case when we compile it again
    in the load-time context. But compiling has a side effect: the
    expression itself may have an embedded load-time-liftable
    expression, which gets deposited into the load-time fragment list.
    Thus garbage ends up in the list of load-time fragments. We likely
    want to save and restore other things, like allocated D registers.

    * share/txr/stdlib/compiler.tl (compiler snapshot, compiler
    restore): New methods.
    (comp-lambda-impl, comp-fun): Save a snapshot of the compiler
    state before doing the speculative compilation. If we don't use
    that compilation, we restore the state from the snapshot.
* compiler: frame depth bug in load-time.  (Kaz Kylheku, 2021-03-04; 1 file, -6/+3)

    * share/txr/stdlib/compiler.tl (compile-in-toplevel): This macro
    must not save and restore the nlev variable of the compiler. The
    nlev variable is a maximum: it measures the maximum environment
    depth, establishing the depth of the display needed when executing
    the code of the resulting VM description. By saving and restoring
    this variable around the compilation of a load-time form, we cause
    the load-time form's contribution to the maximum to be ignored. If
    the load-time form perpetrates nesting that is deeper than other
    code, it will execute with an under-sized display, causing an
    assertion crash or corruption.
* math: defend against locale decimal separator.  (Kaz Kylheku, 2021-03-04; 1 file, -0/+14)

    The int_flo function also has a sensitivity to locale, because the
    bignum handling is text-based, involving printing a floating-point
    value and then assuming it contains a period decimal separator.

    * arith.c (int_flo): If CONFIG_LOCALE_TOLERANCE is enabled, look
    for dec_point rather than '.' in the formatted number. When
    matching and destructuring the number with sscanf, don't look for
    the '.' character, but rather a complemented character set which
    can match nothing but the separator, whatever that is.
* lib: defend against locale-specific wcstod.  (Kaz Kylheku, 2021-03-04; 3 files, -3/+54)

    The wcstod function has locale-specific behavior. It uses a
    locale-specific decimal separator character, which may not be the
    period. Though in TXR we never call setlocale in order to activate
    localization, it is a good idea to put in code to defend against
    this. If locale is ever unleashed on the code, it really botches
    our floating-point handling. However, let's keep that defensive
    logic disabled for now using the preprocessor.

    The strategy is to inquire about the locale's decimal character at
    startup. Then, in the flo_str function, we preprocess the input by
    converting the decimal period to the locale-specific character
    before calling wcstod. On the output side, we also deal with it in
    the format function; we call sprintf, and then convert the
    locale-specific character to the period.

    I tested all this by temporarily introducing the setlocale call,
    switching to a locale with a comma as the separator, getting make
    tests to pass, and doing some interactive testing. This is not
    going to receive coverage in the test suite.

    * lib.c (dec_point): New global variable.
    (flo_str): If dec_point isn't '.', we must copy the string into a
    local buffer, which we get from alloca, and edit any '.' that it
    contains to dec_point.
    (locale_init): New function; initializes dec_point.
    (init): Call locale_init.

    * lib.h (dec_point): Declared.

    * stream.c (formatv): Don't look for a '.' in the result of
    printing a double using %e; look for dec_point. Post-process
    floating-point sprintf output by substituting occurrences of
    dec_point with the period.
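    The user-visible invariant being defended, from the TXR Lisp side
    (illustrative):

        ;; TXR's floating-point syntax always uses the period,
        ;; regardless of any locale the host environment imposes:
        (flo-str "3.14")  ;; -> 3.14, even under a comma locale
        (fmt "~a" 3.14)   ;; -> "3.14", never "3,14"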
* compiler: another late-peephole pattern.  (Kaz Kylheku, 2021-03-03; 1 file, -0/+21)

    * share/txr/stdlib/optimize.tl (late-peephole): Add a
    6-instruction pattern which is seen in pattern-matching code in
    optimize.tlo and compiler.tlo. This can be tidied up a little bit,
    reducing the instructions to 5, eliminating redundant comparisons
    and threading a jump. At this point we are probably working too
    hard for too little gain. More effort in other areas like common
    subexpression elimination could produce bigger wins.
* compiler: new late-peephole pass.  (Kaz Kylheku, 2021-03-03; 1 file, -1/+16)

    I noticed a lot of this in compiled code:

          (if reg label1)
        label2
          (jmp label3)
        label1

    By inverting the test, this can be shortened to:

          (ifq reg (t 0) label2)
        label1

    We must keep label1 in case it is the target of other branches.
    Also, we only perform this if label2 is unreachable: the (jmp
    label3) instruction is reached only when the previous if branch is
    not taken.

    * share/txr/stdlib/optimize.tl (basic-blocks get-insns): Call the
    new late-peephole method.
    (basic-blocks late-peephole): New method, incorporating the above
    pattern.
* compiler: lift uslot and umethod forms too.  (Kaz Kylheku, 2021-03-03; 1 file, -1/+1)

    The uslot and umethod functions produce functions, and so calls to
    them should be lifted to load-time, if possible. For instance, the
    .foo expression in [mapcar .foo list] translates to a (uslot 'foo)
    function call. This references no variables, and so is liftable to
    load-time. The umethod function is an extension of uslot that
    allows partially applied method arguments to be carried. If those
    arguments are all functional, the umethod call is liftable.

    * share/txr/stdlib/compiler.tl (%functional-funs%): Include
    umethod and uslot.
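    An illustrative sketch (hypothetical struct and function):

        (defstruct point nil x y)

        ;; .x denotes (uslot 'x), a function retrieving the x slot of
        ;; its argument. It references no variables, so it can be
        ;; computed once at load time rather than on every call.
        (defun xs (points)
          [mapcar .x points])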
* lib: remove unnecessary load-time forms.  (Kaz Kylheku, 2021-03-03; 2 files, -10/+9)

    Because of the previous optimization, some load-time forms that
    appear in the library are unnecessary.

    * share/txr/stdlib/optimize.tl (basic-blocks merge-jump-thunks):
    Remove load-time around functional combinators.

    * share/txr/stdlib/socket.tl (sys:in6addr-condensed-text): Remove
    one load-time that is now unnecessary, and one around an op which
    was unnecessary even at the time it was written, since the lambda
    is lifted without it.
* compiler: lift functional expressions to load-time.  (Kaz Kylheku, 2021-03-03; 1 file, -0/+17)

    The idea behind this optimization is that certain expressions that
    only calculate functions can be hoisted to load time. These
    expressions meet these criteria:

    1. They are not already in a top-level or load-time context.

    2. They are function calls to a standard library functional
       operator like chain, andf, orf, juxt, ...

    3. They do not access any variables.

    4. They do not access any functions other than public (usr
       package) global functions in the standard library.

    An example of such an expression might be:

        [chain cdr [iff symbolp list]]

    If such an expression is embedded in a function, we don't want the
    function to re-calculate it every time, which requires time and
    generates garbage. We can transform it to the equivalent of:

        (load-time [chain cdr [iff symbolp list]])

    to have it calculated once.

    * share/txr/stdlib/compiler.tl (%functional-funs%, %functional%):
    New global variables.
    (compiler comp-fun-form): After compiling the function call, check
    for the conditions for lifting. If so, compile the form again as a
    load-time literal. The logic is similar to how lambdas are lifted
    to load-time, though the conditions are different.
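    A sketch of the effect in context (hypothetical function built
    around the example expression above):

        ;; Without the optimization, every call rebuilds the
        ;; combinator, allocating fresh function objects; with it, the
        ;; expression is computed once, as if wrapped in load-time.
        (defun process (xs)
          [mapcar [chain cdr [iff symbolp list]] xs])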
* compiler: merge duplicate jump blocks.  (Kaz Kylheku, 2021-03-02; 2 files, -0/+26)

    I've noticed that some duplicate short blocks are generated, which
    look like this. Often they are consecutive, which is why they are
    noticeable. These can become one block:

        mov t17 d3
        jmp label17

        mov t17 d3
        jmp label17

        mov t17 d3
        jmp label17

    We identify identical blocks by looking for short instruction
    sequences that end in an unconditional jmp, and then we group
    duplicates in a hash table keyed on the instruction sequence. We
    must ignore the label: the first instruction in each block is a
    unique label.

    We have to be careful about which ones to delete. Any block that
    is entered from the top must be preserved. When these blocks are
    identified, at least one block must remain that is removable for
    the optimization to be able to do anything. If all blocks are
    removable, we pick a leader which is preserved. Otherwise we pick
    a leader from the preserved blocks. The non-preserved blocks are
    deleted, and all jumps to them from other blocks are redirected to
    jumps to the leader.

    * share/txr/stdlib/compiler.tl (optimize): Call merge-jump-thunks
    as the last pass.

    * share/txr/stdlib/optimize.tl (basic-blocks merge-jump-thunks):
    New method.
* compiler: basic-block print method for debugging.  (Kaz Kylheku, 2021-03-02; 1 file, -1/+10)

    * share/txr/stdlib/optimize.tl (basic-block print): Print basic
    blocks such that related blocks are shown by their label rather
    than as the whole graph.
* compiler: optimize annoying instruction sequence.  (Kaz Kylheku, 2021-03-02; 1 file, -1/+18)

    There is a specific instruction sequence that often occurs when
    pattern matching code is compiled. It is not easy to attack with
    the existing optimization framework. Once the code is divided into
    basic blocks, it spans multiple blocks. It's not a good fit for
    either jump threading or intra-basic-block peephole reductions. It
    is exemplified by the following disassembly snippet, where the
    pattern is shown in the box:

         5: 3C00000C ifq v00000 d1 12 --------+
         6: 08000401                          |
         7: 20020010 gcall t16 0 v00001 d2    |
         8: 08010000                          |
         9: 00000402                          |
        +---------------------------+         |
        |10: 2C060403 movsr t6 d3   |         |
        |11: 3400000D jmp 13        |         |
        |12: 2C060000 movsr t6 nil  | <-------+
        |13: 3C000028 ifq t6 nil 40 |
        +---------------------------+
        14: 00060000
        15: 3C000019 ifq v00001 d1 25

    The d3 register holds t. Of course, it's always a different d
    register in the various occurrences of this pattern. The register
    t6 also varies, but it's the same register in all three places in
    the pattern. The purpose of this "kernel" of four instructions is:

    - if it is entered from the top, set the t6 register to t and jump
      to 40.

    - if it is entered at instruction 12, set the t6 register to nil,
      and fall through to the next block.

    This can instead be expressed by three instructions:

        +---------------------------+
        |(mov t6 d3)                |
        |(jmp 40)                   |
        |(mov t6 nil)               | <-------+
        +---------------------------+

    This revised code is then more conducive to further optimizations
    in ways that the original is not. In some cases, like the
    Ackermann test case, both moves into t6 will disappear entirely.

    * share/txr/stdlib/optimize.tl (basic-blocks :postinit): Pass
    instructions through early-peephole.
    (early-peephole): New function, where we can look for and rewrite
    instruction patterns over the entire block of code, before any
    division into basic blocks. Here, we look for the four-instruction
    kernel and rewrite it.
* compiler: mistake in (if (equal ...) ..) pattern.  (Kaz Kylheku, 2021-03-02; 1 file, -1/+1)

    * share/txr/stdlib/compiler.tl (compiler comp-if): An invalid or
    expression means we are not looking at the if-over-equal strength
    reduction pattern; we were matching it anyway and miscompiling an
    unrelated expression (luckily, one that is unlikely to occur in
    any reasonable application). Unfortunately, this correction spoils
    the way the pattern matching Ackermann test case is being
    compiled.
* compiler: move short end sequences before branch.  (Kaz Kylheku, 2021-03-02; 1 file, -0/+8)

    When a function ends like this:

          jmp label
          ...
        label:
          end t42

    the jmp can just be replaced by "end t42".

    * share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
    Implement the pattern to replace a jump to a one or two
    instruction sequence ending in a jend with that sequence.
    Including the extra instruction helps if the target has not been
    peepholed yet. In the pattern matching Ackermann case, two
    instructions are promoted, one of which is eliminated.
* compiler: new peephole cases: if over d reg.  (Kaz Kylheku, 2021-03-02; 1 file, -0/+6)

    During peephole optimization, new cases can arise where a dreg is
    tested.

    * share/txr/stdlib/optimize.tl (basic-blocks peephole-block): Test
    for (if dreg ...) and (ifq dreg nil), eliminating the instruction
    or rewriting it to jmp. It could be worth doing another
    jump-threading and dead code elimination pass afterward.
* compiler: fix use of nil instead of (t 0).  (Kaz Kylheku, 2021-03-02; 2 files, -2/+2)

    Though the assembler accepts nil as a synonym of (t 0), we have to
    be consistent.

    * share/txr/stdlib/compiler.tl (compiler comp-or): Don't use a nil
    operand in the ifq instruction: use (t 0).

    * share/txr/stdlib/optimize.tl (basic-blocks thread-jumps-block):
    Don't match the ifq pattern with a nil operand; match (t 0).
* compiler: unmatchable pattern in jump threading.  (Kaz Kylheku, 2021-03-02; 1 file, -1/+1)

    * share/txr/stdlib/optimize.tl (basic-blocks thread-jumps-block):
    The two accidental occurrences of @reg will never match each
    other: rename one to @dn.
* compiler: reorder optimization passes.  (Kaz Kylheku, 2021-03-02; 1 file, -2/+2)

    * share/txr/stdlib/compiler.tl (compiler optimize): Perform the
    control flow optimizations of jump threading and dead code
    elimination first. Then, do the data-flow-assisted optimizations
    on the re-calculated control flow graph. With this, two more
    useless instructions are shaved off the pattern matching
    Ackermann function:

        (defun ack (:match)
          ((0 @n) (+ n 1))
          ((@m 0) (ack (- m 1) 1))
          ((@m @n) (ack (- m 1) (ack m (- n 1)))))

    It now compiles down to 16 instructions instead of 18, and the
    speedup of (ack 3 7) versus interpreted code goes from 36.0 to
    37.3 times. The reason that the new reduction in the code is
    possible is that previously, the code being analyzed was in
    separate basic blocks, which are now merged together in the dead
    code elimination pass.

    * share/txr/stdlib/compiler.tl (compiler optimize): Move
    calc-liveness and peephole after elim-dead-code.
* compiler: join blocks after dead code elimination.  (Kaz Kylheku, 2021-03-02; 1 file, -4/+33)

    After eliminating dead code and useless forward jumps, there can
    be blocks which unconditionally proceed to subsequent blocks,
    which have no other predecessor. These blocks can be merged
    together. This currently does nothing to alter the generated code.
    The advantage will be obvious in a subsequent commit.

    * share/txr/stdlib/optimize.tl (struct basic-block): New slot,
    rlinks: reverse links.
    (basic-blocks join-block): New method.
    (basic-blocks link-graph): Populate the rlinks slots.
    (basic-blocks join-blocks): New method.
    (basic-blocks elim-dead-code): Reset rlinks also, before
    re-calculating the graph connectivity with link-graph. Call
    join-blocks to merge together consecutive blocks.