summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorKaz Kylheku <kaz@kylheku.com>2021-03-02 22:18:13 -0800
committerKaz Kylheku <kaz@kylheku.com>2021-03-02 22:18:13 -0800
commit25887472e9d60d256a0420de93f8ca97209c4367 (patch)
tree5c4060b29afa6fd0898ef896950c597cb8e1f58b
parent62e18de0a5c3572890bc73ea834b72b6851e3de1 (diff)
downloadtxr-25887472e9d60d256a0420de93f8ca97209c4367.tar.gz
txr-25887472e9d60d256a0420de93f8ca97209c4367.tar.bz2
txr-25887472e9d60d256a0420de93f8ca97209c4367.zip
compiler: optimize annoying instruction sequence.
There is a specific instruction sequence that often occurs when pattern matching code is compiled. It is not easy to attack by the existing optimization framework. Once the code is divided into basic blocks, it spans multiple blocks. It's not a good fit for either jump threading or intra-basic-block peephhole reductions. It is exemplified by the following disassembly snippet, where the pattern is shown in the box: 5: 3C00000C ifq v00000 d1 12 --------+ 6: 08000401 | 7: 20020010 gcall t16 0 v00001 d2 | 8: 08010000 | 9: 00000402 | +---------------------------+ | |10: 2C060403 movsr t6 d3 | | |11: 3400000D jmp 13 | | |12: 2C060000 movsr t6 nil | <--------+ |13: 3C000028 ifq t6 nil 40 | +---------------------------+ 14: 00060000 15: 3C000019 ifq v00001 d1 25 The d3 register holds t. Of course, it's always a different d register in the various occurrences of this pattern. The register t6 also varies, but it's the same register in all three places in the pattern. The purpose of this "kernel" of four instructions is: - if it is entered from the top, set the the t6 register to t and jump to 40. - if it is entered at instruction 12, set the t6 register to nil, and fall through to the next block. This can be instead expressed by to three instructions: +---------------------------+ | |(mov t6 d3) | | |(jmp 40) | | |(mov t6 nil) | <--------+ +---------------------------+ This revised code then more conductive to further optimizations in ways that the original is not. In some cases, like the Ackermann test case, both moves into t6 will disappear entirely. * share/txr/stdlib/optimize.tl (basic-blocks :postinit): Pass instructions through early-peephole. (early-peephole): New function, where we can look for and rewrite instruction patterns over the entire block of code, before any division into basic blocks. Here, we look for the four-instruction kernel and rewrite it.
-rw-r--r--share/txr/stdlib/optimize.tl19
1 files changed, 18 insertions, 1 deletions
diff --git a/share/txr/stdlib/optimize.tl b/share/txr/stdlib/optimize.tl
index 73aa6dc6..99faad8a 100644
--- a/share/txr/stdlib/optimize.tl
+++ b/share/txr/stdlib/optimize.tl
@@ -51,7 +51,7 @@
uwprot catch block jend))
(:postinit (bb)
- (let* ((insns (dedup-labels (cons bb.start bb.insns)))
+ (let* ((insns (early-peephole (dedup-labels (cons bb.start bb.insns))))
(cuts (merge [where symbolp insns]
[where [andf consp
(op memq (car @1) bb.jump-ops)]
@@ -474,3 +474,20 @@
(cons label0 rest))
(@else tail))
insns)
+
+(defun early-peephole (code)
+ (rewrite-case insns code
+ (((mov (t @t1) (d @d1))
+ (jmp @lab2)
+ @(symbolp @lab1)
+ (mov (t @t1) (t 0))
+ @lab2
+ (ifq (t @t1) (t 0) @lab3)
+ . @rest)
+ ^((mov (t ,t1) (d ,d1))
+ (jmp ,lab3)
+ ,lab1
+ (mov (t ,t1) (t 0))
+ ,lab2
+ ,*rest))
+ (@else else)))