| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
| |
* RELNOTES: Updated.
* configure, txr.1: Bumped version and date.
* share/txr/stdlib/ver.tl: Bumped.
* txr.vim, tl.vim: Regenerated.
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-blocks peephole-blocks):
Extend dead reg mov pattern to also handle the getlx, getv,
getf and getfb instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the result of a function call is not used, the call can
be eliminated if the function has no effects.
Effect-free functions are superset of constant-foldable
functions. Not all effect-free functions are const-foldable
because in some cases, creating an object at run-time is a
documented semantics which cannot be constant-folded. For
instance (list 1) cannot be constant folded, because it may be
relied upon to generate a fresh object each time it is called.
However, bringing a new list to life is not an effect. If
the value is not used, we can safely eliminate it.
The same reasoning applies to (gensym "abc"). It must generate
a unique symbol each time, and so cannot be constant-folded.
But call to gensym whose value is not used can be eliminated.
* share/txr/stdlib/compiler.tl (%effect-free-funs%): List of
side-effect-free functions registered in eval.c.
(%effect-free%): Hash of effect-free functions, incorporating
%const-foldable% also.
* share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
Refer to %effect-free% rather than %const-foldable%.
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (%const-foldable-funs%): Add
rest, nilf, tf, join, join-with, empty.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this patch, we optimize away closures that are not used.
Unused closures that have been hoisted to the top level are
not affected. We look for close instructions which produce
a dead treg, and rewrite these to jmp instructions to the same
label. When this happend, we set a flag for a dead code
elimination pass to be done again, to actually remove the now
unreachable closure code.
* share/txr/stdlib/optimize.tl (struct basic-blocks): New
slot, reelim, indicating that dead code elimination is to be
done again after peephole since the control flow graph has
changed.
(basic-blocks peephole-block): New pattern for eliminating a
dead register, targeting the close instruction.
(basic-blocks peephole): After peephole, check whether the
reelim flag has been set and do elim-dead-code again.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main idea in this patch is to identify calls functions
whose values are not used and which have no side effects. For
this purpose, we borrow the same set of functions that we
use as targets for constant folding: those in the compiler's
%const-foldable% hash.
A call is dead if it is a gcall or gapply instruction which
calls one of these functions, and the destination register is
dead.
To maximize the opportunities for this elimination, whenever
we eliminate such an instruction, we mark the block for
re-scanning, and we re-calculate the liveness info for that
block and then globally.
* share/txr/stdlib/compiler.tl (struct compiler): New slots,
datavec and symvec.
(compiler get-datavec): Cache the calculated datavec in the
new datavec slot.
(compiler get-symvec): Cache the calculated symvec in the
new symvec slot.
(compiler optimize): Pass the symvec to the basic-blocks BOA
constructor; it is now needed for identifying functions
that are referenced by symvec index number in the gcall and
gapply instructions.
* share/txr/stdlib/optimize.tl (struct basic-blocks): New
symvec slot, added to the the BOA parameter list also.
New slot recalc, indicating re-calculation of liveness is
needed.
(basic-blocks cut-block): Use pushnew to put bl onto the
rescan list, to prevent duplicates. Also push the new block
onto the rescan list.
(basic-blocks local-liveness): This method can now be called
again to re-calculate local liveness, so we must reset the
live slot to nil.
(basic-blocks calc-liveness): Take an optional list of blocks
to scan for local liveness, defaulting to all of them.
(basic-blocks peephole-block): Factor out some register
liveness tests to local functions. Add a pattern targetting
gcall and gapply instructions, testing for a dead register
from a call to a %const-foldable% function. Use pushnew
everywhere to add to the rescan list. Set the recalc flag when
the liveness-based reductions are applied.
(basic-blocks peephole): If there are blocks to be scanned
again, then if the recalc flag is set, recalculate local
liveness for all the blocks to be re-scanned and re-do global
liveness.
|
|
|
|
|
| |
* lib.c (sort): Return the copied and sorted object object,
not the original.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That old cat-str function is often a pain, requiring the
pieces as a list. We have a sys:fmt-join that is undocumented.
That functions is now exposed as usr:join, and documented.
Also introducing join-with that takes a separator as the
leftmost argument. Thus (op join-with "::") gives us
a function that joins pieces with :: in between.
* eval.c (eval_init): Regiser fmt_join function under join
symbol also. Register join-with.
* lib.c (join_with): New function.
(fmt_join): Now a wrapper for join_with that passes a nil
separator.
* lib.h (join_with): Declared.
* share/txr/stdlib/optimize.tl (basic-blocks join-blocks):
Rename the local function join, which now triggers a warning
about a standard function being redefined.
* txr.1: Redocumented cat-str, and documented join-with
and join.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* lisplib.c (compiler_set_entries): Register *opt-level*
symbol for auto-loading.
* share/txr/stdlib/compiler.tl (*opt-level*): New special
variable.
(compiler comp-let): Eliminate frames only at level 3.
(compiler comp-lambda-impl): Lift load time at level 3.
(compiler comp-arith-form): Constant-folding only at lvl 1.
(compiler comp-fun-form): Algebraic substitutions and
reductions and constant-folding only at level 1.
(compiler comp-apply-call): Constant folding at level 1.
(compiler optimize): Optimizations off if level zero.
Thread jumps and eliminate dead code at level 2.
Flow-analysis based optimizations at level 3.
Additional optimizations at level 4.
(compile comp-block): Block elimination at level 3.
(compile-toplevel): Rebind *opt-level*, giving it value zero
if it is previously nil.
* share/txr/stdlib/optimize.tl (basic-blocks get-insns): Just
retrieve the instructions, letting caller decide whether to
call late-peephole or not.
* txr.1: Documented *opt-level*.
|
|
|
|
|
|
|
|
|
|
| |
* RELNOTES: Updated.
* configure, txr.1: Bumped version and date.
* share/txr/stdlib/ver.tl: Bumped.
* txr.vim, tl.vim: Regenerated.
|
|
|
|
|
|
| |
hash.c (equal_hash): Do not multiply by the seed if it is
zero; substitute 1. Otherwise under the default seed of zero,
the hash becomes zero.
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compiler comp-fun-form):
Rearrange the logic so that we only try the speculative
compilation when the three main conditions are right,
not before. This drastically reduces the number of times
we need to take the compiler snapshot.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This issue affects the original code which lifts lambdas to
load-time, as well as the new, recently added code for
similarly lifting functional combinator expressions.
The problem is that the trick works by compiling an expression
twice. The result of the first compile is thrown away in the
case when we compile it again in the load-time context.
But compiling has a side effect: the expression itself
may have an embedded load-time-liftable expression, which gets
deposited into the load-time fragment list. Thus garbage ends
up in the list of load-time fragments.
We likely want to save and restore other things, like
allocated D regisers.
* share/txr/stdlib/compiler.tl (compiler shapshot, compiler
restore): New methods.
(comp-lambda-impl, comp-fun): Save a snapshot of the compiler state
before doing the speculative compilation. If we don't use that
compilation, we restore the state from the snapshot.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compile-in-toplevel): This
macro must not save and restore the nlev variable of the compiler.
The nlev variable is a maximum: it measures the maximum
environment depth, establishing the depth of the display
needed when executing the code of the resulting VM
description. By saving and restoring this variable around the
compilation of a load-time form, we cause the load-time form's
contribution to the maximum to be ignored. If the load-time
form perpetrates nesting that is deeper than other code, it
will execute with an under-sized display, causing an assertion
crash or corruption.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The int_flo function also has a sensitivity to locale because
the bignum handling is text-based, involving printing a
floating-point value, and then assuming it contains a period
decimal separator.
* arith.c (int_flo): If CONFIG_LOCALE_TOLERANCE is enabled,
look for dec_point rather than '.' in the formatted number.
When matching and destructuring the number with sscanf,
don't look for the '.' character, but rather a complemented
character set which can maching nothing but the separator,
whatever that is.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The wcstod function has locale-specific behavior. It uses a
locale-specific decimal separator character, which may not be
the period. Though in TXR we never call setlocale in order to
activate localization, it is a good idea to put in code to
defend against this. If locale is ever unleashed on the code,
it really botches our floating-point handling. However, let's
keep that defensive logic disabled for now using the
preprocessor.
The strategy is to inquire about the locale's decimal
character at startup. Then, in the flo_str function, we
preprocess the input by converting the decimal period to
the locale-specific character before calling wcstod. On the
output side, we also deal with it in the format function; we
call sprintf, and then convert the locale-specific characer
to period.
I tested all this by temporarily introducing the setlocale
call, and switching to a locale with a comma as the separator,
geting make tests to pass, and doing some interactive testing.
This is not going to receive coverage in the test suite.
* lib.c (dec_point): New global variable.
(flo_str): If dec_point isn't '.', we must copy the
string into a local buffer, which we get from alloca, and
edit any '.' that it contains to dec_point.
(locale_init): New function; initializes dec_point.
(init): Call locale_init.
* lib.h (dec_point): Declared.
* stream.c (formatv): Don't look for a '.' in the result of
printing a double using %e; look for dec_point. Post-process
floating-point sprinf by substituting occurrences of dec_point
with the period.
|
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (late-peephole): Add a
6-instruction pattern which is seen in pattern-matching code
in optimize.tlo and compiler.tlo. This can be tidied up a
little bit, reducing the instructions to 5, eliminating
redundant comparisons and threading a jump. At this point we
are probably working too hard for too little gain. More
effort in other areas like common subexpression elimination
could produce bigger wins.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed a lot of this in compiled code:
(if reg label1)
label2
(jmp label3)
label1
by inverting the test, this can be shortened to
(ifq reg (t 0) label2)
label1
We must keep label1 in case it is the target of other
branches. Also, we only perform this if label2 is unreachable:
the (jmp label3) instruction is reached only when the previous
if branch is not taken.
* share/txr/stdlib/optimize.tl (basic-blocks get-insns): Call
new late-peephole method.
(basic-blocks late-peephole): New method, incorporating the
above pattern.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The uslot and umethod functions produce functions; and should
be lifted to load-time, if possible.
For instance, the .foo expression [mapcar .foo list]
translates to a (uslot 'foo) function call. This references no
variables, and so is liftable to load-time.
The umethod function is an extension of uslot that allows
partially applied method arguments to be carried. If those
arguments are all functional, the umethod call is liftable.
* share/txr/stdlib/compiler.tl (%functional-funs%): Include
umethod and uslot.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because of the previous optimization, some load-time forms
that appear in the library are unnecessary.
* share/txr/stdlib/optimize.tl (basic-blocks
merge-jump-thunks): Remove load-time around functional
combinators.
* share/txr/stdlib/socket.tl (sys:in6addr-condensed-text):
Remove one load-time that is now unnecessary, and one around
an op which was unnecessary even at the time it was written,
since the lambda is lifted without it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea behind this optimization is that certain expressions
that only calculate functions can be hoisted to load time.
These expressions meet these criteria:
1. Are not already in a top-level or load-time context.
2. Are function calls to a standard library
functional operator like chain andf, orf, juxt, ...
3. Do not access any variables.
3. Do not access any functions other than public (usr package)
global functions in the standard library.
An example of such an expression might be:
[chain cdr [iff symbolp list]]
If such an expression is embedded in a function, we don't want
the function to re-calculate it every time, which requires
time and generates garbage. We can transform it to the
equivalent of:
(load-time [chain cdr [iff symbolp list]])
to have it calculated once.
* share/txr/stdlib/compiler.tl (%functional-funs%,
%functional%): New global variables.
(compiler comp-fun-form): After compiling the function call,
check for the conditions for lifting. If so, compile the form
again as a load-time literal. The logic is similar to how
lambdas are lifted to load-time, though the conditions are
different.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've noticed that some duplicate short blocks are generated,
which look like this. Often they are consecutive, which
is why they are noticeable. These can become one block:
mov t17 d3
jmp label17
mov t17 d3
jmp label17
mov t17 d3
jmp label17
We identify identical blocks by looking for short instruction
sequences that end in an unconditional jmp, and then we group
duplicates in a hash table keyed on the instruction sequence.
We must ignore the label: the first instruction in each block
is a unique label.
We have to be careful about which ones to delete. Any block
that is entered from the top must be preserved. When these
blocks are identified, at least one block must remain that
is removable for the optimization to be able to do anything.
If all blocks are removable, we pick a leader which is
preserved. Otherwise we pick a leader from the preserved
blocks. The non-preserved blocks are deleted, and all jumps
to them from other blocks are redirected to jumps to the
leader.
* share/txr/stdlib/compiler.tl (optimize): Call
merge-jump-thunks as the last pass.
* share/txr/stdlib/optimize.tl (basic-blocks
merge-jump-thunks): New method.
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-block print): Print
basic blocks such that related blocks are printed by their
label rather than the whole graph.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a specific instruction sequence that often occurs
when pattern matching code is compiled. It is not easy
to attack by the existing optimization framework. Once the
code is divided into basic blocks, it spans multiple blocks.
It's not a good fit for either jump threading or
intra-basic-block peephhole reductions.
It is exemplified by the following disassembly snippet, where
the pattern is shown in the box:
5: 3C00000C ifq v00000 d1 12 --------+
6: 08000401 |
7: 20020010 gcall t16 0 v00001 d2 |
8: 08010000 |
9: 00000402 |
+---------------------------+ |
|10: 2C060403 movsr t6 d3 | |
|11: 3400000D jmp 13 | |
|12: 2C060000 movsr t6 nil | <--------+
|13: 3C000028 ifq t6 nil 40 |
+---------------------------+
14: 00060000
15: 3C000019 ifq v00001 d1 25
The d3 register holds t. Of course, it's always a different d
register in the various occurrences of this pattern.
The register t6 also varies, but it's the same register in
all three places in the pattern.
The purpose of this "kernel" of four instructions is:
- if it is entered from the top, set the the t6 register to t
and jump to 40.
- if it is entered at instruction 12, set the t6 register to
nil, and fall through to the next block.
This can be instead expressed by to three instructions:
+---------------------------+ |
|(mov t6 d3) | |
|(jmp 40) | |
|(mov t6 nil) | <--------+
+---------------------------+
This revised code then more conductive to further
optimizations in ways that the original is not. In some
cases, like the Ackermann test case, both moves into t6 will
disappear entirely.
* share/txr/stdlib/optimize.tl (basic-blocks :postinit): Pass
instructions through early-peephole.
(early-peephole): New function, where we can look for and
rewrite instruction patterns over the entire block of code,
before any division into basic blocks. Here, we look for the
four-instruction kernel and rewrite it.
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compiler comp-if): An invalid
or expression means we are not matching the if over equal
strength reduction pattern, and miscompiling an unrelated
expression (luckily, one that is unlikely to occur in any
reasonable application). Unfortunately, this correction spoils
the way the pattern matching Ackermann test case is being
compiled.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a function ends like this:
jmp label
...
label: end t42
The jmp can just be replaced by "end t42".
* share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
Implement the pattern to replace a jump to a one or two
instruction sequence ending in a jend with that sequence.
Including the extra instruction helps if the target has not
been peepholed yet. In the pattern matching Ackermann case,
two instructions are promoted, one of which is eliminated.
|
|
|
|
|
|
|
|
|
|
| |
During peephole optimization, new cases can arise where a dreg
is tested.
* share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
Test for (if dreg ...) and (ifq dreg nil), eliminating the
instruction or rewriting to jmp. Could be worth it to do
another thread and dead code elimination.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Though the assembler acceps nil as a synonym of (t 0), we have
to be consistent.
* share/txr/stdlib/compiler.tl (compiler comp-or): Don't use
nil operand in ifq instruction: use (t 0).
* share/txr/stdlib/optimize.tl (basic-blocks
thread-jumps-block): Don't match ifq pattern with nil operand;
match (t 0).
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-blocks
thread-jumps-block): The two accidental occurrences of @reg
will never match each other: rename one to @dn.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compiler optimize): Perform
the control flow optimizations of jump threading and dead code
elimination first. Then, do the data-flow-assisted
optimizations on the re-calculated control flow graph.
With this, two more useless insructions are shaved off the
pattern matching Ackermann:
(defun ack (:match)
((0 @n) (+ n 1))
((@m 0) (ack (- m 1) 1))
((@m @n) (ack (- m 1) (ack m (- n 1)))))
It now compiles down to 16 instructions instead of 18;
and the speedup of (ack 3 7) versus interpreted goes
from 36.0 to 37.3 times.
The reason that the new reduction in the code is possible
is that previously, the code being analyzed was in separate
basic blocks, which are now merged together in the dead code
elimination pass.
* share/txr/stdlib/compiler.tl (compiler optimize): Move
calc-liveness and peephole after elim-dead-code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After eliminating dead code and useless forward jumps, there
can be blocks which unconditionally proceed to subsequent
blocks, which have no other predecessor. These blocks can be
merged together. This currently does nothing to alter the
generated code. The advantage will be obvious in a subsequent
commit.
* share/txr/stdlib/optimize.tl (struct basic-block): New slot,
rlinks: reverse links.
(basic-blocks join-block): New method.
(basic-blocks link-graph): Populate rlinks slots.
(basic-blocks join-blocks): New method.
(basic-blocks elim-dead-code): Reset rlinks also before
re-calculating the graph connectivity with link-graph.
Call join-blocks to merge together consecutive blocks.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The optimizer relies too much on labels. The basic block graph
is linked via labels instead of directly, and the blocks
are maintained in order via a label list. Let's get rid
of this.
* share/txr/stdlib/optimize.tl (struct basic-blocks): Member
slot removed. Oh look, we have a list slot that is not
utilized at all; that will be used instead.
(basic-blocks :postinit): Don't calculate the bb.labels.
(basic-blocks get-insns): Loop over list instead of labels,
and get the .insns directly without label hash lookup.
(basic-blocks (cut-block, next-block)): Operate with and
return blocks instead of labels.
(basic-blocks link-graph): Link graph using direct pointers
rather than indirection through labels.
(basic-blocks calc-liveness): Get rid of small amount of label
indirection here.
(basic-blocks thread-jumps-block): Drop unused label argument.
Do some needed label hash lookups now to produce block
argument for cut-block and next-block methods.
(basic-blocks peephole-block): Several rules need to do some
hash lookup to work with cut-block and next-block, and
retrieve a needed label from a block.
(basic-blocks (peephole, thread-jumps)): Loops simplified, no
dealing with labels.
(basic-blocks elim-next-jump): Drop label argument,
use bl in call to next-block.
(basic-blocks elim-dead-code): Loops and recursion simplified
without label indirection.
|
|
|
|
|
|
|
| |
* configure (yacc_is_newer_bison): Configuration variable
removed.
(gen_config_make): Generation of yacc_is_newer_bison make
variable removed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Over two years ago, I disabled parallel building with a
.NOTPARALLEL: directive in the Makefile, due to some rules
that depended on order of updates of their prerequisites.
Today the situation is better. It looks like there are just a
few rules that have to be fixed, all in the area of executing
tests.
Let us not enable that by default, though, to protect
downstream users from themselves. Some operating system
distribution builds optimistically run make -j N on all the
packages, and then have to deal with sporadic breakages when
some of those packages have parallel build bugs. We can
prevent TXR from ever being implicated in such a breakage by
turning off parallel building even if make -j requests it.
There will now be a ./configure --parallelmake option
to allow parallel builds, off by default. The --maintainer
configuration mode will flip that default; configuring in
maintainer mode will enable parallel builds, unless explicitly
disabled with --no-parallelmake.
* Makefile (.NOTPARALLEL): Conditionally assert, if
parallelmake is false.
(tst/tests/014/dgram-stream.ok,
tst/tests/014/socket-basic.ok): Enforce order between these
two tests. They clash in their use of network ports so cannot
run at the same time.
(test.clean): test.clean must finish executing before tests;
enforce order.
* configure (parallelmake, parallelmake_given): New variables.
(help text): Add parallelmake option to help.
(gen_config_make): Generate the parallelmake make variable
into config.make.
New script section, defaulting parallelmake to y in maintainer
mode.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* RELNOTES: Updated.
* configure, txr.1: Bumped version and date.
* share/txr/stdlib/ver.tl: Bumped.
* txr.vim, tl.vim: Regenerated.
* protsym.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We cannot assume that a d register is has a non-nil value.
This is because d registers are exploited in the
implementation of load-time: the result of a load-time form is
stored by mutating a d register, and the value could be nil.
Since we still want to be able to assume that d registers
are non-nil, what we can do is just avoid that assumption for
those d regisers that are used for load-time values.
* share/txr/stdlib/compiler.tl (struct compiler): When
constructing basic-blocks, pass a new constructor argument:
the list of load-time d-regs. This is easily obtained by
mapping the load-time frags to their oreg slots, which are
those d-regs.
* share/txr/stdlib/optimize.tl (struct basic-blocks): New
slot and BOA constructor argument, lt-dregs.
(basic-blocks thread-jumps-block): Add a require to the
pattern (if (d @reg) @jlabel), that the register must not
be one of the load-time d-regs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/copy-file.tl (package copy-file): Removed.
I cannot remember why I used this, but it was probably because
this source file was developed in "user space". Then it likely
became an experiment in the use of merge-delete-package to
define material in a temporary package which is merged down to
the ultimate target package like sys. Why get rid of this?
It's causing a problem. The second and subsequent recompile of
copy-path-rec causes its internal block not to be optimized
away. I spotted this while checking whether a recompile of the
library using the fully compiled library is different from
the initial compile: copy-path-rec's block is not optimized away.
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compiler comp-setqf): When an
assignment to a function is compiled, we must register the
occurrence of a free function, not a free variable.
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-blocks elim-next-jump):
New method, which detects that the basic block ends with a
(jmp label), where label is the next block in emit order.
This jmp instruction can be eliminated.
(basic-blocks elim-dead-code): Walk the reachable labels,
and call elim-next-jmp to remove useless jumps.
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-blocks
thread-jumps-block): This is a complementary optimization to
the one which matches (if (d @reg) @jlabel). An if
instruction conditional on the nil register (t 0) is always
taken, and can be rewritten to a jmp. This can promote
elimination of dead code.
|
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compiler optimize): Call new
elim-dead-code method on basic-blocks object.
* share/txr/stdlib/optimize.tl (basic-blocks elim-dead-code):
New method. We reset the links information for each basic
block and re-build the graph. Then we traverse it to determine
what blocks are reachable, and cull the original blocks list
of those that are not.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The register removal optimization now works for registers
initialized from variables. For instance
mov t13 v00000
mov t12 v00001
gcall t7 3 t13 d0
gcall t8 4 t12 t19
can, under the right conditions, be replaced with
gcall t7 3 v00000 d0
gcall t8 4 v00001 t19
where the useless copies of the v0 and v1 registers to
t13 and t12 are gone, and the refernces to those registers
are renamed to those v registers.
* share/txr/stdlib/optimize.tl (basic-blocks local-liveness):
We now use the def field of a live-info structure differently.
The def field holds a register, instead of a t register
number. For any instruction that defines a register, the
register is stored, whether it is v or t register.
(basic-blocks peephole-block): The rule is simplified and
extended to cover t-regs that receive a v-reg. A special
additional condition applies in this case: v-registers are
tied to a frame, and must not be accessed if their frame has ended.
Our optimization carries the risk that the variable was copied
into a register precisely for the purpose that the v register
is going out of scope, and the value is needed outside of the
frame. We don't have frame level information in the basic
block, so to be conservative, we guess like this: if any end
instructions occur (potentially ending a frame), and if the
register is used after an end instruction, then we don't do
the replacement.
* share/txr/stdlib/compiler.tl (compiler comp-catch): Remove
inappropriate jend. Control definitely flows past this
instruction, because we jump here in the case when no
exception has taken place to continue the program. This bug
has resulted in some blocks being declared unreachable by the
control flow graph construction, and thus not included in
register liveness calculations, resulting in having a nil
in the live slot. This was exposed by the new new
optimization.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The list-builder methods, other than del, del* and get,
now return the object instead of nil.
* share/txr/stdlib/build.tl (list-builder (add, add*, pend,
pend*, ncon, ncon*): Return the object, self.
(list-builder-flets): Do not return the object out of the
local functions which invoke the above methods.
* txr.1: Documented.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using liveness information, if we are very careful about the
circumstances, we can can eliminate instructions of the form
mov tN src
and replace every subsequent occurrence of tN in the basic
block by src. For instance, simple case: if a function
ends with
mov t13 d5
end t13
that can be rewriten as
end d5
The most important condition is that t13 is not live on exit
from that basic block. There are other conditions. For now,
one of the conditions is that src cannot be a v register.
* share/txr/stdlib/optimize.tl (struct live-info): New slot,
def. This indicates which t register is being clobbered, if
any, by the instruction to which this info is attached.
(basic-blocks local-liveness): Adjust the propagation of the
defined info. If an instruction both consumes a register and
overwrites it, we track that as both a use and a definition.
We set up the def fields of live-info. We do that by mutation,
so we must be careful to copy the structure. The def field
pertains to just one instruction, but the same info can be
attached to multiple instructions.
(subst-preserve): New function.
(basic-blocks peephole-block): New optimization added.
Now takes a basic-block argument, bl.
(basic-blocks peephole): Pass bl to peephole-block.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The optimizer now calculates t liveness information for the t
registers. In every basic block, it now knows which t regs
are live on exit, and which are used in the block, at every
instruction.
One small optimization is based on this so far: the removal
of a move instruction targeting a dead register. This appears
stable.
* share/txr/stdlib/compiler.tl (compiler comp-unwind-protect):
The protected code of a uwprot must terminate with a regular
end instruction, rather than the jend pseudo-instruction.
This is because the clean-up block is executed after the
protected block and references values generated in it: t
registers are live between he pfrag and the cfrag. Without
this, the compile-file-conditionally function was wrongly
optimized, causing it to return false due to the setting of
the success flag (that having been moved into a t register)
having being optimized away.
(compiler optimize): Add the call the basic-blocks method
to calculate liveness.
* share/txr/stdlib/optimize.tl (struct live-info, struct
basic-block): New structure types. The basic-block
structure type now representes basic blocks instead of raw
lists.
(struct basic-blocks): New slots, root, li-hash.
(basic-blocks jump-ops): We add few instructions that
reference labels, just to be safe.
(basic-blocks :postinit): Refactor division into basic blocks
so that it generates basic-block objects instead of just lists
of instructions. Also, the new method link-graph is called
which analyzes the tail instructions of all the blocks to
determine connectivity and sets the next and links fields
of the objects to build a graph.
(basic-blocks (get-insns, cut-blocks)): Refactor for struct
represenation of basic blocks.
(basic-blocks (link-graph, local-liveness, calc-liveness): New
methods.
(basic-blocks thread-jumps-block): Refactor for struct
representation of basic blocks.
(basic-blocks peephole-blocks): Likewise, and new pattern for
removing moves into dead t-registers, assisted by liveness
information.
(basic-blocks (peephole, thread-jumps)): Refactor for
basic-blocks representation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compiling a form like
(caseq op ((a b c d e f g h i j k) 42)))
Results in a run-time error in the compiler, similar to:
list-vec: (#:l0048) is not of type vec
* share/txr/stdlib/compiler.tl (compiler comp-switch): Make
sure cases is also still a vector in the complex case when
it's not just a copy of cases-vec.
|
|
|
|
|
|
|
|
| |
* mpi/mpi.c (mp_or, mp_xor): The main loop must only iterate
up to the minimum number of digits between the two source
operands, not the maximum number of digits, because otherwise
it will access past the end of the shorter bignum's
digit array.
|
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (thread-jumps-block): Add
missing argument to close instruction pattern. This causes us
to miss a threading opportunity due to the new ntregs
parameter being mistaken for a label, which is not found.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the bug of not allowing @(next :list nil).
Also the bug of @(next :string) or @(next :string nil)
reporting a nonsensical error message, instead of correctly
complaining that nil isn't a string.
* match.c (v_next_impl): Capture the conses returned
by assoc, so we know whether a keyword is present. Then use
those conses in tests which detect whether the keyword is
present, rather than the values, which could be nil.
|
|
|
|
| |
* match.c (match_files): Fix bungled wording.
|