| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thanks to a recent discussion on the GNU Make mailing list
about an issue related to grouped patterns, I hit upon the
realization that the rule y.tab.h y.tab.c: parser.y has
a common stem, y, which can be exploited to write a pattern
rule. This is only active in maintainer mode; in user mode,
y.tab.h and y.tab.c are separately produced from .shipped
files.
* Makefile (y.tab.h): Eliminate rule which detects a removed
y.tab.h.
(y.tab.c): Delete rule, replacing with new rule.
(%.tab.c %.tab.h): New pattern rule, which groups the targets
together.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The float type promotes to double when passed as
a variadic argument.
This patch adds internal FFI types which model that
promotion, using double as the C type while still
performing the range checks for float.
Also, the types be-float and le-float are rejected as
variadic arguments.
* share/txr/stdlib/ffi.tl (analyze-argtypes): Rewrite
function, adding validation and substitution for the
variadic part of the argument type list. Map the float type
to double, and reject be-float and le-float.
* txr.1: Documented.
|
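For reference, the C rule the new types mirror: a float argument in a
variadic call undergoes the default argument promotions, so the callee
must retrieve it as double. A minimal standalone C illustration (not
TXR code; the function names are made up for the example):

  #include <stdarg.h>
  #include <stdio.h>

  /* In a variadic call, a float argument is promoted to double,
     so the callee must fetch it with va_arg(ap, double). */
  static double first_vararg(int count, ...)
  {
      va_list ap;
      double d;
      va_start(ap, count);
      d = va_arg(ap, double);  /* va_arg(ap, float) would be undefined */
      va_end(ap);
      return d;
  }

  int main(void)
  {
      float f = 1.5f;
      printf("%f\n", first_vararg(1, f));  /* f is passed as double */
      return 0;
  }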
|
|
|
|
|
|
|
|
| |
* ffi.c (ffi_ushort_get): Add the missing align_sw_get macro
that checks for and handles misalignment.
Commit e539fbc2af3bd4a5bd4ca3d9b567862005509aa1 made on
2017-06-05 added these macros and fixes to numerous functions.
The commit message claims that ffi_ushort_get was among the
functions fixed, but no change was made to the function.
|
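For background, align_sw_get is TXR's internal macro; the following is
only a generic, hypothetical C sketch of the kind of misalignment-safe
read such a macro enables, not the actual macro:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  /* Read a 16-bit value from a possibly misaligned address by copying
     bytes, instead of dereferencing a cast pointer, which is undefined
     behavior on strict-alignment targets. */
  static uint16_t get_u16_unaligned(const void *p)
  {
      uint16_t v;
      memcpy(&v, p, sizeof v);
      return v;
  }

  int main(void)
  {
      unsigned char buf[] = { 0x00, 0x34, 0x12, 0x00 };
      /* reads bytes at an odd address; result depends on byte order */
      printf("%x\n", (unsigned) get_u16_unaligned(buf + 1));
      return 0;
  }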
|
|
|
| |
* ffi.c (ffi_uchar_put): Fix for C90 compat and consistency.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a jmp instruction is replaced by the end instruction
that it jumps to, or possibly by a two-instruction sequence
ending in end, there can be more opportunities to optimize.
For instance, the second reduction in a sequence like this:
  mov t3, t5          mov t3, t5
  jmp label     -->   end t3        -->   end t5
  ...
label:
  end t3
* share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
If the end-propagation is done, we change the linkage of the
current block to indicate that it has no next blocks. We
add it to the rescan list and set the recalc flag so the
liveness information is updated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a function has nothing but parameters that are not captured
in lexical closures, they can be converted to registers. The
function then doesn't need a variable frame for its
parameters. This is similar to the eliminate-frame
optimization, and borrows the same code and logic.
* share/txr/stdlib/compiler.tl (compiler eliminate-frame): We
no longer assume that the code coming in starts with a frame
instruction we can eliminate using (cdr code) and an end
instruction we can eliminate with a trailing pattern.
This is because when this function is used for a lambda, this
is not the case; a lambda's variable frame is implicit,
created by the VM for any lambda with a nonzero frame size,
rather than by a frame instruction.
(compiler comp-let): In the call to eliminate-frame, we now
trim away the first and last instruction, to get rid of
the (frame ...) and (end ...).
(compiler comp-lambda-impl): Install a closure spy against the
variable frame to detect which variables are captured in
closures, as is done in comp-let. Under the right
conditions, pass the code through eliminate-frame to turn
the variables into registers. The close instruction has to be
rewritten, because the frame size is now zero, and the number
of t registers has changed.
|
|
|
|
|
|
|
| |
* sysif.c (poll_wrap): Use seq_iter for efficiency when
poll_list is a vector or other generalized sequence.
* txr.1: Change wording to say that poll-list is a sequence.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have the situation that there are effectively two kinds of
spies: let constructs plant spies only in order to learn about
what variables are being captured, whereas lambdas plant spies
in order to intercept variable accesses (and to inform the
spies that are interested in what is captured). Let us split
these up into two separate types, with different methods,
in different stacks.
* share/txr/stdlib/compiler.tl (struct var-spy): Renamed to
closure-spy.
(var-spy accessed, var-spy assigned): Methods removed: the
closure-spy type has only the captured method.
(struct capture-var-spy): Renamed to access-spy.
(capture-var-spy var-spies): Renamed to access-spy
closure-spies.
(capture-var-spy captured): Method removed; the access spy
doesn't receive captured calls.
(struct compiler): Slot var-spies removed, replaced with
closure-spies and access-spies.
(with-var-spy): Macro removed.
(with-spy): New function.
(with-closure-spy, with-access-spy): New macros.
(compiler push-var-spy, compiler pop-var-spy): Methods
removed.
(compiler push-closure-spy, compiler pop-closure-spy,
compiler push-access-spy, compiler pop-access-spy): New
methods.
(compiler comp-var, compiler comp-setq, compiler
comp-lisp1-setq, compiler comp-lisp1-value): Walk new
access-spies list rather than var-spies to report about
accesses and assignments.
(compiler comp-let): Use the with-closure-spy macro rather
than with-var-spy. The spy object is now a closure-spy type,
and the variable is cspy rather than vspy.
(compiler comp-lambda-impl): Use with-access-spy instead of
with-var-spy. The spy object is now of type access-spy.
It refers to the current me.closure-spies from the compiler.
|
|
|
|
|
|
|
|
|
| |
I'm giving up and just using a recursive make invocation
for the "retest" target.
* Makefile (retest): Depend on nothing. Remove the tst
directory and invoke make tests recursively.
(tests.clean): Target removed.
|
|
|
|
|
|
|
|
|
|
|
| |
The var-spy structure is only being used for detecting
captured variables, not assigned or accessed variables.
So the accessed and assigned information is being
wastefully accumulated and never used.
* share/txr/stdlib/compiler.tl (struct var-spy): Remove slots
acc-vars and set-vars.
(var-spy accessed, var-spy assigned): Methods become no-ops.
|
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-blocks elim-dead-code):
Ironically, we have a dead variable named unreachable in
this function. Let's remove it and the build form which
calculates it.
|
|
|
|
|
| |
* txr.1: Fix wrong word; clarify what the first form and
second form are.
|
|
|
|
|
|
|
|
| |
* txr.1: The structural pattern matching intro is divided into
sections, rearranged and reworded, with additional detail,
especially in regard to variables. Also, "a the" typo fixed in
list pattern notation section. Pattern matching is now taken
out of beta; the paragraph about the beta status is removed.
|
|
|
|
|
|
|
|
|
| |
* txr.1: Add a missing section defining meta-symbols,
meta-numbers and meta-expressions, which explains the sys:var
and sys:expr shorthand. Also documented are corner cases
involving deceptive floating-point syntax.
Everywhere in the document, we normalize "meta-variable"
to "meta-symbol".
|
|
|
|
|
|
|
| |
* lib.c (cat_str): Traverse sequences of strings efficiently
using the seq_iter framework.
* txr.1: Document.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* arith.c (toint): Use self instead of repeating function name
in diagnostic.
* ftw.c (ftw_wrap): Likewise.
* glob.c (glob_wrap): Likewise.
* rand.c (make_random_state, random): Likewise.
* lib.c (nreverse, reverse, remove_if, int_str, less, chr_str,
chr_str_set, unintern, rehome_sym, in): Likewise.
(vector, obj_print_impl): Pass self to helper function instead
of repeating identical literal.
|
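A generic C sketch of the idiom (hypothetical names and message, not
the TXR source): the entry point states its own name once, and helpers
report diagnostics against that name instead of repeating a literal.

  #include <stdio.h>

  /* Helper reports errors against the caller-supplied name. */
  static void check_positive(const char *self, long n)
  {
      if (n <= 0)
          fprintf(stderr, "%s: %ld is not a positive number\n", self, n);
  }

  static void vector_like_op(long n)
  {
      const char *self = "vector";  /* single point of truth for the name */
      check_positive(self, n);
  }

  int main(void)
  {
      vector_like_op(-1);
      return 0;
  }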
|
|
|
| |
* txr.1: Rewrote Macros section intro paragraphs.
|
|
|
|
|
| |
* txr.1: The description of command-get-buf mistakenly refers
to command-get.
|
|
|
|
|
|
| |
* lib.c (cat_str_measure): Use the self parameter in
diagnostics rather than cat-str, so errors are reported
against the correct function.
|
|
|
|
|
|
|
|
|
|
| |
* RELNOTES: Updated.
* configure, txr.1: Bumped version and date.
* share/txr/stdlib/ver.tl: Bumped.
* txr.vim, tl.vim: Regenerated.
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
Extend the dead reg mov pattern to also handle the getlx, getv,
getf and getfb instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the result of a function call is not used, the call can
be eliminated if the function has no effects.
Effect-free functions are a superset of constant-foldable
functions. Not all effect-free functions are const-foldable,
because in some cases creating an object at run time is
documented semantics which cannot be constant-folded. For
instance, (list 1) cannot be constant-folded, because it may be
relied upon to generate a fresh object each time it is called.
However, bringing a new list to life is not an effect. If
the value is not used, we can safely eliminate the call.
The same reasoning applies to (gensym "abc"). It must generate
a unique symbol each time, and so cannot be constant-folded.
But a call to gensym whose value is not used can be eliminated.
* share/txr/stdlib/compiler.tl (%effect-free-funs%): List of
side-effect-free functions registered in eval.c.
(%effect-free%): Hash of effect-free functions, incorporating
%const-foldable% also.
* share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
Refer to %effect-free% rather than %const-foldable%.
|
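A hypothetical, self-contained C sketch of the elimination criterion
itself; the data structures and names are invented for illustration
and are not taken from the optimizer:

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  struct call_insn { const char *fn; int dest_reg; };

  /* illustrative subset of functions known to have no side effects */
  static const char *effect_free[] = { "list", "gensym", "cons", 0 };

  static bool is_effect_free(const char *fn)
  {
      int i;
      for (i = 0; effect_free[i] != 0; i++)
          if (strcmp(effect_free[i], fn) == 0)
              return true;
      return false;
  }

  /* A call can be dropped when the function is effect-free and the
     destination register is not live after the instruction. */
  static bool can_eliminate(const struct call_insn *c, const bool *live_after)
  {
      return is_effect_free(c->fn) && !live_after[c->dest_reg];
  }

  int main(void)
  {
      bool live_after[8] = { false };
      struct call_insn c = { "gensym", 3 };
      printf("%s\n", can_eliminate(&c, live_after) ? "eliminate" : "keep");
      return 0;
  }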
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (%const-foldable-funs%): Add
rest, nilf, tf, join, join-with, empty.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this patch, we optimize away closures that are not used.
Unused closures that have been hoisted to the top level are
not affected. We look for close instructions which produce
a dead treg, and rewrite them as jmp instructions to the same
label. When this happens, we set a flag so that the dead code
elimination pass is done again, to actually remove the now
unreachable closure code.
* share/txr/stdlib/optimize.tl (struct basic-blocks): New
slot, reelim, indicating that dead code elimination is to be
done again after peephole since the control flow graph has
changed.
(basic-blocks peephole-block): New pattern for eliminating a
dead register, targeting the close instruction.
(basic-blocks peephole): After peephole, check whether the
reelim flag has been set and do elim-dead-code again.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main idea in this patch is to identify calls to functions
whose values are not used and which have no side effects. For
this purpose, we borrow the same set of functions that we
use as targets for constant folding: those in the compiler's
%const-foldable% hash.
A call is dead if it is a gcall or gapply instruction which
calls one of these functions, and the destination register is
dead.
To maximize the opportunities for this elimination, whenever
we eliminate such an instruction, we mark the block for
re-scanning, and we re-calculate the liveness info for that
block and then globally.
* share/txr/stdlib/compiler.tl (struct compiler): New slots,
datavec and symvec.
(compiler get-datavec): Cache the calculated datavec in the
new datavec slot.
(compiler get-symvec): Cache the calculated symvec in the
new symvec slot.
(compiler optimize): Pass the symvec to the basic-blocks BOA
constructor; it is now needed for identifying functions
that are referenced by symvec index number in the gcall and
gapply instructions.
* share/txr/stdlib/optimize.tl (struct basic-blocks): New
symvec slot, added to the BOA parameter list also.
New slot recalc, indicating re-calculation of liveness is
needed.
(basic-blocks cut-block): Use pushnew to put bl onto the
rescan list, to prevent duplicates. Also push the new block
onto the rescan list.
(basic-blocks local-liveness): This method can now be called
again to re-calculate local liveness, so we must reset the
live slot to nil.
(basic-blocks calc-liveness): Take an optional list of blocks
to scan for local liveness, defaulting to all of them.
(basic-blocks peephole-block): Factor out some register
liveness tests to local functions. Add a pattern targeting
gcall and gapply instructions, testing for a dead register
from a call to a %const-foldable% function. Use pushnew
everywhere to add to the rescan list. Set the recalc flag when
the liveness-based reductions are applied.
(basic-blocks peephole): If there are blocks to be scanned
again, then if the recalc flag is set, recalculate local
liveness for all the blocks to be re-scanned and re-do global
liveness.
|
|
|
|
|
| |
* lib.c (sort): Return the copied and sorted object,
not the original.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That old cat-str function is often a pain, requiring the
pieces as a list. We have a sys:fmt-join that is undocumented.
That function is now exposed as usr:join, and documented.
Also introduced is join-with, which takes a separator as the
leftmost argument. Thus (op join-with "::") gives us
a function that joins pieces with :: in between.
* eval.c (eval_init): Register the fmt_join function under the
join symbol also. Register join-with.
* lib.c (join_with): New function.
(fmt_join): Now a wrapper for join_with that passes a nil
separator.
* lib.h (join_with): Declared.
* share/txr/stdlib/optimize.tl (basic-blocks join-blocks):
Rename the local function join, which now triggers a warning
about a standard function being redefined.
* txr.1: Redocumented cat-str, and documented join-with
and join.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* lisplib.c (compiler_set_entries): Register *opt-level*
symbol for auto-loading.
* share/txr/stdlib/compiler.tl (*opt-level*): New special
variable.
(compiler comp-let): Eliminate frames only at level 3.
(compiler comp-lambda-impl): Lift load time at level 3.
(compiler comp-arith-form): Constant-folding only at level 1.
(compiler comp-fun-form): Algebraic substitutions and
reductions and constant-folding only at level 1.
(compiler comp-apply-call): Constant folding at level 1.
(compiler optimize): Optimizations off if level zero.
Thread jumps and eliminate dead code at level 2.
Flow-analysis based optimizations at level 3.
Additional optimizations at level 4.
(compiler comp-block): Block elimination at level 3.
(compile-toplevel): Rebind *opt-level*, giving it the value
zero if it was previously nil.
* share/txr/stdlib/optimize.tl (basic-blocks get-insns): Just
retrieve the instructions, letting caller decide whether to
call late-peephole or not.
* txr.1: Documented *opt-level*.
|
|
|
|
|
|
|
|
|
|
| |
* RELNOTES: Updated.
* configure, txr.1: Bumped version and date.
* share/txr/stdlib/ver.tl: Bumped.
* txr.vim, tl.vim: Regenerated.
|
|
|
|
|
|
| |
* hash.c (equal_hash): Do not multiply by the seed if it is
zero; substitute 1. Otherwise under the default seed of zero,
the hash becomes zero.
|
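A hypothetical C sketch of the failure mode and the fix; this is not
the actual equal_hash code, and the mixing constant is arbitrary:

  #include <stdio.h>

  /* If the seed multiplier is zero, every hash collapses to the same
     value; substituting 1 preserves mixing under the default seed. */
  static unsigned long mix_hash(unsigned long seed, unsigned long h)
  {
      unsigned long k = seed ? seed : 1;  /* the fix: never multiply by 0 */
      return k * h + 0x9e3779b9UL;
  }

  int main(void)
  {
      printf("%lx\n", mix_hash(0, 12345));  /* nonzero even with seed 0 */
      printf("%lx\n", mix_hash(7, 12345));
      return 0;
  }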
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compiler comp-fun-form):
Rearrange the logic so that we only try the speculative
compilation when the three main conditions are right,
not before. This drastically reduces the number of times
we need to take the compiler snapshot.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This issue affects the original code which lifts lambdas to
load-time, as well as the new, recently added code for
similarly lifting functional combinator expressions.
The problem is that the trick works by compiling an expression
twice. The result of the first compile is thrown away in the
case when we compile it again in the load-time context.
But compiling has a side effect: the expression itself
may have an embedded load-time-liftable expression, which gets
deposited into the load-time fragment list. Thus garbage ends
up in the list of load-time fragments.
We likely want to save and restore other things, like
allocated D registers.
* share/txr/stdlib/compiler.tl (compiler snapshot, compiler
restore): New methods.
(comp-lambda-impl, comp-fun): Save a snapshot of the compiler state
before doing the speculative compilation. If we don't use that
compilation, we restore the state from the snapshot.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compile-in-toplevel): This
macro must not save and restore the nlev variable of the compiler.
The nlev variable is a maximum: it measures the maximum
environment depth, establishing the depth of the display
needed when executing the code of the resulting VM
description. By saving and restoring this variable around the
compilation of a load-time form, we cause the load-time form's
contribution to the maximum to be ignored. If the load-time
form perpetrates nesting that is deeper than other code, it
will execute with an under-sized display, causing an assertion
crash or corruption.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The int_flo function also has a sensitivity to locale because
the bignum handling is text-based, involving printing a
floating-point value, and then assuming it contains a period
decimal separator.
* arith.c (int_flo): If CONFIG_LOCALE_TOLERANCE is enabled,
look for dec_point rather than '.' in the formatted number.
When matching and destructuring the number with sscanf,
don't look for the '.' character, but rather a complemented
character set which can match nothing but the separator,
whatever that is.
|
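A small standalone C illustration of the complemented-scanset trick;
the buffer and field widths are hypothetical, not the actual int_flo
code:

  #include <stdio.h>

  int main(void)
  {
      const char *txt = "123,456";  /* separator need not be '.' */
      char ipart[32], sep[2], fpart[32];

      /* %1[^0123456789] matches exactly one non-digit character, so it
         accepts whatever separator the C library happened to print. */
      if (sscanf(txt, "%31[0123456789]%1[^0123456789]%31[0123456789]",
                 ipart, sep, fpart) == 3)
          printf("int=%s sep=%s frac=%s\n", ipart, sep, fpart);
      return 0;
  }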
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The wcstod function has locale-specific behavior. It uses a
locale-specific decimal separator character, which may not be
the period. Though in TXR we never call setlocale in order to
activate localization, it is a good idea to put in code to
defend against this. If locale is ever unleashed on the code,
it really botches our floating-point handling. However, let's
keep that defensive logic disabled for now using the
preprocessor.
The strategy is to inquire about the locale's decimal
character at startup. Then, in the flo_str function, we
preprocess the input by converting the decimal period to
the locale-specific character before calling wcstod. On the
output side, we also deal with it in the format function; we
call sprintf, and then convert the locale-specific character
to a period.
I tested all this by temporarily introducing the setlocale
call, and switching to a locale with a comma as the separator,
getting make tests to pass, and doing some interactive testing.
This is not going to receive coverage in the test suite.
* lib.c (dec_point): New global variable.
(flo_str): If dec_point isn't '.', we must copy the
string into a local buffer, which we get from alloca, and
edit any '.' that it contains to dec_point.
(locale_init): New function; initializes dec_point.
(init): Call locale_init.
* lib.h (dec_point): Declared.
* stream.c (formatv): Don't look for a '.' in the result of
printing a double using %e; look for dec_point. Post-process
floating-point sprintf output by substituting occurrences of
dec_point with the period.
|
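A minimal standalone C sketch of the same strategy, assuming only the
standard localeconv, strtod and sprintf functions (the actual TXR code
works with wcstod and wide strings):

  #include <locale.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
      char buf[64];
      char dec;
      char *p;
      double d;

      setlocale(LC_NUMERIC, "");             /* for demonstration only */
      dec = localeconv()->decimal_point[0];  /* e.g. '.' or ',' */

      /* input side: rewrite '.' to the locale separator before strtod */
      strcpy(buf, "3.14");
      if (dec != '.' && (p = strchr(buf, '.')) != 0)
          *p = dec;
      d = strtod(buf, 0);

      /* output side: rewrite the separator back to '.' after sprintf */
      sprintf(buf, "%g", d);
      if (dec != '.' && (p = strchr(buf, dec)) != 0)
          *p = '.';
      puts(buf);
      return 0;
  }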
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (late-peephole): Add a
6-instruction pattern which is seen in pattern-matching code
in optimize.tlo and compiler.tlo. This can be tidied up a
little bit, reducing the instructions to 5, eliminating
redundant comparisons and threading a jump. At this point we
are probably working too hard for too little gain. More
effort in other areas like common subexpression elimination
could produce bigger wins.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed a lot of this in compiled code:
  (if reg label1)
label2
  (jmp label3)
label1
By inverting the test, this can be shortened to:
  (ifq reg (t 0) label3)
label1
We must keep label1 in case it is the target of other
branches. Also, we only perform this if label2 is unreachable:
the (jmp label3) instruction is reached only when the previous
if branch is not taken.
* share/txr/stdlib/optimize.tl (basic-blocks get-insns): Call
new late-peephole method.
(basic-blocks late-peephole): New method, incorporating the
above pattern.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The uslot and umethod functions produce functions, and so
calls to them should be lifted to load-time, if possible.
For instance, the .foo expression in [mapcar .foo list]
translates to a (uslot 'foo) function call. This references no
variables, and so is liftable to load-time.
The umethod function is an extension of uslot that allows
partially applied method arguments to be carried. If those
arguments are all functional, the umethod call is liftable.
* share/txr/stdlib/compiler.tl (%functional-funs%): Include
umethod and uslot.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because of the previous optimization, some load-time forms
that appear in the library are unnecessary.
* share/txr/stdlib/optimize.tl (basic-blocks
merge-jump-thunks): Remove load-time around functional
combinators.
* share/txr/stdlib/socket.tl (sys:in6addr-condensed-text):
Remove one load-time that is now unnecessary, and one around
an op which was unnecessary even at the time it was written,
since the lambda is lifted without it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea behind this optimization is that certain expressions
that only calculate functions can be hoisted to load time.
These expressions meet these criteria:
1. Are not already in a top-level or load-time context.
2. Are function calls to a standard library
functional operator like chain, andf, orf, juxt, ...
3. Do not access any variables.
4. Do not access any functions other than public (usr package)
global functions in the standard library.
An example of such an expression might be:
[chain cdr [iff symbolp list]]
If such an expression is embedded in a function, we don't want
the function to re-calculate it every time, which requires
time and generates garbage. We can transform it to the
equivalent of:
(load-time [chain cdr [iff symbolp list]])
to have it calculated once.
* share/txr/stdlib/compiler.tl (%functional-funs%,
%functional%): New global variables.
(compiler comp-fun-form): After compiling the function call,
check for the conditions for lifting. If so, compile the form
again as a load-time literal. The logic is similar to how
lambdas are lifted to load-time, though the conditions are
different.
|
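The effect being sought, computing an invariant value once rather than
on every call, has a rough C analogue in once-only initialization; a
hypothetical sketch, not how the compiler implements the lift:

  #include <stdio.h>

  static int make_table_calls;

  /* stands in for the expression that only builds a function/value */
  static const int *make_table(void)
  {
      static int table[3];
      make_table_calls++;
      table[0] = 1; table[1] = 2; table[2] = 3;
      return table;
  }

  static int lookup(int i)
  {
      static const int *tab;  /* computed once, like a load-time form */
      if (tab == 0)
          tab = make_table();
      return tab[i];
  }

  int main(void)
  {
      printf("%d %d %d\n", lookup(0), lookup(1), lookup(2));
      printf("make_table called %d time(s)\n", make_table_calls);
      return 0;
  }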
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've noticed that some duplicate short blocks are generated,
which look like this. Often they are consecutive, which
is why they are noticeable. These can become one block:
mov t17 d3
jmp label17
mov t17 d3
jmp label17
mov t17 d3
jmp label17
We identify identical blocks by looking for short instruction
sequences that end in an unconditional jmp, and then we group
duplicates in a hash table keyed on the instruction sequence.
We must ignore the label: the first instruction in each block
is a unique label.
We have to be careful about which ones to delete. Any block
that is entered from the top must be preserved. When these
blocks are identified, at least one block must remain that
is removable for the optimization to be able to do anything.
If all blocks are removable, we pick a leader which is
preserved. Otherwise we pick a leader from the preserved
blocks. The non-preserved blocks are deleted, and all jumps
to them from other blocks are redirected to jumps to the
leader.
* share/txr/stdlib/compiler.tl (optimize): Call
merge-jump-thunks as the last pass.
* share/txr/stdlib/optimize.tl (basic-blocks
merge-jump-thunks): New method.
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-block print): Print
basic blocks such that related blocks are shown by their
label, rather than by printing the whole graph.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a specific instruction sequence that often occurs
when pattern matching code is compiled. It is not easy
to attack by the existing optimization framework. Once the
code is divided into basic blocks, it spans multiple blocks.
It's not a good fit for either jump threading or
intra-basic-block peephole reductions.
It is exemplified by the following disassembly snippet, where
the pattern is shown in the box:
5: 3C00000C ifq v00000 d1 12 --------+
6: 08000401 |
7: 20020010 gcall t16 0 v00001 d2 |
8: 08010000 |
9: 00000402 |
+---------------------------+ |
|10: 2C060403 movsr t6 d3 | |
|11: 3400000D jmp 13 | |
|12: 2C060000 movsr t6 nil | <--------+
|13: 3C000028 ifq t6 nil 40 |
+---------------------------+
14: 00060000
15: 3C000019 ifq v00001 d1 25
The d3 register holds t. Of course, it's always a different d
register in the various occurrences of this pattern.
The register t6 also varies, but it's the same register in
all three places in the pattern.
The purpose of this "kernel" of four instructions is:
- if it is entered from the top, set the t6 register to t
and jump to 40.
- if it is entered at instruction 12, set the t6 register to
nil, and fall through to the next block.
This can instead be expressed by three instructions:
+---------------------------+ |
|(mov t6 d3) | |
|(jmp 40) | |
|(mov t6 nil) | <--------+
+---------------------------+
This revised code is then more conducive to further
optimizations in ways that the original is not. In some
cases, like the Ackermann test case, both moves into t6 will
disappear entirely.
* share/txr/stdlib/optimize.tl (basic-blocks :postinit): Pass
instructions through early-peephole.
(early-peephole): New function, where we can look for and
rewrite instruction patterns over the entire block of code,
before any division into basic blocks. Here, we look for the
four-instruction kernel and rewrite it.
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compiler comp-if): An invalid
or expression means we are not looking at the if-over-equal
strength reduction pattern, and would be miscompiling an
unrelated expression (luckily, one that is unlikely to occur
in any reasonable application). Unfortunately, this correction
spoils the way the pattern matching Ackermann test case is
being compiled.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a function ends like this:
jmp label
...
label: end t42
The jmp can just be replaced by "end t42".
* share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
Implement the pattern to replace a jump to a one or two
instruction sequence ending in a jend with that sequence.
Including the extra instruction helps if the target has not
been peepholed yet. In the pattern matching Ackermann case,
two instructions are promoted, one of which is eliminated.
|
|
|
|
|
|
|
|
|
|
| |
During peephole optimization, new cases can arise where a dreg
is tested.
* share/txr/stdlib/optimize.tl (basic-blocks peephole-block):
Test for (if dreg ...) and (ifq dreg nil), eliminating the
instruction or rewriting to jmp. It could be worthwhile to do
another jump threading and dead code elimination pass.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Though the assembler accepts nil as a synonym of (t 0), we have
to be consistent.
* share/txr/stdlib/compiler.tl (compiler comp-or): Don't use
nil operand in ifq instruction: use (t 0).
* share/txr/stdlib/optimize.tl (basic-blocks
thread-jumps-block): Don't match ifq pattern with nil operand;
match (t 0).
|
|
|
|
|
|
| |
* share/txr/stdlib/optimize.tl (basic-blocks
thread-jumps-block): The two accidental occurrences of @reg
will never match each other: rename one to @dn.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* share/txr/stdlib/compiler.tl (compiler optimize): Perform
the control flow optimizations of jump threading and dead code
elimination first. Then, do the data-flow-assisted
optimizations on the re-calculated control flow graph.
With this, two more useless instructions are shaved off the
pattern matching Ackermann:
(defun ack (:match)
((0 @n) (+ n 1))
((@m 0) (ack (- m 1) 1))
((@m @n) (ack (- m 1) (ack m (- n 1)))))
It now compiles down to 16 instructions instead of 18;
and the speedup of (ack 3 7) versus interpreted goes
from 36.0 to 37.3 times.
The reason that the new reduction in the code is possible
is that previously, the code being analyzed was in separate
basic blocks, which are now merged together in the dead code
elimination pass.
* share/txr/stdlib/compiler.tl (compiler optimize): Move
calc-liveness and peephole after elim-dead-code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After eliminating dead code and useless forward jumps, there
can be blocks which unconditionally proceed to subsequent
blocks, which have no other predecessor. These blocks can be
merged together. This currently does nothing to alter the
generated code. The advantage will be obvious in a subsequent
commit.
* share/txr/stdlib/optimize.tl (struct basic-block): New slot,
rlinks: reverse links.
(basic-blocks join-block): New method.
(basic-blocks link-graph): Populate rlinks slots.
(basic-blocks join-blocks): New method.
(basic-blocks elim-dead-code): Reset rlinks also before
re-calculating the graph connectivity with link-graph.
Call join-blocks to merge together consecutive blocks.
|