When functions are optimized away due to constant folding,
instead of replacing them with a nil, we now compact the
table to close the gaps and renumber the references in the
code.
* stdlib/compiler.tl (compiler null-stab): Method removed.
(compiler compact-dregs): Renamed to compact-dregs-and-syms.
Now compacts the symbol table also. This is combined with
D-reg compacting because it makes just two passes through
the instructions: a pass to identify the used D registers
and symbol indices, and then another pass to edit the
instructions with the renamed D registers and renumbered
symbol indices.
(compiler optimize): Remove the call to null-unused-data
on the basic-blocks object; nulling out D regs and symbol
table entries is no longer required. Follow the rename of
compact-dregs to compact-dregs-and-syms which is called
the same way otherwise.
* stdlib/optimize.tl (basic-blocks null-unused-data):
Method removed; no longer used.
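To illustrate the two-pass scheme, here is a minimal C sketch; it is not the actual TXR Lisp code in compiler.tl, and the instruction representation is hypothetical. Pass one marks the referenced D registers and assigns dense new indices; pass two rewrites the operands through that map.

  /* Illustrative sketch only, not TXR's code: two-pass compaction
     of a D-register table, over a made-up instruction format in
     which each instruction carries an array of D-reg operands. */
  #include <stddef.h>

  struct insn {
      size_t ndregs;
      size_t dreg[4];      /* indices into the D-register table */
  };

  size_t compact_dregs(struct insn *code, size_t ninsns,
                       size_t ndregs_old, size_t *map /* ndregs_old */)
  {
      size_t i, j, next = 0;

      for (i = 0; i < ndregs_old; i++)
          map[i] = (size_t) -1;            /* -1 means unused */

      for (i = 0; i < ninsns; i++)         /* pass 1: mark used regs */
          for (j = 0; j < code[i].ndregs; j++)
              map[code[i].dreg[j]] = 0;

      for (i = 0; i < ndregs_old; i++)     /* assign dense new indices */
          if (map[i] != (size_t) -1)
              map[i] = next++;

      for (i = 0; i < ninsns; i++)         /* pass 2: rewrite operands */
          for (j = 0; j < code[i].ndregs; j++)
              code[i].dreg[j] = map[code[i].dreg[j]];

      return next;                         /* new D-register count */
  }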
We now have some constant folding in the optimizer too, not
just in the front end compiler pass. This is leaving behind
dead D registers that are not referenced in the code.
Let's compact the D register table to close the gap.
* stdlib/compiler.tl (compiler get-dreg): In this function
we no longer check that we have allocated too many D
registers. We let the counter blow past %lev-size%,
because this creates a fighting chance that the compaction
of D regs will reduce their number to %lev-size% or less.
By doing this, we allow code to be compilable that otherwise
would not be: code that allocates too many D regs which
are then optimized away.
(compiler compact-dregs): New function. Does all the work.
(compiler optimize): Compact the D regs at optimization
level 5 or higher.
(compile-toplevel): Check for an overflowing D reg count
here, after optimization.
* stdlib/optimize.tl (basic-blocks null-unused-data):
Here, we no longer have to do anything with the D registers.
* stdlib/compiler.tl (compiler get-dreg): Fix
indentation problem.
* stdlib/optimize.tl (basic-block fill-treg-compacting-map):
Likewise.
* hash.c (hash_iter_peek): The loop here must be a top-test
while loop, not a bottom-test do loop. In the chained hashing
implementation, this was a do loop, but it also had a test
with a break for the index.
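A toy sketch of why the test must come first (illustrative C, not the actual hash.c code): when the iterator's index already equals the table size, a bottom-test do loop would examine a slot before checking the bound, reading past the end of the table.

  /* Minimal sketch: peeking at the next nonempty slot of an
     open-addressed table. The top-test while checks the bound
     before every access; a do-while would touch table[index]
     even when index == size. */
  #include <stddef.h>

  struct toy_iter {
      void **table;     /* slots; NULL means empty */
      size_t size;
      size_t index;     /* next slot to examine */
  };

  void *toy_iter_peek(struct toy_iter *it)
  {
      size_t i = it->index;

      while (i < it->size) {        /* test before each access */
          if (it->table[i] != NULL)
              return it->table[i];
          i++;
      }
      return NULL;                  /* exhausted */
  }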
* stdlib/path-test.tl (path-volume): Don't return
:abs for a path whose empty first component
isn't followed by any more items. Otherwise
we return :abs for a path formed by splitting
the empty string, and then calls like (rel-path "" "a")
complain about a mixture of absolute and relative.
With this change, empty paths given to rel-path
behave as if they were ".".
* tests/018/rel-path.tl: New test cases.
This commit does two things. The replace function, implemented
under the hood by four specializations: replace-list, replace-vec,
replace-str and replace-buf, will handle the index-list case
a little differently. This is needed to fix the ability of the
del macro to work on a place designated by an index list, such as:
(del [sequence '(1 3 5 6)])
which now deletes elements 1, 3, 5 and 6 from the sequence,
and returns a sequence of those items. The underlying
implementation uses replace with an index-list, which is now
capable of deleting items. Previously, replace would stop
processing the index list when the replacement-sequence
corresponding to the index list ran out of items. Now,
when the replacement-sequence runs out of items, the
remaining index-list sequence elements specify items to
be deleted. For instance if str holds "abcdefg" then:
(set [str '(1 3 5)] "xy")
will change str to "axcyeg". Elements 1 and 3 are replaced
by x and y, respectively. Element 5, the letter f, is
deleted, because the replacement "xy" has no element
corresponding to 5.
* lib.c (replace_list, replace_str, replace_vec): Implement
new deletion semantics for the case when the replacement
sequence runs out of items.
* buf.c (replace_buf): Likewise.
* tests/010/seq.txr: Some new test cases here for
deletion.
* tests/010/seq.expected: Updated.
* txr.1: Documented new semantics of replace, including
a new restriction that if elements are being deleted,
the indices should be monotonically increasing regardless
of the type of the sequence (not only list).
A value of 289 for the -C option is documented; it restores
the previous behavior of replace (breaking deletion by
index-list, unfortunately: you don't always get to
simulate an old version of TXR while using new features).
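A sketch of the new semantics on a plain C string (hypothetical helper; the real implementation is the replace_* specializations operating on TXR sequences, and the indices must be monotonically increasing):

  #include <stdio.h>
  #include <string.h>

  void replace_by_indices(char *str, const size_t *idx, size_t nidx,
                          const char *repl)
  {
      size_t nrepl = strlen(repl);
      size_t len = strlen(str);
      size_t i, out = 0, k = 0;       /* k walks the index list */

      for (i = 0; i < len; i++) {
          if (k < nidx && i == idx[k]) {
              if (k < nrepl)
                  str[out++] = repl[k];  /* replacement item available */
              /* else: replacement ran out; element is deleted */
              k++;
          } else {
              str[out++] = str[i];
          }
      }
      str[out] = 0;
  }

  int main(void)
  {
      char str[] = "abcdefg";
      size_t idx[] = { 1, 3, 5 };
      replace_by_indices(str, idx, 3, "xy");
      puts(str);                      /* prints "axcyeg" */
      return 0;
  }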
* txr.1: Fix misspellings of "arithmetic" and "algorithm"
introduced by recent updates: the new math functions,
and the documentation of CRC-32 parameters.
* stdlib/optimize.tl (basic-blocks do-peephole-block): Extend
constant-folding case to recognize gapply as well as gcall.
We just have to take care in how we pass the apply arguments
to the actual function to get the value.
Let's consider the dwim expression [a . b c]. Without
this change we get:
syms:
0: a
1: b
2: c
code:
0: 98020000 getf t2 0
1: 98030001 getf t3 1
2: 98040002 getf t4 2
3: 1C020002 apply t2 t2 t3 t4
4: 00030002
5: 00000004
6: 10000002 end t2
With this change:
syms:
0: a
1: b
2: c
code:
0: 98030001 getf t3 1
1: 98040002 getf t4 2
2: 24020002 gapply t2 0 t3 t4
3: 00030000
4: 00000004
5: 10000002 end t2
There are 17 hits for this optimization in optimize.tl
alone!
* stdlib/optimize.tl (basic-blocks do-peephole-block):
New pattern here. We recognize an instruction sequence
which begins with a (getf treg idx) and ends in
an (apply dest treg ...), without any instructions in
between accessing or redefining treg. Additionally,
the treg value must not be used after the apply,
unless the apply redefines it. In that case, we
rewrite this pattern to eliminate that initial getf
instruction, and substitute (gapply dest idx ...)
for the apply.
* eval.c (op_defsymacro, rt_defsymacro, makunbound,
fmakunbound): Don't call vm_invalidate_binding if there
is no binding for the symbol.
* lib.c (obj_print_impl): Check for this case and handle it.
The #nil syntax is not readable.
* eval.c (op_defsymacro, rt_defsymacro): We must call
vm_invalidate_binding so the VM forgets a cached binding
for this variable.
* tests/019/redef.tl: Test added.
There was a bug in rt_defun in that it was not calling
vm_invalidate_binding. This meant that compiled functions
were not picking up redefinitions. This bug is fixed now
because rt_defun now calls sethash on the top_fb directly,
which modifies the existing binding cell; it is not
allocating a new cell.
We put in new test cases to confirm the proper redefinition
behaviors.
The proper redefinition behavior exposes an issue in
pattern matching.
* tests/019/redef.tl: New file.
* stdlib/match.tl (transform-quote): This function's compiled
image, when deposited into a .tlo file, becomes incorrect
because (sys:hash-lit) turns into #H() syntax, which reads
back as something else. In other words, (sys:hash-lit)
doesn't have print-read consistency and so doesn't
externalize. To fix this right we would need a print mode
which ensures machine readability rather than human
readability, like in Common Lisp. For now, we just break up
the pattern so that it's not a literal match. This bug was
hidden due to the redefinition issue. When match.tl is
being compiled, it defines non-triv-pat-p twice. Due to
redefinitions not kicking in properly, the first definition
of non-triv-pat-p remains in effect for some functions.
When transform-qquote is being expanded, the (sys:hash-lit)
pattern is treated as non-trivial, even though it
is trivial, and so it is turned into pattern matching code.
The code doesn't contain a (sys:hash-lit) literal and so
the issue doesn't occur.
As was done with the function and variable top-level
environments, we simplify the macro ones.
* eval.c (func_get_name, lookup_mac, lookup_symac,
lookup_symac_lisp1, op_defsymacro, rt_defsymacro,
rt_defmacro, op_defmacro, reg_mac, reg_symacro):
Adjust to simpler representation where the hash cell
itself is the binding cell, rather than holding a
binding cell in its cdr.
Since their inception, the top_vb and top_fb hashes associated
symbols with binding cells, causing an extra cons cell to be
allocated for each entry. I don't remember why this is. It
might have been that way so that gethash(top_fb, sym) would
return a cell when the variable exists, or else nil. This was
before we had functions like gethash_e and inhash that return
the hash cell itself. A hash cell is also a cons and can serve
as a binding just fine by itself. Let's make it so.
For now, the macro and symbol macro environments stay the
way they are; I will likely convert them also.
* eval.c (env_fbind, env_vbind, lookup_global_var, lookup_sym_lisp1,
lookup_fun, func_get_name, rt_defv, rt_defun, set_symbol_value,
reg_fun, reg_varl): Update all these functions so they treat the
hash cell from top_vb or top_fb as the binding cell, rather than
putting or expecting the cdr of that cell (i.e. the hash value)
to be a binding cell.
* hash.[ch] (gethash_d): New function. Just gethash_e without the
pesky self argument, which would only be needed for error
reporting if we pass an object that isn't a hash.
* stdlib/place.tl (sys:get-fun-getter-setter, sys:get-vb):
These two functions must be adjusted since they refer to the
top-fb and top-vb variables. sys:get-vb isn't used anywhere;
it continues to exist in order to provide run-time support
to files that were compiled with a buggy version of the
symbol-value place.
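The representation change, sketched with a toy cell type (illustrative only; TXR's actual cons and hash structures differ):

  /* A hash entry is itself a cons-like cell (key . value), so it
     can serve directly as a binding: no extra cons per entry. */
  struct cell { void *car, *cdr; };   /* car: symbol, cdr: value */

  /* Before: gethash(top_fb, sym) yielded the hash *value*, a
     separate binding cons. After: a gethash_e/gethash_d-style
     lookup yields the hash cell itself, which already pairs the
     symbol with its value. */
  static void *binding_lookup(struct cell **tab, unsigned n, void *sym)
  {
      unsigned i;
      for (i = 0; i < n; i++)
          if (tab[i] && tab[i]->car == sym)
              return tab[i];          /* the hash cell is the binding */
      return 0;
  }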
* eval.c (rt_defv): Report as sys:rt-defv, not sys:defv.
* stdlib/optimize.tl (basic-blocks do-peephole-block): The
constant folding case should fire even if some of the
arguments of the call aren't D registers but T0.
* stdlib/compiler.tl (%effect-free-funs%, %effect-free%,
%functional-funs%, %functional%): Move variables
into stdlib/constfun.tl
* stdlib/constfun.tl (%effect-free-funs%, %effect-free%,
%functional-funs%, %functional%): Moved here.
* stdlib/optimize.tl: Use load-for to express dependency
on constfun module; don't depend on the compiler having
loaded it.
The compiler handles trivial constant folding over the
source code, as a source to source transformation.
However, there are more opportunities for constant folding
after data flow optimizations of the VM code.
Early constant folding will not fold, for instance,
(let ((a 2) (b 3)) (* a b))
but we can reduce this to an end instruction that returns
the value of a D register that holds 6. Data flow optimizations
will propagate the D registers for 2 and 3 into the gcall
instruction. We can then recognize that we have a gcall with
nothing but D register operands, calling a constant-foldable
function. We can allocate a new D register to hold the result
of that calculation and just move that D register's value
into the target register of the original gcall.
* stdlib/compiler.tl (compiler get-dreg): When allocating
a new D reg, we must invalidate the datavec slot which is
calculated from the data hash. This didn't matter before,
because until now, get-datavec was called after compilation,
at which point no new D regs will exist. That is changing;
the optimizer can allocate D regs.
(compiler null-dregs, compiler null-stab): New methods.
(compiler optimize): Pass self to constructor for basic-blocks.
basic-blocks now references back to the compiler.
At optimization level 5 or higher, constant folding can
now happen, so we call the new method in the optimizer to
null the unused data. This overwrites unused D registers
and unused parts of the symbol vector with nil.
* stdlib/optimize.tl (basic-blocks): Boa constructor now takes
a new leftmost param, the compiler.
(basic-blocks do-peephole-block): New optimization case:
gcall instruction invoking const-foldable function, with
all arguments being dregs.
(basic-blocks null-unused-data): New method.
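A sketch of the folding step on a hypothetical three-address VM (the real optimizer is TXR Lisp working on basic blocks; all names here are made up): if a call names a const-foldable function and its operands are all D registers, compute the result now, park it in a fresh D register, and rewrite the call as a move.

  enum opcode { OP_GCALL, OP_MOVE };

  struct vinsn {
      enum opcode op;
      int dest;                /* target register */
      int fn;                  /* function table index (OP_GCALL) */
      int arg[2];              /* assumed to be D-register operands */
      int src;                 /* source register (OP_MOVE) */
  };

  typedef long (*pure_fn)(long, long);

  /* dregs: the constant table; *ndregs may grow by one. */
  void fold_gcall(struct vinsn *in, long *dregs, int *ndregs,
                  const pure_fn *fntab, const int *foldable)
  {
      if (in->op == OP_GCALL && foldable[in->fn]) {
          long result = fntab[in->fn](dregs[in->arg[0]],
                                      dregs[in->arg[1]]);
          int d = (*ndregs)++;      /* allocate a new D register */
          dregs[d] = result;
          in->op = OP_MOVE;         /* gcall becomes a plain move */
          in->src = d;              /* dest unchanged: the gcall's target */
      }
  }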
* configure: Detect all the new functions, with separate
tests for the unary and binary ones.
* arith.c (cbrt_s, erf_s, erfc_s, exp10_s, exp2_s,
expm1_s, gamma_s, j0_s, j1_s, lgamma_s, log1p_s, logb_s,
nearbyint_s, rint_s, significand_s, tgamma_s, y0_s, y1_s,
copysign_s, drem_s, fdim_s, fmax_s, fmin_s, hypot_s,
jn_s, ldexp_s, nextafter_s, remainder_s, scalb_s, scalbln_s,
yn_s, r_copysign_s, r_drem_s, r_fdim_s, r_fmax_s, r_fmin_s,
r_hypot_s, r_jn_s, r_ldexp_s, r_nextafter_s, r_remainder_s,
r_scalb_s, r_scalbln_s, r_yn_s): New symbol variables.
(not_available): New static function.
(cbrt_wrap, erf_wrap, erfc_wrap, exp10_wrap, exp2_wrap,
expm1_wrap, gamma_wrap, j0_wrap, j1_wrap, lgamma_wrap,
log1p_wrap, logb_wrap, nearbyint_wrap, rint_wrap,
significand_wrap, tgamma_wrap, y0_wrap, y1_wrap,
copysign_wrap, drem_wrap, fdim_wrap, fmax_wrap,
fmin_wrap, hypot_wrap, jn_wrap, ldexp_wrap,
nextafter_wrap, remainder_wrap, scalb_wrap, scalbln_wrap,
yn_wrap): New static functions.
(arith_set_entries, arith_instantiate): New static functions.
(arith_init): Initialize symbols and instantiate functions
via autoload mechanism. In a program that doesn't use the
functions, we suffer only the overhead of interning the symbols.
* lib.h (UNUSED): New macro for GCC unused attribute.
* txr.1: Documented.
* stdlib/doc-syms.tl: Updated.
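The configure-driven pattern, sketched with a hypothetical HAVE_ERF macro and a stand-in error function (the real wrappers operate on Lisp values and throw Lisp exceptions rather than aborting):

  #include <math.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* stand-in for the error path; the real not_available would
     throw a Lisp exception */
  static void not_available(const char *name)
  {
      fprintf(stderr, "%s: not available on this platform\n", name);
      abort();
  }

  #if HAVE_ERF
  static double erf_wrap(double x)
  {
      return erf(x);               /* the C library provides it */
  }
  #else
  static double erf_wrap(double x)
  {
      (void) x;
      not_available("erf");        /* never returns */
      return 0.0;
  }
  #endif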
* eval.c (eval_init): We have a repeat_s variable; no need
to call intern(lit("repeat"), user_package).
* chksums/md5.c (encode, decode): Fix these functions so they
actually swap the byte order of 4 byte words on big endian.
Tested on PPC64.
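For reference, byte-order-independent encode/decode of MD5's 32-bit words looks like this (illustrative, not the actual chksums/md5.c code): MD5 defines its words as little endian, so serialization must be explicit rather than a memcpy of host-order words.

  #include <stdint.h>
  #include <stddef.h>

  static void encode(uint8_t *out, const uint32_t *in, size_t nwords)
  {
      size_t i;
      for (i = 0; i < nwords; i++) {
          out[4*i + 0] = (uint8_t)(in[i] >>  0);  /* LSB first */
          out[4*i + 1] = (uint8_t)(in[i] >>  8);
          out[4*i + 2] = (uint8_t)(in[i] >> 16);
          out[4*i + 3] = (uint8_t)(in[i] >> 24);
      }
  }

  static void decode(uint32_t *out, const uint8_t *in, size_t nwords)
  {
      size_t i;
      for (i = 0; i < nwords; i++)
          out[i] = (uint32_t)in[4*i + 0]
                 | (uint32_t)in[4*i + 1] << 8
                 | (uint32_t)in[4*i + 2] << 16
                 | (uint32_t)in[4*i + 3] << 24;
  }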
* eval.c (me_load_for): An object which is not one of the
valid clause symbols is not necessarily a symbol; don't
call it one in the diagnostic.
* stdlib/struct.tl (sys:check-slot): Similarly, an object that
isn't the name of a struct slot isn't necessarily a
symbol; don't call it one.
* lib.c (unique): Use seq_iter_t rather than dividing into
list and vector cases.
* hash.c (group_reduce): Use seq_iter_t instead of
obsolete vector and list iteration.
* txr.1: Use 1..11 range in one group-reduce
example instead of (range 1 10).
* hash.c (group_by): Replace obsolete list/vector switching
with seq_iter_t iteration. The group-by function is limited
otherwise.
* txr.1: Update group-by documentation not to assert that
the sequence argument is required to be a list or vector.
Examples for group-by and group-map now use a 0..11 range
instead of (range 0 10).
* genchksum.txr, chksum.c: The declarations of the symbol
variables, COBJ class handles, and most of the chksum_init
function are also generated now. A comment is added explaining
what parts of the file are generated, and what parts are
scraped, and giving some advice to the maintainer.
A giant macro that generates numerous functions and variables
is hell for anyone who has to debug the code. The right
approach is external code generation, where the result is
a clean source file that is checked into the repository.
* genchksum.txr: New file.
* chksum.c: Now generated.
(chksum_impl): Macro and its invocations removed.
(sha1_stream_impl, sha1_stream, sha1_szmax_upd,
sha1_buf, sha1_str, sha1, sha1_ops,
sha1_begin,
sha1_utf8_byte_callback, sha1_hash,
sha1_end): These are directly defined again.
(sha256_stream_impl, sha256_stream, sha256_szmax_upd,
sha256_buf, sha256_str, sha256, sha256_ops, sha256_begin,
sha256_utf8_byte_callback, sha256_hash, sha256_end): Likewise.
(md5_stream_impl, md5_stream, md5_szmax_upd,
md5_buf, md5_str, md5, md5_ops, md5_begin,
md5_utf8_byte_callback, md5_hash, md5_end): Likewise.
* txr.1: SHA-1 functions documented.
* stdlib/doc-syms.tl: Updated.
* txr.1: SHA256 should be SHA-256 in several places.
* txr.1: Specify which crc32 is implemented: which polynomial,
bit direction, and complementation of the output.
* tests/013/chksum.tl: New file.
* chksums/sha1.c, chksums/sha1.h: New files.
* LICENSE, METALICENSE: Mention new code.
* chksum.c (sha1_ctx_s, sha1_ctx_cls): New static variables.
(chksum_init): Register the sha1-ctx symbol, and the sha1_ctx_cls
COBJ class. Register sha1-stream, sha1, sha1-begin, sha1-hash
and sha1-end intrinsics.
(sha1_stream_impl, sha1_stream, sha1_szmax_upd,
sha1_buf, sha1_str, sha1, sha1_ops, sha1_begin,
sha1_utf8_byte_callback, sha1_hash, sha1_end): These
functions and variables are generated by a call to the
chksum_impl macro.
* Makefile (OBJS): add chksums/sha1.o object file.
* chksum.c (chksum_impl): New macro. Two calls to
this macro generate the implementations of SHA256 and MD5.
The code is identical.
(sha256_stream_impl, sha256_stream, sha256_szmax_upd,
sha256_buf, sha256_str, sha256, sha256_ops, sha256_begin,
sha256_utf8_byte_callback, sha256_hash, sha256_end):
Directly written definitions removed, now generated by macro.
(md5_stream_impl, md5_stream, md5_szmax_upd,
md5_buf, md5_str, md5, md5_ops, md5_begin,
md5_utf8_byte_callback, md5_hash, md5_end): Likewise.
Some OS distros have switched to a library called libxcrypt,
which, instead of returning null on failure, like the Glibc
crypt, returns a valid "failure token" string starting with
a * character, which is guaranteed to be different from the
salt argument.
We should check for this so that we handle errors uniformly.
Users are reporting failing crypt tests that are coming
up with "*0" instead of throwing an exception.
* sysif.c (crypt_wrap): Only accept a non-null result if
it isn't one of the two strings "*0" and "*1".
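The check described above amounts to this shape (illustrative sketch; the real crypt_wrap deals in Lisp strings and raises a Lisp exception):

  #include <string.h>
  #include <crypt.h>   /* glibc/libxcrypt; some systems declare crypt() in unistd.h */

  const char *crypt_checked(const char *key, const char *salt)
  {
      const char *hash = crypt(key, salt);

      /* null: classic glibc failure; "*0"/"*1": libxcrypt failure tokens */
      if (hash == 0 || strcmp(hash, "*0") == 0 || strcmp(hash, "*1") == 0)
          return 0;    /* caller reports the error uniformly */
      return hash;
  }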
* RELNOTES: Updated.
* configure (txr_ver): Bumped version.
* stdlib/ver.tl (lib-version): Bumped.
* txr.1: Bumped version and date.
* txr.vim, tl.vim: Regenerated.
* protsym.c: Regenerated.
* lib.c (seq_kind_tab): Map the RNG type to SEQ_VECLIKE
rather than SEQ_NOTSEQ. Before this one-liner change,
ranges were needlessly rejected for iteration.
For instance, we can now do (take 42 "00".."99").
Note that ranges are not literally vector-like; they
don't support the ref operation. The effect of this change
is simply that they are not rejected due to being
SEQ_NOTSEQ. The iteration code already handles RNG
specially, so the SEQ_VECLIKE classification doesn't
actually matter.
* lib.c (dwim_set): Handle seq argument being an integer
or range.
* tests/012/callable.tl: A few tests.
* txr.1: Documented.
* lib.c (do_generic_funcall): Allow integers and ranges
to be function callable. They take one argument and
index into it or extract a slice. In the case of ranges,
this is a breaking change. Ranges can already be used
in the function position in some limited ways that are
not worth preserving.
* tests/012/callable.tl: New file.
* tests/012/iter.tl: Here we fix two instances of
breakage. Using txr -C 288 will restore the
behaviors previously tested here.
* txr.1: Documented.
These functions are useful when sorting a sequence
using an expensive keyfun.
* autoload.c (csort_set_entries, csort_instantiate):
New static functions.
(autoload_init): Register autoloading of csort module
via new functions.
* stdlib/csort.tl: New file.
* tests/012/sort.tl: csort functions included in tests.
* txr.1: Documented.
* stdlib/doc-syms.tl: Updated.
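The cached-key idea is classic decorate-sort-undecorate: compute the expensive key once per element, sort the (key, element) pairs, then strip the keys. A C sketch (illustrative; the real csort functions live in stdlib/csort.tl and work on Lisp sequences):

  #include <stdlib.h>

  struct pair { double key; void *elem; };

  static int cmp_pair(const void *a, const void *b)
  {
      double ka = ((const struct pair *) a)->key;
      double kb = ((const struct pair *) b)->key;
      return (ka > kb) - (ka < kb);
  }

  /* keyfun is called exactly once per element, instead of once per
     comparison as with a keyfun-recomputing comparator. */
  void csort(void **elems, size_t n, double (*keyfun)(void *))
  {
      struct pair *tmp = malloc(n * sizeof *tmp);
      size_t i;

      if (!tmp) return;                      /* allocation failed */
      for (i = 0; i < n; i++) {
          tmp[i].key = keyfun(elems[i]);     /* decorate */
          tmp[i].elem = elems[i];
      }
      qsort(tmp, n, sizeof *tmp, cmp_pair);  /* sort by cached key */
      for (i = 0; i < n; i++)
          elems[i] = tmp[i].elem;            /* undecorate */
      free(tmp);
  }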
hash-map converts a function mapping over a sequence
into a hash table.
* hash.[ch] (hash_map): New function.
* tests/010/hash.tl: Test case.
* genman.txr: The hash-map identifier introduces
a hash collision. We have to deal with that somehow now.
(colli): We put the conflicting entries into a new hash called
colli which maps them to an increment value.
(hash-title): Increment the hash code h by the amount
indicated in colli, if the title is found there.
* txr.1: Documented.
* stdlib/doc-syms.tl: Updated.
* lib.c (equal): Several cases which react to the
type of the left argument have a default path which
wrongly short-circuits to an early return.
All these cases must break through to the logic
at the end of the function which tests the right side
for a possible equality substitution.
* tests/012/struct.tl: One breaking test case added.
equal was found to return nil for two structures
that have equal lists as their equality substitute.
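The control-flow shape of the fix, with toy types (not lib.c's actual equal): left-keyed cases must not default to an early return of nil; they must break to the common tail that tries the right operand's equality substitute.

  typedef struct obj { int type; struct obj *equal_sub; } obj;

  int equal_sketch(obj *left, obj *right)
  {
      if (left == right)
          return 1;

      switch (left->type) {
      case 1:
          /* ... type-specific tests; return 1 on a definite match ... */
          break;        /* the bug was: return 0; short-circuiting here */
      default:
          break;
      }

      /* common tail: give the right side a chance to substitute */
      if (right->equal_sub != 0)
          return equal_sketch(left, right->equal_sub);
      return 0;
  }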
* gc.[ch] (gc_prot_array_alloc): Return the COBJ via
new pointer argument.
* lib.c (ssort_vec): Capture the object from gc_prot_array_alloc
into a local variable. That makes it visible to the garbage
collector, so it won't be prematurely reclaimed. Since we don't
use or return that object, we need to use gc_hint.
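The liveness issue, sketched in toy code (the signatures of gc_prot_array_alloc and gc_hint are assumed here, not quoted): without the local, the collector could reclaim the backing object mid-sort; without the hint, the compiler may end the variable's live range early because nothing reads it afterwards.

  #include <stddef.h>

  typedef void *val;

  extern void *gc_prot_array_alloc(size_t n, val *out_obj);
  extern void gc_hint(val v);   /* assumed: forces v to stay live here */

  void ssort_sketch(size_t n)
  {
      val guard;                /* keeps the COBJ reachable */
      void **arr = gc_prot_array_alloc(n, &guard);

      /* ... sort using arr ... */

      gc_hint(guard);           /* guard considered live until this point */
  }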
With this change we fix the bug that the debugger commands
yield their Lisp forms rather than evaluating them.
* eval.c (eval_intrinsic): Takes one more argument,
the macro environment. This is passed into env_to_menv
as the root macro environment.
(eval_init): Update registration of eval intrinsic
to have two optional arguments.
* eval.h (eval_intrinsic): Declaration updated.
* parser.c (read_file_common, read_eval_ret_last): Pass
nil argument to new parameter of eval_intrinsic.
(repl): Pass the env parameter as the new menv
parameter of eval_intrinsic, rather than the existing
env parameter. This fixes the command dispatch in
the debugger, since the command table consists of
symbol macros, and not variables. For instance the
backtrace command bt is a binding of the bt symbol
to the form (sys:print-backtrace), which has to be
substituted for it and executed. When that environment
is used as the ordinary environment, bt looks like
a variable whose value is the list (sys:print-backtrace).
* parser.y (elem, check_parse_time_action): Fix
eval_intrinsic calls.
* txr.c (txr_main): Likewise.
* txr.1: Documented.
* y.tab.c.shipped: Updated.
* stdlib/doc-syms.tl: Updated.
* hash.c (hash_remove): In the algorithm that moves cells
forward in the cluster in order to avoid the use of
tombstones, we use slightly better variables. The cell
we are looking to replace is called wipe rather than
start. The cryptic variable name k is renamed to iprobe,
because it is the initial probe location for the cell
at position i.
The original chained hashing scheme makes certain
guarantees in situations where a hash table that is being
iterated is also undergoing insertions or deletions.
The original scheme meets these requirements simply,
by putting a freeze against hash table growth while
there are outstanding iterations. Chained hashing
gracefully handles load factors above 1.
Load factors above 1 are not possible under open
addressing (and we don't even want to approach 1)
but we would like to preserve the requirements.
The requirements are:
1. If an iterator has already visited an item, it
will not see that item again, regardless of insertions
which cause the table to be reorganized.
2. It is not specified/required that an iterator will
visit newly inserted items. It may visit some of those
items, but not others.
3. If an iterator has not yet visited an item, and
that item is deleted, it will not see that item,
regardless of any insertions that reorganize the table.
In this commit, we implement a "table stack" scheme.
1. When a table is resized due to insertions, and
it is being iterated (h->usecount > 0), in that situation
it will push the existing small table onto a stack,
the h->tblstack (table stack).
2. Iterators take a hash table's current table and its
size, and work with that snapshot of the table.
If the original hash table grows, existing iterators
work with the original table as it existed just before
the reorganization. So after that they do not see any
new insertions.
3. Whenever the delete operation (hash_remove) finds
the item and removes it from the current table,
it also walks the table stack, searches for the item
in every table in the stack and nulls it out.
This search is oblivious to nil; it's a blind search
that goes around the table starting at the first
probe position, looking for the identical cons cell
to take out. This algorithm ensures that iterators
will not see a deleted item, unless they already visited
it before the deletion, of course.
* hash.h (struct hash_iter): New members table, and mask.
* hash.c (struct hash): New member, tblstack.
(hash_grow): We drop the vec argument and recreate it
locally (not essential to this commit).
If we find that the usecount is positive, we push the
existing table onto the table stack. Otherwise,
we take the opportunity to obliterate the table stack.
(hash_insert): Drop the restriction that hash_grow is
only called when the use count is zero. Adjust calls
to hash_grow to drop the vec argument.
(hash_remove): When an item is found and deleted, and
the table is in use by iterators, walk the table stack
and delete it from each previous table. Otherwise,
if the table is not in use by iterators, obliterate
the table stack.
(hash_mark): Exit early also if there is a table stack,
and mark that stack.
(do_make_hash, make_similar_hash, copy_hash): Initialize
table stack in new hash.
(hash_iter_mark): Mark the iterator's table. This is
likely not necessary since we also mark the hash table,
which should have a pointer to that same table.
Omitting the mark wouldn't be defensive programming, though.
(hash_iter_init, us_hash_iter_init): Initialize table and mask.
(hash_iter_next_impl, hash_iter_peek): These functions
have to walk the table snapshot taken by the iterator,
using the captured mask, and not the current table.
(hash_reset): If the target table's use count drops to zero,
obliterate its table stack. We add a missing setcheck here;
this operation potentially stores a different hash into
an existing iterator, which was not being done safely with
regard to generational GC.
* tests/010/hash.tl: New tests.
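A condensed sketch of the grow/remove interplay described above (toy structures, not hash.c; the new table is assumed already rehashed): grow pushes the old table when iterators are outstanding, and remove blanks the cell out of every stacked table so no iterator can still reach it.

  #include <stddef.h>

  struct table { void **slot; size_t mask; struct table *next; };

  struct toyhash {
      struct table *tbl;        /* current table */
      struct table *tblstack;   /* older tables still seen by iterators */
      int usecount;             /* outstanding iterators */
  };

  void grow(struct toyhash *h, struct table *bigger)
  {
      if (h->usecount > 0) {
          h->tbl->next = h->tblstack;  /* iterators keep their snapshot */
          h->tblstack = h->tbl;
      }
      /* else: no iterators; stacked tables could be freed here */
      h->tbl = bigger;
  }

  /* After deleting cell from the current table: blindly walk each
     stacked table from the cell's initial probe position, looking
     for the identical cell, and null it out. */
  void remove_from_stack(struct toyhash *h, void *cell, size_t hashval)
  {
      struct table *t;
      for (t = h->tblstack; t != 0; t = t->next) {
          size_t i = hashval & t->mask, start = i;
          do {
              if (t->slot[i] == cell) { t->slot[i] = 0; break; }
              i = (i + 1) & t->mask;
          } while (i != start);
      }
  }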
* hash.h (struct hash_iter): chain and cons members removed;
index member added.
* hash.c (struct hash_ops): acons_new_c_fun member removed.
(hash_ops_init): Reduced to three arguments.
(hash_grow): Function removed.
(hash_find_slot, hash_lookup, hash_grow, hash_insert,
hash_remove): New static functions.
(hash_mark): Iterate the simple open table, which directly
contains pointers to the hash entry cons cells, rather
than lists of these.
(hash_acons_new_c, hash_aconsql_new_c, hash_aconsq_new_c):
Static functions removed.
(hash_eq_ops, hash_eql_ops, hash_equal_ops): Initializer
macro calls reduced to three arguments.
(copy_hash_chain): Function removed.
(copy_hash): Logic altered to duplicate
the hash cells, like copy_hash_chain did.
(gethash_c, gethash_e, remhash): Retargeted to the new
static functions that do open addressing with linear
probing.
(hash_iter_mark): No cons member to mark any more.
(hash_iter_init, us_hash_iter_init): Initialization updated
to new representation.
(hash_iter_next_impl, hash_iter_peek): Rewritten to traverse simple table.
Loses second argument.
(hash_iter_next, hash_next): Don't pass second argument
to hash_iter_next_impl.
(do_weak_tables): Adjusted to walk open table.
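The core of open addressing with linear probing, in the spirit of the new static functions (toy code; hash.c's cells are Lisp conses and the real functions take Lisp values). The table size is a power of two, so wrap-around is a mask; a real table keeps its load factor below 1, so an empty slot always terminates the probe.

  #include <stddef.h>

  struct kv { void *key; void *value; };

  /* Return the slot containing key, or the first empty slot where
     it would be inserted. */
  struct kv **find_slot(struct kv **table, size_t mask,
                        void *key, size_t hashval)
  {
      size_t i = hashval & mask;

      for (;;) {
          if (table[i] == 0 || table[i]->key == key)
              return &table[i];     /* empty slot or matching cell */
          i = (i + 1) & mask;       /* linear probe, wrapping */
      }
  }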
Hash tables are supposed to start with a mask of 255,
which is the 256 modulus less one. However, due to a
coding mistake, they get a mask of 252. Now this still
works. It just has a few zero bits in it: 1111 1100
rather than 1111 1111. When this grows by doubling in size,
we get 1 1111 1001, 11 1111 0011 and so on.
Now this still all works, because hash values AND-ed
with such a mask are no greater than that mask. The
table isn't accessed out of bounds. It's just inefficiently
used.
* hash.c (do_make_hash, make_similar_hash, clearhash):
When converting the power-of-two modulus, which is a Lisp
word, to the integer mask, subtract 1 from the integer
value, not the original Lisp word.
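A toy illustration of this class of bug (the tag scheme here is hypothetical, so the wrong answer differs from the 252 seen in TXR, but the failure mode is the same): with fixnums stored as tagged words, raw word arithmetic corrupts the encoding, so the value must be untagged before subtracting.

  #include <stdio.h>

  #define TAG_BITS 2
  #define TAG_NUM  1u

  typedef unsigned long word;

  static word tag_num(unsigned long n) { return (n << TAG_BITS) | TAG_NUM; }
  static unsigned long c_num(word w)   { return w >> TAG_BITS; }

  int main(void)
  {
      word modulus = tag_num(256);

      printf("%lu\n", c_num(modulus - 1)); /* wrong: the 1 came out of the tag */
      printf("%lu\n", c_num(modulus) - 1); /* right: untag, then subtract */
      return 0;
  }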
* stdlib/quips.tl (%quips%): New one.
* stdlib/match.tl (match-cond): New macro.
* autoload.c (match_set_entries): match-cond triggers
autoload of match module.
* tests/011/patmatch.tl: Tests.
* txr.1: Documented.
* stdlib/doc-syms.tl: Updated.