txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
...
*	New feature: struct preludes.	Kaz Kylheku	2022-11-03	2	-0/+18
	A struct prelude definition associates one or more future defstructs (by struct name) with clauses which are implicitly inserted into the defstruct. It is purely a macro-time construct, customizing the expansion behavior of defstruct.
	* stdlib/struct.tl (struct-prelude, struct-prelude-alists): New special variables holding hash tables. (defstruct): Before processing slot-specs, augment them with the contents of the prelude definitions associated with this struct name. (define-struct-prelude): New macro.
	* autoload.c (struct_set_entries): define-struct-prelude is interned and triggers autoload of struct module.
	* tests/012/oop-prelude.tl: New file.
	* tests/012/oop-prelude.expected: Likewise.
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Updated.
*	crypt: remove dubious validator.	Kaz Kylheku	2022-10-31	1	-0/+18
	The validate_salt function was introduced in commit c3a0ceb2cea1a9d43f2baf5a2e63d0d712c8df19, February 2020. I cannot reproduce the internal crash in crypt which it alleges, and I neglected to mention the bad inputs in the commit or add tests. I'm not able to reproduce the alleged behavior in spite of trying all sorts of bad inputs; and looking at the crypt source in glibc, I don't see any obvious problem. And so, on this Hallowe'en, we exorcise the ghost that has been haunting the crypt.
	* sysif.c (salt_char_p, validate_salt): Static functions removed. (crypt_wrap): Don't call validate_salt, and so cwsalt need not be tested for null.
	* tests/018/crypt.tl: New file.
	* txr.1: Mention that crypt_r is used if available, which avoids static storage.
*	cat-str/join/join-with: allow nested sequences	Kaz Kylheku	2022-10-25	1	-0/+13
	The measure/allocate/catenate functions which underlie the cat-str implementation are streamlined, simplifying the code. At the same time, they handle nested sequences of string/character items.
	* lib.c (struct cat_str): New member, seen_one. This flips from 0 to 1 after the first item has been seen in the cat_str_measure pass or cat_str_append pass. Each item other than the first is preceded by a separator. (cat_str_measure, cat_str_append): The more_p argument is dropped. We account for the separator with the help of the new seen_one flag, which allows us to easily recurse over items that are sequences. (cat_str_alloc): Reset the seen_one flag in preparation for the cat_str_append pass. (cat_str, vscat, scat2, scat3, join_with): Simplified.
	* tests/015/split.tl: New tests.
	* txr.1: Redocumented.
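The two-pass scheme driven by the seen_one flag can be sketched in plain C. This is a hypothetical analogue, not the lib.c code: struct item, struct cat_state, cat_items and the helpers are made-up names, and leaves are C strings rather than TXR strings or characters. What it shows is that, because the separator decision depends only on whether some earlier leaf has already been emitted, both the measure pass and the append pass can recurse into nested sequences without any more_p bookkeeping.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical item: a leaf string, or (when str is NULL) a nested sequence. */
struct item {
  const char *str;
  const struct item *kids;
  size_t nkids;
};

struct cat_state {
  const char *sep;
  size_t len;      /* pass 1: total length; pass 2: current write position */
  int seen_one;    /* 0 until the first leaf; after that each leaf gets a separator */
  char *out;
};

static void cat_measure(struct cat_state *cs, const struct item *it)
{
  if (it->str) {
    if (cs->seen_one)
      cs->len += strlen(cs->sep);
    cs->len += strlen(it->str);
    cs->seen_one = 1;
  } else {
    for (size_t i = 0; i < it->nkids; i++)
      cat_measure(cs, &it->kids[i]);   /* recurse into nested sequences */
  }
}

static void cat_append(struct cat_state *cs, const struct item *it)
{
  if (it->str) {
    if (cs->seen_one) {
      size_t seplen = strlen(cs->sep);
      memcpy(cs->out + cs->len, cs->sep, seplen);
      cs->len += seplen;
    }
    size_t n = strlen(it->str);
    memcpy(cs->out + cs->len, it->str, n);
    cs->len += n;
    cs->seen_one = 1;
  } else {
    for (size_t i = 0; i < it->nkids; i++)
      cat_append(cs, &it->kids[i]);
  }
}

/* Returns a malloc'd string (caller frees), or NULL on allocation failure. */
char *cat_items(const struct item *it, const char *sep)
{
  struct cat_state cs = { sep, 0, 0, NULL };
  cat_measure(&cs, it);          /* pass 1: measure */
  cs.out = malloc(cs.len + 1);   /* allocate */
  if (cs.out == NULL)
    return NULL;
  cs.len = 0;
  cs.seen_one = 0;               /* reset the flag before the append pass */
  cat_append(&cs, it);           /* pass 2: catenate */
  cs.out[cs.len] = '\0';
  return cs.out;
}
```

In this sketch an empty nested sequence contributes neither text nor a separator, since it never sets seen_one.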
*	defstruct: new :inherit clause.	Kaz Kylheku	2022-10-17	1	-0/+15
	The :inherit clause allows custom struct clauses to inject inherited bases.
	* stdlib/struct.tl (defstruct): Recognize :inherit clause, adding symbol arguments to an extra list of supers that get appended to the list coming from defstruct's second argument. (define-struct-clause): Disallow :inherit clause name.
	* tests/012/oop-dsc.tl: New tests.
	* txr.1: Documented.
*	structs: optional init-exprs now useful in :delegate	Kaz Kylheku	2022-10-11	1	-2/+10
	* stdlib/struct.tl (:delegate): Handle the two-element form of the optional parameter, which specifies the usual initializing expression for the default value. This is just passed through as-is to the generated method. Diagnose if the three-element form occurs.
	* tests/012/oop.tl: Some new tests.
	* txr.1: Documented.
*	Syntax: allow separator commas in numeric tokens.	Kaz Kylheku	2022-10-05	1	-0/+27
	* parser.l (remove_char): New static function. (DIGSEP, XDIGSEP, NUMSEP, FLOSEP, XNUMSEP, ONUMSEP, BNUMSEP, ONUM, BNUM): New named lex patterns. (FLODOT): Use DIGSEP instead of DIG. (ONUM): Use ODIG instead of [0-7]. (BNUM): Use BDIG instead of [0-1]. (grammar): New rule for producing NUMBER from decimal token with commas based on BNUMSEP instead of BNUM. This is a copy and paste so that the BNUM rule doesn't deal with the comma removal, so as not to slow it down. For the octal, binary and hex, we just switch to ONUMSEP, BNUMSEP and XNUMSEP, so they all go through one case. Floating-point numbers are also handled with a copy-pasted case using FLOSEP.
	* tests/012/syntax.tl: New test cases.
	* txr.1: Documented.
	* genvim.txr (alpha-noe, digsep, hexsep, octsep, binsep): New variables. (txr_pnum, txr_xnum, txr_onum, txr_bnum, txr_num): Integrate separating commas. Some bugs fixed in txr_num, some simplifications, better txr_badnum pattern.
	* lex.yy.c.shipped: Updated.
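The comma-removal step can be illustrated with a small C sketch. It assumes, as the lex patterns arrange, that the token has already been validated, so the commas only need to be stripped before conversion. The remove_char below is a hypothetical stand-in that merely shares its name with the new static function in parser.l; it is not copied from it, and parse_with_commas is likewise made up.

```c
#include <stdlib.h>
#include <string.h>

/* Copy src to dst, dropping every occurrence of ch.
   Works in place (dst == src) because dst never overtakes src. */
static void remove_char(char *dst, const char *src, char ch)
{
  for (; *src != '\0'; src++)
    if (*src != ch)
      *dst++ = *src;
  *dst = '\0';
}

/* Convert an already-validated token such as "1,000,000" in the given base. */
long parse_with_commas(const char *tok, int base)
{
  char buf[64];
  if (strlen(tok) >= sizeof buf)
    return 0;                    /* too long for this sketch */
  remove_char(buf, tok, ',');
  return strtol(buf, NULL, base);
}
```

Keeping the comma-free patterns on their own rule, as the commit does, means ordinary tokens never pay for this extra copy.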
*	define-struct-clause: add tests.	Kaz Kylheku	2022-10-05	1	-0/+65
	* tests/012/oop-dsc.tl: New file.
*	oop: allow multiple :init, :fini, etc.	Kaz Kylheku	2022-10-04	2	-0/+28
	The motivation is that struct clause macros defined using define-struct-clause may want to introduce their own initializers and finalizers for the specific stuff they add to the struct. The uniqueness restrictions on these initializing and finalizing clauses make it impossible to use two clause macros which both want to inject a definition of the same initializer or finalizer type.
	* stdlib/struct.tl (defstruct): Don't enforce that there be at most one clause in the category of :init, :postinit, :fini or :postfini. Multiple are allowed. They all execute left-to-right except for :fini.
	* tests/012/fini.tl: New tests.
	* tests/012/fini.expected: Updated.
	* txr.1: Documented.
*	New: %fun% mechanism for current function name.	Kaz Kylheku	2022-10-03	2	-0/+59
	* eval.c (pct_fun_s): New symbol variable, holding the usr:%fun% symbol. (fun_macro_env): New static function. (do_expand): For defun and defmacro, use fun_macro_env to establish an environment binding the %fun% symbol macro, and expand everything in that environment. (eval_init): Intern the %fun% symbol, initializing pct_fun_s, and also register a global symbol macro in that name so that we can freely use %fun% everywhere without worrying that the code will blow up. E.g. a logging macro can use it to get the function name, but still be useful in a top-level form outside of a named function.
	* stdlib/struct.tl (sys:meth-lambda): New macro. (defstruct, defmeth): Use sys:meth-lambda as a replacement for lambda to set up the %fun% symbol macro. In the :init case which doesn't use a lambda, an open-coded symacrolet does the job.
	* tests/019/pct-fun.tl: New file.
	* tests/019/pct-fun.expected: Likewise.
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Updated.
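C has a comparable built-in: the predefined identifier __func__, which a logging macro can use to pick up the enclosing function's name. The sketch below (LOG_TO and do_work are hypothetical names) shows that pattern as an analogy for what %fun% enables; it says nothing about how TXR implements %fun%.

```c
#include <stdio.h>
#include <string.h>

/* Format "function: message" into buf; __func__ names the enclosing function. */
#define LOG_TO(buf, size, msg) snprintf((buf), (size), "%s: %s", __func__, (msg))

int do_work(char *buf, size_t size)
{
  return LOG_TO(buf, size, "started");
}
```

One difference is worth noting: __func__ exists only inside a function body, so such a C macro cannot be used at file scope, whereas the commit registers a global %fun% symbol macro precisely so that a logging macro remains usable in top-level forms.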
*	New method: str-addr.	Kaz Kylheku	2022-10-03	1	-0/+68
	* socket.c (sock_set_entries): Intern str-addr symbol. There is no autoload on this because the struct types of which this is a method don't exist if the socket module has not been loaded.
	* stdlib/socket.tl ((sockaddr-in str-addr), (sockaddr-in6 str-addr), (sockaddr-un str-addr)): New methods.
	* tests/014/str-addr.tl: New file. This provides coverage not just for the str-addr method, but also for the hitherto untested address-to-text functions. This is how the bug addressed in the previous commit was found: the test case which produces "8000::1" was actually producing "800:1".
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Updated.
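The expected text in the new test can be cross-checked against the C library. This sketch (ipv6_to_text is a hypothetical helper, not part of TXR) renders 16 raw IPv6 address bytes with POSIX inet_ntop, which compresses the longest run of zero groups; for the address 8000:0:0:0:0:0:0:1 it yields "8000::1", the string the buggy code was rendering as "800:1".

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Render 16 raw IPv6 address bytes as text; returns 0 on success, -1 on failure. */
int ipv6_to_text(const unsigned char bytes[16], char *out, socklen_t outlen)
{
  struct in6_addr a;
  memcpy(a.s6_addr, bytes, sizeof a.s6_addr);
  return inet_ntop(AF_INET6, &a, out, outlen) != NULL ? 0 : -1;
}
```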
*	New sockaddr-str function.	Kaz Kylheku	2022-10-02	1	-0/+49
	This function "intelligently" constructs an address object of the right type from a string.
	* socket.c (sock_set_entries): Autoload socket.tl on sockaddr-str function being accessed.
	* stdlib/socket.tl (sockaddr-str): New function.
	* tests/014/sockaddr-str.tl: New file.
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Updated.
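The kind of dispatch a string-to-address constructor needs can be sketched with POSIX inet_pton. This is a hypothetical classifier, not sockaddr-str itself: it only chooses between IPv4, IPv6 and a Unix-domain path (assumed here to be recognized by a leading slash), and it does not attempt to parse ports, prefixes or any other syntax the real function may support.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

enum addr_kind { ADDR_BAD = -1, ADDR_UNIX = 1, ADDR_INET = 4, ADDR_INET6 = 6 };

/* Decide which kind of address object a string would map to. */
enum addr_kind classify_addr(const char *s)
{
  struct in_addr a4;
  struct in6_addr a6;

  if (inet_pton(AF_INET, s, &a4) == 1)
    return ADDR_INET;
  if (inet_pton(AF_INET6, s, &a6) == 1)
    return ADDR_INET6;
  if (s[0] == '/')               /* assumption: absolute path means Unix socket */
    return ADDR_UNIX;
  return ADDR_BAD;
}
```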
*	New :postfini feature in defstruct.	Kaz Kylheku	2022-09-27	2	-1/+43
	The :postfini clause registers a finalizer that runs in the ordinary order: after previously registered ones. This has the effect of allowing a derived structure to run clean-up actions after those of inherited structures. Either order can be useful because the dependencies between base and derived can go in either direction. It's a huge mistake in C++ that it supports only derived-first destructor invocation order.
	* stdlib/struct.tl (defstruct): Recognize and translate :postfini clause. It's exactly like :fini but omits the t parameter in the finalize call, registering in the natural order.
	* tests/012/fini.tl (derived): Add :postfini handler.
	* tests/012/fini.expected: Updated to reflect the messages coming from the postfini handler, which are happening in the correct order.
	* txr.1: Documented.
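The difference between the two registration orders can be modeled with a simple queue. In this hypothetical C sketch (fini_queue, fini_register and fini_index are made-up names), registering with reverse set models :fini, which passes t to finalize and goes in front of earlier registrations, while reverse clear models :postfini, which appends. Since a base structure's finalizers are registered before a derived structure's, the derived :fini ends up running before the base's and the derived :postfini after it.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FINI 16

/* Hypothetical model of an object's finalizer list; index 0 runs first. */
struct fini_queue {
  const char *names[MAX_FINI];
  size_t n;
};

/* reverse != 0 models :fini (runs before earlier registrations);
   reverse == 0 models :postfini (runs after them). */
int fini_register(struct fini_queue *q, const char *name, int reverse)
{
  if (q->n >= MAX_FINI)
    return -1;
  if (reverse) {
    for (size_t i = q->n; i > 0; i--)
      q->names[i] = q->names[i - 1];
    q->names[0] = name;
  } else {
    q->names[q->n] = name;
  }
  q->n++;
  return 0;
}

/* Position of name in run order, or -1 if absent. */
long fini_index(const struct fini_queue *q, const char *name)
{
  for (size_t i = 0; i < q->n; i++)
    if (strcmp(q->names[i], name) == 0)
      return (long) i;
  return -1;
}
```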
*	seq-iter: bugfix: floating-point ranges.	Kaz Kylheku	2022-09-15	1	-0/+23
	* lib.c (seq_iter_get_range_bignum): Static function renamed to seq_iter_get_range_number because it in fact generalizes to numbers. (seq_iter_peek_range_bignum): Renamed to seq_iter_peek_range_number. (seq_iter_get_rev_range_bignum): Renamed to seq_iter_get_rev_range_number. (seq_iter_peek_rev_range_bignum): Renamed to seq_iter_peek_rev_range_number. (si_range_bignum_ops): Renamed to si_range_number_ops. (si_rev_range_bignum_ops): Renamed to si_rev_range_number_ops. (seq_iter_init_with_info): Handle ranges where the from value is floating-point. Also, if the from value is a bignum that fits into cnum range, we now try to handle that as a cnum range.
	* tests/012/iter.tl: New tests.
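A floating-point range iterator needs little more than a double cursor with get and peek operations. The sketch below is a hypothetical C model (range_iter and its functions are made-up names); it assumes a half-open range [from, to) stepped by 1, which may not match every detail of seq-iter, and reverse ranges are not modeled.

```c
/* Hypothetical iterator over the half-open range [from, to), stepping by 1.0. */
struct range_iter {
  double cur;
  double to;
};

struct range_iter range_iter_init(double from, double to)
{
  struct range_iter it = { from, to };
  return it;
}

/* Peek: store the next value without consuming it; returns 0 when exhausted. */
int range_peek(const struct range_iter *it, double *out)
{
  if (it->cur >= it->to)
    return 0;
  *out = it->cur;
  return 1;
}

/* Get: like peek, but also advances the cursor. */
int range_get(struct range_iter *it, double *out)
{
  if (!range_peek(it, out))
    return 0;
  it->cur += 1.0;
  return 1;
}
```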
*	compiler: bug: bad basic-block merge across end insn.	Kaz Kylheku	2022-09-15	1	-0/+6
	The bad situation reproduced as a miscompilation of some prof forms at opt-level 5 or above. The basic idea is that there is a situation like this:
		prof t2
		...  (profiled code here, producing a value in t8)
		mov t2 t8
		end t2
		end t2
	The code block produces a value in t8, which is copied into t2, and executes the end instruction. This instruction does not fall through to the next one but passes control back to the prof instruction. The prof instruction then stores the result value, which came from t2, back into the t2 register and resumes the program at the following end t2.
	The first bad thing that happens is that the end instructions get merged together into one basic block. The optimizer then treats them without regard for the prof instruction, as if they were a linear sequence. It looks like the register move mov t2 t8 is wasteful and so it eliminates it, rewriting the end instructions to:
		end t8
		end t8
	Of course, the second instruction is now wrong because prof is still producing the result in t2.
	To fix this without changing the instruction set, I'm introducing another pseudo-op that represents end, called xend. This is similar to jend, except that jend is regarded as an unconditional branch whereas xend isn't. The special thing about xend is that a basic block in which it occurs is marked as non-joinable. It will not be joined with the following basic block.
	* stdlib/asm.tl (xend): New alias opcode for end.
	* stdlib/compiler.tl (comp-prof): Use xend to end prof fragment, rather than plain end.
	* stdlib/optimize.tl (basic-block): New slot, nojoin. If true, block cannot be joined with next one. (basic-blocks jump-ops): Add xend to list of jump ops, so that a basic block will terminate on xend. (basic-blocks link-graph): Set the nojoin flag on a basic block which contains (and thus ends with) xend. (basic-blocks local-liveness): Add xend to the case in def-ref that handles end. (basic-blocks (peephole, join-blocks)): Refuse to join blocks marked nojoin.
	* tests/019/comp-bugs.tl: New file with miscompiled test case that was returning 42 instead of (42 0 0 0) as a result of the wrong register's value being returned.
*	compiler: bug: scoping of lambda optionals.	Kaz Kylheku	2022-09-15	1	-0/+5
	The scoping is not being handled correctly for optional variables. The init-forms are being evaluated in a scope in which all the variables are already visible, instead of sequentially. Thus, for instance, variable rebinding doesn't work, as in (lambda (: (x x)) ...). When the argument is missing, x ends up with the value : because the expression refers to the new x, rather than the outer x.
	* stdlib/compiler.tl (compiler comp-lambda-impl): Perform the compilation of the init-forms earlier. Use the same new trick that is used for let: the target for the code fragment is a location obtained from get-loc, which is then attached to a variable afterward. The spec-sub helper is extended with a loc parameter to help with this case.
	* tests/012/lambda.tl: New test case that fails without this fix.
*	compiler: test for recent bugfix.	Kaz Kylheku	2022-09-14	1	-0/+2
	* tests/012/lambda.tl: Add the test case which reproduces the compiler failure that was fixed several commits ago.
*	syntax: read and print [. x] and [. @x].	Kaz Kylheku	2022-09-08	1	-0/+7
	* lib.c (obj_print_impl): Handle (dwim . atom) syntax by printing [. atom]. Note that (dwim . @var) and (dwim . @(expr)) already print as [. @var] and [. @(expr)]; this is not new. But none of these forms are supported by reading without the accompanying change to the parser.
	* parser.y (dwim): Handle the [. expr] and [ . expr] syntax, so that forms like [. a] and [. @a] have print-read consistency. The motivation is to be able to use [. @args] in pattern matching to match DWIM forms; I tried that and was surprised to have it blow up in my face.
	* tests/012/readprint.tl: New test file. Future printer/parser changes will be tested here. Historically, changes to the syntax have not been consistently unit-tested.
	* y.tab.c.shipped: Regenerated.
*	close-lazy-streams: test.	Kaz Kylheku	2022-08-30	1	-0/+3
	* tests/018/close-lazy.tl: New file.
*	txr: test for new @(next) behaviors.	Kaz Kylheku	2022-08-30	2	-0/+17
	* tests/018/noclose.txr: New file.
	* tests/018/noclose.expected: New file.
*	New function: search-all	Kaz Kylheku	2022-08-17	1	-0/+15
	* eval.c (eval_init): search-all intrinsic registered.
	* lib.c (search_common): New Boolean argument all, indicating whether all positions are to be returned. We must handle this in the two places where empty key and sequence are handled, and also in the main loop. A trick is used: the found variable is now bound by list_collect_decl, but not used for collecting unless all is true. (search, rsearch, contains): Pass 0 for all argument of search_common. (search_all): New function.
	* lib.h (search_all): Declared.
	* tests/012/seq.tl: New tests.
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Regenerated.
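A naive C version of an all-positions search over strings shows the shape of the result. Assumptions, not taken from the commit: matches may overlap, and an empty key matches at every position from 0 to the length inclusive; txr.1 is the authority on how search-all treats these edge cases. The name search_all_str and its interface are hypothetical, chosen to avoid confusion with the real search_all in lib.c.

```c
#include <stddef.h>
#include <string.h>

/* Record each start position where key occurs in hay (overlapping matches
   included) into pos[], up to max entries; return the total match count. */
size_t search_all_str(const char *hay, const char *key, size_t *pos, size_t max)
{
  size_t hn = strlen(hay), kn = strlen(key), count = 0;
  for (size_t i = 0; i + kn <= hn; i++) {
    if (memcmp(hay + i, key, kn) == 0) {   /* kn == 0 matches at every i */
      if (count < max)
        pos[count] = i;
      count++;
    }
  }
  return count;
}
```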
*	search/rsearch: some test cases.	Kaz Kylheku	2022-08-17	1	-0/+36
	* tests/012/seq.tl: New tests.
*	path-components-safe: tighten /proc check	Kaz Kylheku	2022-07-30	1	-30/+21
	Attacks are possible via /proc/<pid>/fd/<n> involving a deleted file, whereby the link target changes from "/path/to/file" to "/path/to/file (deleted)", which can be perpetrated by a different user, not related to process <pid>, who has access to perform unlink("/path/to/file").
	* stdlib/path-test.tl (safe-abs-path): Perform the pattern check regardless of effective user ID.
	* tests/018/path-safe.tl: Test cases adjusted.
*	path-components-safe: repel /proc symlink attacks	Kaz Kylheku	2022-07-29	1	-0/+25
	In a Linux system, it's possible for an unprivileged user to create a root-owned symlink pointing to any directory, simply by changing to that directory and running a setuid executable like "su". That executable will get a process whose /proc/<pid> directory is root-owned, and contains a symlink named cwd pointing to the current directory. Other symlinks under /proc look exploitable in this way.
	* stdlib/path-test.tl (safe-abs-path): New function. Here is where we are going to check for unsafe paths. We use some pattern matching to recognize various unsafe symlinks under /proc. (path-components-safe): Simplify code around recognition of absolute paths. When an absolute path is read from a symlink, remove the first empty component. Pass every absolute path through safe-abs-path to check for known unsafe paths.
	* tests/018/path-safe.tl: New tests.
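The pattern check can be illustrated with a small C matcher. This is a hypothetical sketch covering only the two links named in these commits, /proc/<pid>/cwd and /proc/<pid>/fd/<n>; safe-abs-path itself is written in TXR Lisp using pattern matching and recognizes various unsafe symlinks under /proc.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip one or more decimal digits; NULL if p does not start with a digit. */
static const char *skip_digits(const char *p)
{
  if (!isdigit((unsigned char) *p))
    return NULL;
  while (isdigit((unsigned char) *p))
    p++;
  return p;
}

/* Returns 1 for /proc/<pid>/cwd or /proc/<pid>/fd/<n>, else 0. */
int unsafe_proc_link(const char *path)
{
  const char *p;

  if (strncmp(path, "/proc/", 6) != 0)
    return 0;
  if ((p = skip_digits(path + 6)) == NULL)
    return 0;
  if (strcmp(p, "/cwd") == 0)
    return 1;
  if (strncmp(p, "/fd/", 4) == 0) {
    const char *q = skip_digits(p + 4);
    return q != NULL && *q == '\0';
  }
  return 0;
}
```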
*	gcd: rewrite for better efficiency.	Kaz Kylheku	2022-07-27	1	-0/+29
	* arith.c (gcd): New implementation which uses arithmetic in the unsigned type ucnum if both operands are in that type's range. This uses Stein's algorithm, a.k.a. binary GCD. The mpi_gcd function is used only if at least one argument is a bignum whose value doesn't fit into a ucnum.
	* tests/016/arith.tl: gcd test cases added.
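Stein's algorithm replaces division with shifts and subtraction. Here is a minimal C sketch of it; uint64_t stands in for ucnum (whose width depends on the platform), binary_gcd is a hypothetical name, and the bignum fallback through mpi_gcd is not shown.

```c
#include <stdint.h>

/* Binary GCD (Stein's algorithm) on unsigned operands. */
uint64_t binary_gcd(uint64_t a, uint64_t b)
{
  unsigned shift = 0;

  if (a == 0)
    return b;
  if (b == 0)
    return a;

  /* Factor out the common powers of two. */
  while (((a | b) & 1) == 0) {
    a >>= 1;
    b >>= 1;
    shift++;
  }

  /* Make a odd; from here on a stays odd. */
  while ((a & 1) == 0)
    a >>= 1;

  do {
    while ((b & 1) == 0)       /* remove factors of two from b */
      b >>= 1;
    if (a > b) {               /* keep a <= b so the difference is unsigned */
      uint64_t t = a;
      a = b;
      b = t;
    }
    b -= a;                    /* odd - odd is even */
  } while (b != 0);

  return a << shift;           /* restore the common factors of two */
}
```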
*	New function: path-components-safe.	Kaz Kylheku	2022-07-25	1	-0/+89
	* autoload.c (path_test_set_entries): Autoload on path-components-safe symbol.
	* stdlib/path-test.tl (if-windows, if-native-windows): New system macros. (path-safe-sticky-dir): New system function. (path-components-safe): New function.
	* tests/018/path-safe.tl: New file.
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Updated.
*	New function: count.	Kaz Kylheku	2022-07-18	1	-0/+10
	The general count function, with keyfun and testfun, is noticeably absent. Let's implement it.
	* lib.[ch] (count): New function.
	* eval.c (eval_init): Register count intrinsic.
	* tests/012/seq.tl: Some tests for count.
	* txr.1: Add count to count-if section. Revise documentation based on pos/pos-if.
	* stdlib/doc-syms.tl: Updated.
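The shape of a count with optional test and key functions can be sketched over an int array. The names and the int-only interface are hypothetical. The sketch assumes that the test function receives the searched-for item first and the element's key second, and that the defaults are equality and identity.

```c
#include <stddef.h>
#include <stdlib.h>

typedef int (*test_fn)(int target, int elem_key);
typedef int (*key_fn)(int elem);

static int identity_key(int x) { return x; }
static int equal_test(int a, int b) { return a == b; }

/* Count elements of seq[0..n) for which test(target, key(elem)) holds.
   A NULL test or key selects the default (equality, identity). */
size_t count_items(int target, const int *seq, size_t n, test_fn test, key_fn key)
{
  size_t c = 0;
  if (test == NULL)
    test = equal_test;
  if (key == NULL)
    key = identity_key;
  for (size_t i = 0; i < n; i++)
    if (test(target, key(seq[i])))
      c++;
  return c;
}
```

For example, passing the standard abs function as the key counts elements by magnitude.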
*	bugfix: missing gzip support in open-command.	Kaz Kylheku	2022-06-21	1	-0/+4
	* stream.c (pipe_close_status_helper): New function, factored out of pipe_close and used by it, and also by gzio_close. (pipe_close): Call pipe_close_status_helper, which now contains the classification of process wait status codes. (open_fileno): Now takes optional pid argument. If this is specified, then make_pipevp_stream is used. (open_subprocess): Use the open_fileno function, rather than fopen. This simplifies things too, except that we have to catch an exception. Pass pid to the newly added parameter of open_fileno so that we obtain a proper pipe stream that will wait for the process to terminate when closed. (mkstemp_wrap): Pass nil for pid argument of open_fileno. (stream_init): Update registration of open-fileno.
	* gzio.c (struct gzio_handle): New member, pid. (gzio_close): If there is a nonzero pid, wait for the process to terminate. (make_gzio_stream): Initialize h->pid to zero. (make_gzio_pipe_stream): New function.
	* parser.c (lino_fdopen): Pass nil for pid argument of open_fileno.
	* gzio.h (make_gzio_pipe_stream): Declared.
	* tests/018/gzip.tl: New test.
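At bottom, the classification that pipe_close_status_helper factors out is the decoding of a wait status. The C sketch below uses the standard WIFEXITED/WEXITSTATUS and WIFSIGNALED/WTERMSIG macros. The return convention and the names classify_wait_status and demo_child are hypothetical and do not reflect what pipe_close actually returns to Lisp code.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 and stores the exit code for a normal exit; returns 2 and
   stores the signal number if the process was killed by a signal;
   returns 0 for anything else. */
int classify_wait_status(int status, int *code)
{
  if (WIFEXITED(status)) {
    *code = WEXITSTATUS(status);
    return 1;
  }
  if (WIFSIGNALED(status)) {
    *code = WTERMSIG(status);
    return 2;
  }
  return 0;
}

/* Demo: fork a child that raises sig (if nonzero) or exits with code,
   then reap it and classify its status. Returns -1 on fork/wait failure. */
int demo_child(int code, int sig, int *out)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    if (sig != 0)
      raise(sig);
    _exit(code);
  }
  int status;
  if (waitpid(pid, &status, 0) != pid)
    return -1;
  return classify_wait_status(status, out);
}
```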
*	New function: str	Kaz Kylheku	2022-06-12	1	-0/+28
	The str function is like mkstring but allows a fill pattern to be specified.
	* eval.c (eval_init): str intrinsic registered.
	* lib.[ch] (str): New function.
	* tests/015/str.tl: New file.
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Updated.
*	cygwin: bug: sh always uses cmd.exe.	Kaz Kylheku	2022-05-31	1	-9/+3
	* stream.c (sh): Use a single definition for this function, which uses the shell and shell_arg variables to use either /bin/sh -c or cmd.exe /c. We only want to use cmd.exe when running as a Windows native program on Cygnal.
	* tests/018/process.tl: Remove workaround from test case. This is what was causing the weirdness.
*	buf: compression tests.	Kaz Kylheku	2022-05-30	1	-0/+18
	* buf.c (buf_compress): Let's use the level value of -1 if not specified, so Zlib defaults it to 6, or whatever.
	* tests/012/buf.tl: New tests.
	* txr.1: Note that -1 is a valid level value and that is the default.
*	gzio: some tests.	Kaz Kylheku	2022-05-30	1	-0/+53
	* tests/018/gzip.tl: New file.
*	New: spln and tokn functions.	Kaz Kylheku	2022-05-30	1	-0/+27
	Instead of trying to work the new count parameter into the spl and tok functions, it's better to make new ones.
	* eval.c (eval_init): spln and tokn intrinsics registered.
	* lib.[ch] (spln, tokn): New functions.
	* tests/015/split.tl: New test cases.
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Updated.
*	tok-str: takes count argument.	Kaz Kylheku	2022-05-28	1	-0/+15
	* eval.c (eval_init): Update registration of tok-str.
	* lib.c (tok_str): New argument, count_opt. Implemented in the compat 155 case; what the heck. (tok): Pass nil to new parameter of tok_str.
	* lib.h (tok_str): Declaration updated.
	* tests/015/split.tl: New tests.
	* txr.1: Documented.
*	tests: fix failing load-search test.	Kaz Kylheku	2022-05-26	1	-1/+2
	* tests/019/load-search.tl: Skip a certain test if it is run as superuser; it fails because superuser is not affected by denied directory search and execute permissions.
*	ffi: reproduce odd GNU C behavior for aligned bitfields.	Kaz Kylheku	2022-05-24	1	-0/+7
	We've already taken care of imitating the situation that GNU C allows __attribute__((aligned(n))) to weaken the alignment of a bitfield, contrary to it being documented that align only strengthens alignment. Even a value of n == 1 is meaningful in that it can cause the bitfield to start allocating from a new byte. This patch corrects a newly discovered nuance: when a bitfield is attributed with a weaker alignment than its underlying type (e.g. uint32_t field marked with 2 byte alignment), the original type's alignment is still in effect for calculating the alignment of the structure, and the padding.
	* ffi.c (struct txr_ffi_type): New member oalign, for keeping track of the type's original alignment, prior to adjustment. (make_ffi_type_struct): For a named bitfield, take the oalign value into account when determining the most strict member alignment. (ffi_type_compile): When marking a type as aligned, we remember the original alignment in atft->oalign.
	* tests/017/bitfields.tl: New test case, struct s16.
	* txr.1: Documented.
*	fixup! ffi: couple of tests; assertion.	Kaz Kylheku	2022-05-24	1	-5/+5
*	ffi: couple of tests; assertion.	Kaz Kylheku	2022-05-24	1	-0/+13
	* ffi.c (make_ffi_type_struct): Add check for impossible condition. The bits_alloc variable could only exceed bits_type (and thus cause the room variable to have a nonsensical, large value) if the bitfield allocation tried to continue allocating bits into an aligned unit, whose alignment exceeds the size of the underlying type. But in that case, tft->aligned would have to be true, and so the offset would have been aligned prior to this code, rendering bits_alloc zero.
	* tests/017/bitfields.tl: New tests.
*	ffi: bitfield tests and fixes.	Kaz Kylheku	2022-05-23	1	-0/+587
	The bitfield allocation rules are wrong. Some of it is due to the recent changes which are based on incorrect analysis, but reverting things doesn't fix it. The idea that we compare the current member's alignment with the previous is wrong; it is not borne out by empirical tests with gcc. So we do a straight revert of that.
	In GNU C, an __attribute__((aligned (N))) attribute applied to a bitfield member will perform the requested alignment if, evidently, the bit field is already being placed into a new byte. (If the bit field is about to be packed into an existing byte, then there is a warning about the align attribute being ignored.) Because we don't have alignment as a member attribute, but only as a type attribute, we must implement a flag which indicates that a type has had align applied to it (even if the alignment didn't change) so we can then honor this in the right place in the bitfield allocation code.
	* ffi.c (struct txr_ffi_type): New attribute flag, aligned. (make_ffi_type_struct): Remove the prev_align variable and all related logic. Consolidate all alignment into one place, which is done before we allocate the bitfield or regular member. We align if the new member isn't a bitfield, or even if it is a bitfield if it has the aligned attribute, or if the bitfield is changing endian compared to the previous member (our local rule, not from GNU C). (ffi_type_compile): The align and pack operators now set the aligned attribute, except in the (pack 1 ...) case which semantically denotes lack of alignment.
	* tests/017/bitfields.tl: New file.
	* txr.1: Documented.
*	tests: add forgotten test for new expansion rule.	Kaz Kylheku	2022-05-21	1	-0/+11
	This was developed together with what became the May 12 commit 1162a735b61c1c5086fb6055471ee35cc8ed62a4; I just forgot to git add the file.
	* tests/011/macros-4.tl
*	ffi: flex structs: minor refactor.	Kaz Kylheku	2022-05-21	1	-0/+9
	* ffi.c (ffi_flex_struct_in): Function renamed to ffi_flex_array_len, because its responsibility is determining the length of a flexible array that is not null terminated. We don't pass in the structure's type's descriptor any more, but the member descriptor. (ffi_struct_in, ffi_struct_get): Follow rename and changed parameter conventions.
	* tests/017/flexstruct.tl: Added test case with nested flexible structure.
*	ffi: testing and fixing flexible arrays.	Kaz Kylheku	2022-05-20	1	-0/+64
	* ffi.c (ffi_flex_struct_in): Check for the last member being an array, and not null-terminated. We now check the character conversion disposition of the array. If it has character conversion, then we store the length right into the slot that will become the string. In the no-conversion case, we assume that if the member exists, it's a vector we can resize. Otherwise we plant a vector of the required size. (ffi_varray_put): Only call ffi_varray_dynsize if the Lisp object is a vector. If the Lisp object is a number, then use that as the size. Otherwise the size is zero.
	* tests/017/flexstruct.tl: New file.
*	utf8: bugfix: trailing char fragment ignored.	Kaz Kylheku	2022-05-20	1	-0/+6
	After "years of trouble-free operation" a bug in the UTF-8 decoder was found, which violates its property that any sequence of bytes will decode to some kind of string, which will encode to the original bytes. When the UTF-8 data prematurely ends in the middle of a valid character, the decoder just drops that data as if it didn't exist. So for instance the two-byte sequence E6 BC should decode to "\xDCE6\xDCBC", since it is a fragment of a three-byte UTF-8 sequence. It actually decodes to the empty string.
	* utf8.c (utf8_from_buffer): When the buffer is exhausted, if we are not in the utf8_init state, it means we were in the middle of a UTF-8 sequence. Walk the bytes from the backtrack point to the end of the buffer and store them into the string as U+DCxx codes.
	* tests/012/buf.tl: Tests added for this via buf-str, str-buf.
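The lossless decoding rule can be sketched in C. This is not the utf8.c decoder: utf8_decode_lossless is a hypothetical function that maps every byte not belonging to a complete, valid sequence to U+DC00 plus the byte value. That includes a sequence cut off by the end of the buffer, the case this commit fixes, so for the input E6 BC it produces U+DCE6 U+DCBC instead of nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n UTF-8 bytes into code points (out needs room for n entries).
   Bytes outside a complete, valid sequence become U+DC00 + byte, so the
   original bytes stay recoverable. Returns the number of code points. */
size_t utf8_decode_lossless(const unsigned char *in, size_t n, uint32_t *out)
{
  size_t i = 0, o = 0;

  while (i < n) {
    unsigned char b = in[i];
    size_t len, k;
    uint32_t cp, min;

    if (b < 0x80) {
      out[o++] = b;
      i++;
      continue;
    } else if ((b & 0xE0) == 0xC0) {
      len = 2; cp = b & 0x1F; min = 0x80;
    } else if ((b & 0xF0) == 0xE0) {
      len = 3; cp = b & 0x0F; min = 0x800;
    } else if ((b & 0xF8) == 0xF0) {
      len = 4; cp = b & 0x07; min = 0x10000;
    } else {
      out[o++] = 0xDC00 | b;     /* stray continuation or invalid lead byte */
      i++;
      continue;
    }

    /* Gather continuation bytes, stopping early at the end of the buffer
       or at a byte that is not a continuation byte. */
    for (k = 1; k < len && i + k < n && (in[i + k] & 0xC0) == 0x80; k++)
      cp = (cp << 6) | (in[i + k] & 0x3F);

    if (k == len && cp >= min && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF)) {
      out[o++] = cp;
      i += len;
    } else {
      /* Truncated, overlong or out of range: escape the lead byte and
         re-examine the following bytes individually. */
      out[o++] = 0xDC00 | b;
      i++;
    }
  }
  return o;
}
```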
*	ffi: pack bugfix and tests.	Kaz Kylheku	2022-05-20	1	-0/+112
	* ffi.c (ffi_transform_pack): Fix: return the original syntax in the situation when no cases are recognized, rather than the cdr of the syntax. When the struct/union syntax has no members, return the original syntax to indicate no transformation took place.
	* txr.1: Document the feature that pack on a typedef name or struct name with no members will do the alignment adjustment only, without the syntactic transformation.
	* tests/017/pack-align.tl: New file.
*	New function: trim-path-seps	Kaz Kylheku	2022-05-20	1	-0/+38
	* stream.c (trim_path_seps): New function. (stream_init): trim-path-seps intrinsic registered.
	* stream.h (trim_path_seps): Declared.
	* tests/018/path.tl: New tests.
	* txr.1: Documented.
	* stdlib/doc-syms.tl: Updated.
*	ffi: bugfix: empty structs/unions have alignment of 1.	Kaz Kylheku	2022-05-19	1	-0/+4
	* ffi.c (make_ffi_type_struct, make_ffi_type_union): Initialize most_align local variable to 1, so the lower bound of alignment is that, rather than zero.
	* tests/017/ffi-misc.tl: Tests added.
*	ffi: support 64 bit bitfields.	Kaz Kylheku	2022-05-19	1	-0/+15
	* ffi.c (struct txr_ffi_type): Replace unsigned mask member with a union m which holds unsigned mask and 64-bit fmask (fat mask). (ffi_sbit_put, ffi_sbit_get, ffi_ubit_put, ffi_ubit_get): Refer to m.mask. (ffi_fat_sbit_put, ffi_fat_sbit_get, ffi_fat_ubit_put, ffi_fat_ubit_get): New static functions. (ffi_generic_fat_sbit_put, ffi_generic_fat_sbit_get, ffi_generic_fat_ubit_put, ffi_generic_fat_ubit_get): Likewise. (make_ffi_type_struct, make_ffi_type_union): Set up fat mask for bitfields that are wider than int. (ffi_type_compile): Refer to m.mask for the int and unsigned int based bitfields declared with sbit and ubit that don't mention a type. The bit operator now allows int64 and uint64 to be valid types for a bitfield. In this case, the "fat" get and put functions are selected which use 64 bit operations. Thus there is no efficiency impact on non-fat bitfields which continue to use code with 32 bit operands. (ffi_offsetof): Use the bitfield flag in the member's type structure to detect bitfields, rather than the mask.
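What a 64-bit "fat" mask enables can be shown with get and put helpers that do all their arithmetic in uint64_t. This sketch is hypothetical: fat_mask, fat_ubit_get, fat_ubit_put and fat_sbit_get are made-up names, and unlike the real ffi_fat_* functions they ignore memory layout, endianness and TXR objects. They assume a field of width w at bit offset shift within a 64-bit unit, with 1 <= w and w + shift <= 64.

```c
#include <stdint.h>

/* Mask covering w bits starting at bit offset shift. */
static uint64_t fat_mask(unsigned w, unsigned shift)
{
  uint64_t m = (w == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << w) - 1);
  return m << shift;
}

/* Extract an unsigned field. */
uint64_t fat_ubit_get(uint64_t unit, unsigned w, unsigned shift)
{
  return (unit & fat_mask(w, shift)) >> shift;
}

/* Store val into the field, leaving the other bits of unit unchanged. */
uint64_t fat_ubit_put(uint64_t unit, unsigned w, unsigned shift, uint64_t val)
{
  uint64_t m = fat_mask(w, shift);
  return (unit & ~m) | ((val << shift) & m);
}

/* Extract a signed field by sign-extending from bit w - 1. The final
   conversion to int64_t is implementation-defined for values above
   INT64_MAX but wraps on mainstream two's complement compilers. */
int64_t fat_sbit_get(uint64_t unit, unsigned w, unsigned shift)
{
  uint64_t v = fat_ubit_get(unit, w, shift);
  uint64_t sign = UINT64_C(1) << (w - 1);
  return (int64_t) ((v ^ sign) - sign);
}
```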
*	ffi: alignment bug in undimensioned arrays.	Kaz Kylheku	2022-05-18	1	-2/+2
	The varray behavior for undimensioned arrays was introduced in dubious commit 7880c9b565ab438e1bf0250a967acdbf8d04cb42 in 2017, which used make_ffi_type_pointer to register the type, claiming that the C representation is a pointer (which was not true in that commit, nor ever since). As a result, undimensioned arrays received the alignment of pointers, rather than deriving it from the element type. Thus (array char) has 4 or 8 byte alignment whereas (array 4 char) correctly has 1 byte alignment.
	* ffi.c (ffi_type_compile): Use make_ffi_type_array for the two-element array syntax, just like for the dimensioned case with three elements. Then override some of the functions with the varray versions.
	* tests/017/ffi-misc.tl: Fix the test case which exposed this. In the type (struct flex (a char) (b (zarray char))), the array b must be at offset 1. I didn't notice that the offset of 4 being confirmed by the test case was wrong, but this showed up when running the test case on a platform with 8 byte pointers.
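The corrected expectation follows from ordinary C layout rules. In this sketch (the struct names are illustrative), struct flex roughly corresponds to the FFI type in the fixed test case: a flexible char array member has the alignment of char and so lands at offset 1. The second struct shows what pointer-style registration effectively modeled: a pointer-aligned member, which lands at 4 or 8 depending on the platform's pointer alignment.

```c
#include <stddef.h>

/* Roughly the C counterpart of (struct flex (a char) (b (zarray char))). */
struct flex {
  char a;
  char b[];        /* flexible array member: aligned like char */
};

/* What registering the array as a pointer type effectively modeled. */
struct flex_as_ptr {
  char a;
  char *b;         /* aligned like a pointer */
};
```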
*	ffi: fix broken test.	Kaz Kylheku	2022-05-18	1	-2/+2
	* tests/017/ffi-misc.tl: Fix incorrect test whose loop body does not execute. A remaining issue here is why the diagnostics about unbound functions and variables in the loop body get swept under the rug.
*	split-str: new count parameter.	Kaz Kylheku	2022-05-17	1	-0/+39
	* eval.c (eval_init): Fix up registration of split-str to account for new parameter.
	* lib.c (split_str_keep): Implement new optional count argument. (spl): Pass nil value to split_str_keep for new argument. I'd like this function to benefit from this argument also, but the design isn't settled. (split_str): Pass nil argument to split_str_keep.
	* lib.h (split_str_keep): Declaration updated.
	* tests/015/split.tl: New tests.
	* txr.1: Documented.
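A count-limited split can be sketched in C. The sketch assumes that the count caps the number of splits performed and that the rest of the string is left intact as the last piece. That interpretation, the single-character separator and the name split_count are assumptions made for illustration; txr.1 documents split-str's actual semantics.

```c
#include <stddef.h>
#include <string.h>

/* Split s on sep, performing at most max_splits splits (negative means
   unlimited). Piece offsets and lengths go into starts[] and lens[]
   (capacity cap); the return value is the total number of pieces. */
size_t split_count(const char *s, char sep, long max_splits,
                   size_t *starts, size_t *lens, size_t cap)
{
  size_t n = 0, start = 0, len = strlen(s);
  long splits = 0;

  for (size_t i = 0; i <= len; i++) {
    int at_end = (i == len);
    int cut = !at_end && s[i] == sep && (max_splits < 0 || splits < max_splits);

    if (cut || at_end) {
      if (n < cap) {
        starts[n] = start;
        lens[n] = i - start;
      }
      n++;
      if (cut) {
        splits++;
        start = i + 1;       /* the next piece begins after the separator */
      }
    }
  }
  return n;
}
```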
*	ffi: bugfix: null terminated string as flexible member.	Kaz Kylheku	2022-05-17	1	-0/+6
	* ffi.c (ffi_char_array_get, ffi_zchar_array_get, ffi_wchar_array_get, ffi_bchar_array_get): Rearrange so that we test for tft->null_term first, and not nelem == 0. If nelem happens to be zero, but we are supposed to decode a null-terminated string, we will do the wrong thing and return the null string. (ffi_varray_in): The body can't be conditional on vec being non-nil, because then we do nothing if we don't have a Lisp object, which means we skip the cases when we should decode a null-terminated array. Now if vec is nil, we must guard against calling ffi_varray_dynsize.