txr - TXR: A data munging language.

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	lib: middle_pivot: whitespace fix.	Kaz Kylheku	2019-10-15	1	-4/+4
\| \| \| \|	* lib.c (middle_pivot): Fix non-conforming indentation.
*	printer: obj_hash must be eq-based.	Kaz Kylheku	2019-10-11	1	-2/+2
\| \| \| \| \| \| \| \| \|	The printer must use an eq-based hash table for detecting circularity, otherwise it blows up on circular range objects. * lib.c (obj_print): instantiate ctx->obj_hash as an eq-based hash table, not eql-based.
*	sort: remove obsolete comments.	Kaz Kylheku	2019-10-08	1	-10/+1
\| \| \| \| \| \|	* lib.c (sort_list, sort): Remove comments about dangerous mutation; these pertain to some explicit logic which existed in previous versions of the code to handle those situations.
*	tree: circular notation support.	Kaz Kylheku	2019-10-07	1	-0/+5
\| \| \| \| \| \|	* lib.c (populate_obj_hash): Handle tree object. * parser.c (circ_backpatch): Likewise.
*	safety: fix type tests that code can subvert.	Kaz Kylheku	2019-09-30	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes numerous instances of a safety hole which involves the type of a COBJ object being tested to be of a given class using logic that can be subverted by the definition of a like-named struct. Specifically logic like (typeof(obj) == hash_s) is broken, because if a struct type called hash is defined, then the test will yield true for instances of that struct type. Those instances can then be passed into code that only works on COBJ hashes, and relies on this test to reject invalid objects. * ffi.c (make_carray): Replace fragile test with strong one, using new cobjclassp function. * hash.c (hashp): Likewise. * lib.c (class_check): The expression used here for the type test moves into the new function cobjclassp and so is replaced by a call to that function. (cobjclassp): New function. * lib.h (cobjclassp): Declared. * rand.c (random_state_p): Replace fragile test using cobjclassp. * regex.c (char_set_compile): Replace fragile typeof tests for character type with is_chr. (reg_derivative, regexp): Replace fragile test with cobjclassp. * struct.c (struct_type_p): Replace fragile test with cobjclassp.
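The difference between the fragile and the strong test can be sketched in plain C. This is a minimal illustration, not TXR's code: `struct object`, `struct cobj_class`, `fragile_hashp` and `strong_hashp` are hypothetical names, and the real `cobjclassp` may use a different expression.

```c
#include <string.h>

/* Hypothetical object model, for illustration only. */
struct cobj_class { const char *name; };

enum kind { KIND_COBJ, KIND_STRUCT };

struct object {
  enum kind kind;                 /* representation of the object */
  const char *type_name;          /* what a typeof-style query would report */
  const struct cobj_class *cls;   /* class handle, meaningful for COBJ only */
};

static const struct cobj_class hash_class = { "hash" };

/* Fragile: compares only the reported type name, so a user-defined
   struct type that happens to be called "hash" passes the test. */
int fragile_hashp(const struct object *obj)
{
  return strcmp(obj->type_name, "hash") == 0;
}

/* Strong: requires the COBJ representation and the exact class handle. */
int strong_hashp(const struct object *obj)
{
  return obj->kind == KIND_COBJ && obj->cls == &hash_class;
}
```

An object built as `{ KIND_STRUCT, "hash", NULL }` satisfies `fragile_hashp` but is rejected by `strong_hashp`.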
*	Use put_char for single character output.	Kaz Kylheku	2019-09-26	1	-3/+3
\| \| \| \| \| \| \|	* hash.c (hash_print_op): Replace length 1 put_string calls with put_char. * lib.c (obj_print_impl): Likewise.
*	New data type: tnode.	Kaz Kylheku	2019-09-22	1	-1/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Binary search tree nodes are being added as a basic heap data type. The C type tag is TNOD, and the Lisp type is tnode. Binary search tree nodes have three elements: a key, a left child and a right child. The printed notation is #N(key left right). Quasiquoting is supported: ^#N(,foo ,bar) but not splicing. Because tnodes have three elements, they fit into TXR's four-word heap cell, not requiring any additional memory allocation. These nodes are going to be the basis for a binary search tree container, which will use the scapegoat tree algorithm for maintaining balance. * tree.c, tree.h: New files. * Makefile (OBJS): Adding tree.o. * eval.c (expand_qquote_rec): Recurse through tnode cells, so unquotes work inside #N syntax. * gc.c (finalize): Add TNOD to no-op case in switch; tnodes don't require finalization. (mark_obj): Traverse tnode cell. * hash.c (equal_hash): Add TNOD case. * lib.c (tnode_s): New symbol variable. (seq_kind_tab): New entry for TNOD, mapping to SEQ_NOTSEQ. (code2type, equal): Handle TNOD. (obj_init): Initialize tnode_s variable. (obj_print_impl, populate_obj_hash): Handle TNOD. (init): Call tree_init function in tree.c. * lib.h (enum type, type_t): New enumeration TNOD. (struct tnod): New struct type. (union obj, obj_t): New union member tn of type struct tnod. (tnode_s): Declared. * parser.c (circ_backpatch): Handle TNOD, so circular notation works through tnode cells. * parser.l (grammar): Recognize #N prefix, mapping to HASH_N token. * parser.y (HASH_N): New grammar terminal symbol. (tnode): New nonterminal symbol. (i_expr, n_expr): Add tnode cases to productions. (yybadtoken): Map HASH_N to "#N" string.
*	equal: reduce type checking for conses.	Kaz Kylheku	2019-09-20	1	-3/+22
\| \| \| \| \| \| \| \|	* lib.c (equal): Since we have switched on the type of the left and right argument, we can access the object directly instead of going through car and cdr. Except that for lazy conses, we need at least one such access to force the object first.
*	buffers: allow inequality comparison with less.	Kaz Kylheku	2019-09-20	1	-0/+14
\| \| \| \| \| \| \| \|	* lib.c (less_tab_init): Assign category 6 to BUF type, so buffers are sorted after other types. (less): Add BUF case. * txr.1: Documented.
*	gc: align objects more strictly.	Kaz Kylheku	2019-09-12	1	-1/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In this commit, we ensure that objects in the heap are aligned to at least eight byte boundaries (the minimum alignment from most malloc implementations on 32 and 64 bit systems). If possible, we align objects to a multiple of their size, sizeof (obj_t), which is 16 bytes on 32 bit platforms and 32 bytes on 64 bit platforms. We do this by making the object array the first field of the heap structure, and by allocating it with an aligned allocator function, if possible. * configure: detect memory alignment function: either memalign (preferred) or else posix_memalign (ugly duckling). We conditionally add either HAVE_MEMALIGN or HAVE_POSIX_MEMALIGN into config.h. * gc.c (OBJ_ALIGN): New macro. (struct heap, heap_t): Put the block member first, so objects are aligned with the containing heap. (in_heap): If the pointer is not aligned to a multiple of OBJ_ALIGN, it can't be a heap object; return zero. If allocations of the heap are aligned, then we don't need the additional alignment check in the loop body; if the pointer lands in the array, then the earlier OBJ_ALIGN check assures us it must be aligned. If we have only malloc alignment, we must do the check; the pointer could be to an address divisible by 8 which is in the middle of an obj_t. * lib.c: If HAVE_MEMALIGN is true, then include <malloc.h> so we have it declared. (memalign): If HAVE_POSIX_MEMALIGN is true, this static function is defined; it's compatible with the Glibc memalign. If HAVE_MEMALIGN and HAVE_POSIX_MEMALIGN are false, then memalign is defined as a malloc wrapper which doesn't align. (chk_malloc_gc_more): Use memalign instead of malloc. If aligned allocation is available, this will cause the heap to be aligned to a multiple of the object size.
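The two-stage pointer test described for in_heap can be sketched as follows. This is a simplified illustration under stated assumptions, not the gc.c code: `cell_t`, `heap_sketch_t`, `HEAP_CELLS`, `other_bookkeeping` and `in_heap_sketch` are hypothetical, a single heap is examined rather than a list of heaps, and the choice between the two alignments is made by a run-time flag instead of the configure-time HAVE_* macros.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_CELLS 8                       /* hypothetical heap size */

typedef struct { void *word[4]; } cell_t;  /* stand-in for a four-word obj_t */

typedef struct {
  cell_t block[HEAP_CELLS];                /* object array is the first member */
  int other_bookkeeping;                   /* hypothetical trailing field */
} heap_sketch_t;

/* Returns 1 if ptr points at the start of a cell in heap, else 0.
   aligned_alloc_used is nonzero when the heap was allocated on a
   multiple of sizeof (cell_t). */
int in_heap_sketch(const heap_sketch_t *heap, const void *ptr,
                   int aligned_alloc_used)
{
  uintptr_t p = (uintptr_t) ptr;
  uintptr_t lo = (uintptr_t) heap->block;
  uintptr_t hi = lo + sizeof heap->block;
  size_t obj_align = aligned_alloc_used ? sizeof (cell_t) : 8;

  if (p % obj_align != 0)                  /* cheap early rejection */
    return 0;
  if (p < lo || p >= hi)                   /* outside the object array */
    return 0;
  /* With only malloc alignment, an 8-byte-aligned address can still
     fall in the middle of a cell, so the offset must be checked too. */
  if (!aligned_alloc_used && (p - lo) % sizeof (cell_t) != 0)
    return 0;
  return 1;
}
```

When the heap itself starts on a multiple of the cell size, any in-range pointer that passes the first test is necessarily a cell boundary, which is why the third test is skipped in that mode.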
*	All HAVE_* macros should be tested with #if, not #ifdef.	Kaz Kylheku	2019-09-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* configure: In several config tests, test HAVE_SUPERLONG_T, HAVE_LONGLONG_T and HAVE_SYS_WAIT with #if. * lib.c: Test HAVE_GETENVIRONMENTSTRINGS with #if. * lib.h: Test HAVE_DOUBLE_INTPTR_T with #if. * mpi/mpi.c: Likewise. * mpi/mpi.h: Likewise. * socket.c: Test HAVE_GETADDRINFO with #if in three places. * stream.c: Test HAVE_SYS_WAIT and HAVE_SOCKETS with #if.
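A small sketch of why the two preprocessor tests differ. It assumes a configuration header may define a feature macro with the value 0 to mean "absent"; `HAVE_FOO` and both function names are hypothetical.

```c
/* Hypothetical config.h entry: the feature was probed and is absent. */
#define HAVE_FOO 0

/* #ifdef asks only whether the macro exists, so this reports 1. */
int foo_enabled_by_ifdef(void)
{
#ifdef HAVE_FOO
  return 1;
#else
  return 0;
#endif
}

/* #if evaluates the macro's value, so this reports 0. An undefined
   macro also evaluates as 0 in an #if expression. */
int foo_enabled_by_if(void)
{
#if HAVE_FOO
  return 1;
#else
  return 0;
#endif
}
```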
*	Improve overflow checks in string catenation.	Kaz Kylheku	2019-09-12	1	-8/+8
\| \| \| \| \| \| \| \|	* lib.c (cat_str, vscat): Use size_t type for the total, so that the wrapping behavior we depend on for overflow detection is well-defined. Also, there was an overflow check missing for the total + 1 being passed to chk_wmalloc. Instead of adding that overflow check, let's just start the total at 1.
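The overflow-detection idiom can be sketched like this. It is an illustration of the technique only, not the body of cat_str or vscat: `total_length` is a hypothetical helper that works on an array of lengths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n string lengths plus one slot for the
   terminator. Stores the result in *out and returns 1, or returns 0
   if the sum does not fit in size_t. */
int total_length(const size_t *lens, size_t n, size_t *out)
{
  size_t total = 1;   /* terminator counted up front: no "total + 1" later */
  size_t i;

  for (i = 0; i < n; i++) {
    /* Unsigned arithmetic wraps modulo SIZE_MAX + 1, which is
       well-defined; signed overflow would be undefined behavior. */
    size_t newtot = total + lens[i];

    if (newtot < total)   /* the sum wrapped around: overflow */
      return 0;
    total = newtot;
  }

  *out = total;
  return 1;
}
```

Because the total starts at 1, the same per-addition check also covers the terminator, so no separate test is needed before the allocation.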
*	printer: put out BOM character as #\xFEFF.	Kaz Kylheku	2019-09-10	1	-1/+4
\| \| \| \| \| \| \| \| \|	* lib.c (obj_print_impl): The Unicode BOM is also a zero width non-breaking space, which causes it to look like the incomplete #\ syntax. Let's instead render it as #\xFEFF. A few other hex cases are moved up into the surrounding switch, and a little goto takes care of avoiding code duplication.
*	bracket: bug: wrong result when function is applied.	Kaz Kylheku	2019-09-10	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Reported by user vapnik spaknik. * lib.c (bracket): Don't rely on the index variable to step through the arguments, because it only counts fixed arguments. The args_get function doesn't increment the index beyond args->fill; when popping arguments from args->list, index stays unmodified. * tests/016/arith.tl: Tests for bracket added.
*	Bugfix: incorrect appending to improper lists.	Kaz Kylheku	2019-09-09	1	-9/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The list building framework underlying the list_collect_decl macro has a flaw: if the current list ends in a non-nil terminating atom, and the tail pointer isn't directly aiming at that atom, then a subsequent operation to add an item or append a suffix will just overwrite the atom. The correct behavior is to execute the same logic as if the tail pointer pointed at that atom on entry into the function: switch on the type of the atom, and append to it, if possible, or else throw an error. Thus, for instance, (append '(1 2 3 . 42) '(4)) wrongly returns (1 2 3 4), instead of producing an error. The 42 atom has disappeared. The example (append '(1 2 . "ab") "c") -> (1 2 . "abc") given in the man page doesn't work; it yields (1 2 . "c"). * lib.c (list_collect, list_collect_nconc, list_collect_append, list_collect_revappend, list_collect_nreconc): In the cases when the current tail object is a CONS and LCONS, and we move the tail, we must branch backwards and process the tail atom as if the tail had been that way on entry into the function. Doing this with a tail call would be nice, but in some of the functions, we hold a local resource already, so we simulate a local tail call by updating the tailobj variable and doing a backwards goto.
*	subtypep: structs with car or length method are sequences.	Kaz Kylheku	2019-09-06	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	* lib.c (subtypep): For the sequence supertype, check whether the subtype is a structure that has a length or car method, returning t if so. * struct.c (get_special_slot_by_type): New function. * struct.h (get_special_slot_by_type): Declared. * txr.1: Add <structures with cars or length methods> to the type hierarchy diagram.
*	seq_info: bug: nil for objects with only length method.	Kaz Kylheku	2019-09-06	1	-1/+1
\| \| \| \| \| \|	* lib.c (seq_info): Add missing else, which makes the function return nil for objects that have a length method, but not a car method.
*	subtypep: remove useless eq.	Kaz Kylheku	2019-09-06	1	-1/+1
\| \| \| \| \| \|	* lib.c (subtypep): The sub and sup parameters are compared for equality early in the function; by the time we get here, we know they are not eq, so nil can be returned.
*	lib: access special methods via special slot mechanism.	Kaz Kylheku	2019-09-06	1	-26/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* ffi.c (ffi_flex_struct_in): Use get_special_slot to obtain length method. * lib.c (nullify_s, from_list_s, lambda_set_s): Definitions removed from here. (seq_info, car, cdr, rplaca, rplacd, make_like, nullify, replace_obj, length, empty, sub, ref, refset, dwim_set): Use get_special_slot to obtain special method from object, rather than maybe_slot. (obj_init): Remove initializations of nullify_s, from_list_s and lambda_set_s from here. * struct.c (enum special_slot): Definition removed from here. (nullify_s, from_list_s, lambda_set_s): Definitions moved here from lib.c. (special_sym): New static array. (struct_init): Initializations of nullify_s, from_list_s and lambda_set_s moved here from lib.c. (get_special_slot): New function. * struct.h (lambda_set_s): Declared. (enum special_slot): Definition moved here. (get_special_slot): Declared. * txr.1: Added compat note, since get_special_slot behaves like maybe_slot under 224 compatibility.
*	seq_info: remove redundant car slot lookup.	Kaz Kylheku	2019-09-04	1	-2/+0
\| \| \| \| \| \|	* lib.c (seq_info): Due to a copy-paste error maybe_slot is being accidentally called here twice for the same slot. Removing.
*	type: lcons and string are subtypes of sequence.	Kaz Kylheku	2019-09-04	1	-1/+1
\| \| \| \| \| \| \|	Omissions reported by user vapnik spaknik. * lib.c (subtypep): The lcons type and string type must report as subtypes of sequence.
*	New function: tailp.	Kaz Kylheku	2019-09-03	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	* eval.c (eval_init): Register tailp intrinsic. * lib.c (tailp): New function. * lib.h (tailp): Declared. * txr.1: Documented.
*	New function: cptr-buf.	Kaz Kylheku	2019-08-21	1	-0/+7
\| \| \| \| \| \| \| \| \| \|	* eval.c (eval_init): Register cptr-buf intrinsic. * lib.c (cptr_buf): New function. * lib.h (cptr_buf): Declared. * txr.1: Documented.
*	New function: intern-fb.	Kaz Kylheku	2019-08-20	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To accompany find-symbol-fb, there is intern-fb, which is like intern, but searches the fallback list. * eval.c (eval_init): Register intern-fb intrinsic. * lib.c (intern_fallback_intrinsic): New function. Does defaulting and error checks, then calls intern_fallback, just like intern_intrinsic calls intern. * lib.h (intern_fallback_intrinsic): Declared. * txr.1: Documented.
*	lib: streamline interning slightly.	Kaz Kylheku	2019-08-20	1	-13/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We get rid of some defaulting and error checks from interning. This saves a few cycles on startup in the large number of intern calls that are performed. * eval.c (eval_init): Wire the intern intrinsic to the new intern_intrinsic function rather than intern. * lib.c (intern): Remove package lookup and error check on str argument. (intern_intrinsic): New function, which has the package lookup and error check. (intern_fallback): Remove package lookup and error check. * lib.h (intern_intrinsic): Declared. * txr.c (txr_main): Fix one instance of an intern call that relies on defaulting of the second argument, by passing cur_package.
*	new functions: find-symbol and find-symbol-fb.	Kaz Kylheku	2019-08-19	1	-8/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Turns out, there is already a find_symbol in lib.c, completely unused. * eval.c (eval_init): Register find-symbol and find-symbol-fb intrinsics. * lib.c (find_symbol): Fix this hitherto unused function to do correct defaulting of the package argument and to accept an additional argument specifying the not-found value. (find_symbol_fb): New function. * lib.h (find_symbol): Declaration updated. (find_symbol_fb): Declared. * txr.1: Documented.
*	seq_iter: remove pointless one-member union.	Kaz Kylheku	2019-08-14	1	-6/+6
\| \| \| \| \| \| \| \|	* lib.h (struct seq_iter): union ul with just one member replaced by that member itself. * lib.c (seq_iter_get_vec, seq_iter_peek_vec, seq_iter_init): refer to it->len instead of it->ul.len.
*	where: bugfix: doesn't work for non-list sequence.	Kaz Kylheku	2019-08-14	1	-13/+7
\| \| \| \| \| \| \| \|	* lib.c (lazy_where_func, where): We have a regression here due to strangely trying to smuggle the predicate function in si->inf.obj, which cannot possibly work other than for lists whose seq iterators ignore that field. We switch to the trick of using the cdr field of the lazy cons to carry that forward.
*	reverse: bugfix: garbage object in error message.	Kaz Kylheku	2019-08-09	1	-1/+1
\| \| \| \| \|	* lib.c (reverse): A pointer to the C function in is being used as a value in the error message; the correct expression is seq_in.
*	lib: don't GC-protect two non-heap objects.	Kaz Kylheku	2019-08-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	* lib.c (obj_init): The null string literal and "nil" do not require gc protection; they cannot be reclaimed by the garbage collector, which ignores them. Don't waste two slots in the prot_stack on them. This is a remnant from ancient TXR; these variables were protected already in Version 11 from September 2009. At that time, there were no built-in string literal objects; these two objects were heap-allocated.
*	relate: optimize with hashes.	Kaz Kylheku	2019-07-17	1	-3/+27
\| \| \| \| \| \| \| \| \| \| \|	* lib.c (do_relate_hash, do_relate_hash_dfl): New static functions. (relate): If the number of keys and values is the same, and there are more than ten, then use hashing. If the default value is specified, and it is nil, then a hash table can be returned directly, instead of a function. * txr.1: Note added that relate may return a hash.
*	chk_calloc: use unsigned arithmetic.	Kaz Kylheku	2019-07-11	1	-1/+1
\| \| \| \| \| \| \|	* lib.c (chk_calloc): Use unsigned arithmetic to figure out the total, which is only used for incrementing the malloc_bytes counter. The unsigned arithmetic is performed in the same type as that counter.
*	replace: deal with overlapping.	Kaz Kylheku	2019-07-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	* buf.c (replace_buf): In the same-type case, use memmove rather than memcpy in case the objects overlap, so we don't invoke C undefined behavior. * lib.c (replace_str, replace_vec): Likewise. * txr.1: Specify that if the replacement sequence overlaps with the target range of the destination sequence, or with any portion that has to be relocated if range changes size, then the behavior is unspecified.
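A minimal sketch of the overlap hazard, assuming a plain character buffer rather than TXR's buffer, string or vector objects; `open_gap` is a hypothetical helper. Source and destination lie in the same array, so `memcpy` would have undefined behavior here, while `memmove` is specified to handle overlapping regions.

```c
#include <string.h>

/* Hypothetical helper: shifts the bytes buf[pos..len-1] right by
   `shift` positions within the same buffer to open a gap at pos.
   The caller guarantees buf has room for len + shift bytes. */
void open_gap(char *buf, size_t len, size_t pos, size_t shift)
{
  /* The regions overlap whenever shift < len - pos. */
  memmove(buf + pos + shift, buf + pos, len - pos);
}
```

For a buffer holding "abcdef", `open_gap(buf, 6, 2, 2)` moves "cdef" to offset 4 and leaves the old bytes at offsets 2 and 3 in place, ready to be overwritten by the replacement data.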
*	empty: handle buffers.	Kaz Kylheku	2019-06-30	1	-0/+2
\| \| \| \|	* lib.c (empty): Handle BUF in switch.
*	seq_info: nullify bugfix.	Kaz Kylheku	2019-06-28	1	-13/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A change in the nullify function to support hash tables has broken various functions which classify an object using seq_info, obtaining a SEQ_HASHLIKE kind, and then work with si.obj using hash functions. But si.obj has been nullified. An example of a broken function is find-max. Basically, this can be attributed to a careless use of nullify in seq_info. The purpose of nullify is to support code which treats any sequence as if it were a list. But seq_info doesn't do that; it classifies sequences and treats them according to their kind. Under seq_info, the only non-list objects that get treated as lists are list-like structures. For these it makes sense to call nullify, in case they have a nullify method. * lib.c (seq_info): Don't unconditionally call nullify on all COBJ objects. Only call nullify on struct objects. If that returns nil, then treat the object as SEQ_NIL; and if it returns an object different from the original, then recurse.
*	seq-begin: bugfix: non-lists don't work.	Kaz Kylheku	2019-06-28	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	* lib.c (seq_begin): Do not null out si->inf.obj; it's needed for accessing hashes and vector-like objects. This bug means that seq-begin iteration has only worked correctly for lists. The original motivation was not to have spurious retention of the head of a lazy list, which is hereby reintroduced. But iterators can be rewound. Let's just document this away and leave it as a to-do item. * txr.1: Document the limitation of seq-begin w.r.t. lazy lists.
*	in: allow hash with keyfun and testfun.	Kaz Kylheku	2019-06-25	1	-1/+3
\| \| \| \| \| \| \|	* lib.c (in): A simple check and fallthrough lets this function process hash tables more generally. * txr.1: Documented.
*	in: use seq_info	Kaz Kylheku	2019-06-25	1	-26/+28
\| \| \| \| \|	* lib.c (in): Keep the existing specialized cases, but use seq_info in the fallback.
*	Factor function name into self variable.	Kaz Kylheku	2019-06-25	1	-10/+15
\| \| \| \| \|	* lib.c (take, take_while, take_until, drop_while, drop_until): Move repeated function name into self variable.
*	drop-{while,until}: convert to seq_info.	Kaz Kylheku	2019-06-25	1	-16/+16
\| \| \| \| \| \|	* lib.c (drop_while, drop_until): Use seq_info, so these functions work with all sequences. Thus now for instance [drop-while zerop #b'0000f00d'] yields #b'f00d'.
*	empty: handle carray.	Kaz Kylheku	2019-06-25	1	-0/+2
\| \| \| \|	* lib.c (empty): Add carray sub case to COBJ case.
*	nullify: handle carray and hashes.	Kaz Kylheku	2019-06-25	1	-0/+4
\| \| \| \| \|	* lib.c (nullify): Add carray and hash subcases into the COBJ case.
*	Handle buffers in list collector functions.	Kaz Kylheku	2019-06-25	1	-0/+19
\| \| \| \| \| \|	* lib.c (nullify, list_collect, list_collect_nconc, list_collect_append, list_collect_nreconc, list_collect_revappend): Handle buffer type.
*	list_collect: handle objects.	Kaz Kylheku	2019-06-25	1	-1/+10
\| \| \| \| \| \| \| \| \|	* lib.c (list_collect): Handle sequence-like COBJ objects. We can add an item to them using their respective replace functions. (replace_obj): Change to external linkage. * lib.h (replace_obj): Declared.
*	Code clean-up in list collector functions.	Kaz Kylheku	2019-06-25	1	-19/+23
\| \| \| \| \| \| \|	* lib.c (list_collect, list_collect_append, list_collect_revappend): Use local variables to avoid repeated expressions. (list_collect_nconc): Only call nullify in necessary cases.
*	seqp: expand definition of sequences.	Kaz Kylheku	2019-06-25	1	-12/+2
\| \| \| \| \| \| \|	* lib.c (seqp): Use seq_info to classify the object as a sequence. * txr.1: Update description of seqp.
*	replace: fix strange diagnostic from bad fallthrough.	Kaz Kylheku	2019-06-24	1	-2/+2
\| \| \| \| \| \| \| \| \|	* lib.c (replace): If a COBJ is passed to replace which doesn't support the operation, we wrongly pass it to replace_buf because the BUF case was added into the fallthrough path. The end result is that length_buf blows up on the object, resulting in a strange diagnostic. The BUF case must be moved above COBJ.
*	* Makefile (OBJS): New objects chksum.o and chksums/sha256.o.	Kaz Kylheku	2019-06-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	* chksum.c, chksum.h, chksums/sha256.c, chksums/sha256.h: New files. * lib.c (init): Call chksum_init. * txr.1: Documented. * LICENSE: Add SHA-256 copyright notice.
*	packages: generational gc bug.	Kaz Kylheku	2019-06-19	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	* lib.c (make_package_common): The way the two hashes are assigned into the new package here is not correct. The problem is that the first make_hash can trigger gc. Then it is possible that the package object will move into the mature generation, after which the assignment of the second package is a wrong-way assignment requiring the set macro. Instead of bringing in that macro, the obvious way to solve it is to just allocate the hashes first, and then the package: exactly the way we build a cons cell from existing values.
*	Replace lt(x, zero) pattern.	Kaz Kylheku	2019-06-15	1	-19/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This slight inefficiency occurs in some 37 places in the code. In most places we replace lt(x, zero) with minusp(x). In a few places, !plusp(x) is used and surrounding logic is simplified. In one case, the silly pattern lt(x, zero) ? t : nil is replaced with just minusp(x). * buf.c (sub_buf, replace_buf): Replace lt. * combi.c (perm, rperm, comb, rcomb): Likewise. * eval.c (do_format_field): Likewise. * lib.c (listref, sub_list, replace_list, split_func, split_star_func, match_str, lazy_sub_str, sub_str, replace_str, sub_vec, replace_vec): Likewise. * match.c (weird_merge): Likewise. * regex.c (match_regex, match_regex_right_old, match_regex_right, regex_prefix_match, regex_range_left, regex_range_right): Likewise.