txr - TXR: A data munging language.

	Commit message (Collapse)	Author	Age	Files	Lines
*	New function: hash-map.	Kaz Kylheku	2023-06-28	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	hash-map converts a function mapping over a sequence into a hash table. * hash.[ch] (hash_map): New function. * tests/010/hash.tl: Test case. * genman.txr: The hash-map identifier introduces a hash collision. We have to deal with that somehow now. (colli): We put the conflicting entries into a new hash called colli which maps them to an increment value. (hash-title): Increment the hash code h by the amount indicated in colli, if the title is found there. * txr.1: Documented. * stdlib/doc-syms.tl: Updated.
*	hash: support existing mutation+iteration semantics.	Kaz Kylheku	2023-06-20	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original chained hashing scheme makes certain guarantees in situation when a hash table that is being iterated is also undergoing insertions or deletions. The original scheme meets these requirements simply, by putting a freeze against hash table growth while there are outstanding iterations. Chained hashing gracefully handles load factors above 1. Load factors above 1 are not possible under open addressing (and we don't even want to approach 1) but we would like to preserve the requirements. The requirements are: 1. If an iterator has already visited an item, it will not see that item again, regardless of insertions which cause the table to be reorganized. 2. It is not specified/required that an iterator will visit newly inserted items. It may visit some of those items, but not others. 3. If an iterator has not yet visited an item, and that item is deleted, it will not see that item, regardless of any insertions that reorganize the table. In this commit, we implement a "table stack" scheme. 1. When a table is resized due to insertions, and it is being iterated (h->usecount > 0), in that situation it will push the existing small table onto a stack, the h->tblstack (table stack). 2. Iterators take a hash table's current table and its size, and work with that snapshot of the table. If the original hash table grows, existing iterators work with the original table as it existed just before the reorganization. So after that they do not see any new insertions. 3. Whenever the delete operation (hash_remove) finds the item and removes it from the current table, it also walks the table stack, searches for the item in every table in the stack and nulls it out. This search is oblivious to nil; it's a blind search that goes around the table starting at the first probe position, looking for the identical cons cell to take out. This algorithm ensures that iterators will not see a deleted item, unless they already visited it before the deletion, of course. * hash.h (struct hash_iter): New members table, and mask. * hash.c (struct hash): New member, tblstack. (hash_grow): We drop the vec argument and recreate it locally (not essential to this commit). If we find that the usecount is positive, we push the existing table onto the table stack. Otherwise, we take the opportunity to obliterate the table stack. (hash_insert): Drop the restriction that hash_grow is only called when the use count is zero. Adjust calls to hash_grow to drop the vec argument. (hash_remove): When an item is found and deleted, and the table is in use by iterators, walk the table stack and delete it from each previous table. Otherwise, if the table is not in use by iterators, obliterate the table stack. (hash_mark): Exit early also if there is a table stack, and mark that stack. (do_make_hash, make_similar_hash, copy_hash): Initialize table stack in new hash. (hash_iter_mark): Mark the iterator's table. This is likely not necessary since we also mark the hash table, which should have a pointer to that same table. That wouldn't be defensive programming, though. (hash_iter_init, us_hash_iter_init): Initialize table and mask. (hash_iter_next_impl, hash_iter_peek): These functions have to walk the table snapshot taken by the iterator, using the captured mask, and not the current table. (has_reset): If the target table's use count drops to zero, obliterate its table stack. We add a missing setcheck here; this operation potentially stores a different hash into an existing iterator. It's not being done safely with regard to generational GC. * tests/010/hash.tl: New tests.
*	hash: new function, hash-props.	Kaz Kylheku	2023-05-01	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't have a function in the hash table module which can create a populated hash table in one step without requiring the caller to create auxiliary lists. This new function fills that gap, albeit with some limitations. * hash.c (hash_props): New function. (hash_init): Register hash-props intrinsic. * tests/010/hash.tl: New tests. * txr.1: Documented. * stdlib/doc-syms.tl: Updated.
*	New function: group-map.	Kaz Kylheku	2022-03-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	* hash.c (group_map): New function. (hash_init): group-map intrinsic registered. * hash.h (group_map): Declared. * tests/010/hash.tl: New test case. * txr.1: Documented together with group-by. Extra paren removed from group-by example.
*	hash: add tests for group-by, group-reduce.	Kaz Kylheku	2022-03-02	1	-0/+16
\| \| \| \|	* tests/010/hash.tl: New tests.
*	diff/isec: reset hash/tree iter instead making new.	Kaz Kylheku	2021-05-10	1	-0/+6
	* lib.c (seq_iter_rewind): Use hash_reset and tree_reset to rewind the existing iterator rather than allocating a new one. * tests/010/hash.tl: New file, covering uni, diff and isec for hash tables. * tests/010/tree.tl: New tests.