|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original chained hashing scheme makes certain
guarantees in situation when a hash table that is being
iterated is also undergoing insertions or deletions.
The original scheme meets these requirements simply,
by putting a freeze against hash table growth while
there are outstanding iterations. Chained hashing
gracefully handles load factors above 1.
Load factors above 1 are not possible under open
addressing (and we don't even want to approach 1)
but we would like to preserve the requirements.
The requirements are:
1. If an iterator has already visited an item, it
will not see that item again, regardless of insertions
which cause the table to be reorganized.
2. It is not specified/required that an iterator will
visit newly inserted items. It may visit some of those
items, but not others.
3. If an iterator has not yet visited an item, and
that item is deleted, it will not see that item,
regardless of any insertions that reorganize the table.
In this commit, we implement a "table stack" scheme.
1. When a table is resized due to insertions, and
it is being iterated (h->usecount > 0), in that situation
it will push the existing small table onto a stack,
the h->tblstack (table stack).
2. Iterators take a hash table's current table and its
size, and work with that snapshot of the table.
If the original hash table grows, existing iterators
work with the original table as it existed just before
the reorganization. So after that they do not see any
new insertions.
3. Whenever the delete operation (hash_remove) finds
the item and removes it from the current table,
it also walks the table stack, searches for the item
in every table in the stack and nulls it out.
This search is oblivious to nil; it's a blind search
that goes around the table starting at the first
probe position, looking for the identical cons cell
to take out. This algorithm ensures that iterators
will not see a deleted item, unless they already visited
it before the deletion, of course.
* hash.h (struct hash_iter): New members table, and mask.
* hash.c (struct hash): New member, tblstack.
(hash_grow): We drop the vec argument and recreate it
locally (not essential to this commit).
If we find that the usecount is positive, we push the
existing table onto the table stack. Otherwise,
we take the opportunity to obliterate the table stack.
(hash_insert): Drop the restriction that hash_grow is
only called when the use count is zero. Adjust calls
to hash_grow to drop the vec argument.
(hash_remove): When an item is found and deleted, and
the table is in use by iterators, walk the table stack
and delete it from each previous table. Otherwise,
if the table is not in use by iterators, obliterate
the table stack.
(hash_mark): Exit early also if there is a table stack,
and mark that stack.
(do_make_hash, make_similar_hash, copy_hash): Initialize
table stack in new hash.
(hash_iter_mark): Mark the iterator's table. This is
likely not necessary since we also mark the hash table,
which should have a pointer to that same table.
That wouldn't be defensive programming, though.
(hash_iter_init, us_hash_iter_init): Initialize table and mask.
(hash_iter_next_impl, hash_iter_peek): These functions
have to walk the table snapshot taken by the iterator,
using the captured mask, and not the current table.
(has_reset): If the target table's use count drops to zero,
obliterate its table stack. We add a missing setcheck here;
this operation potentially stores a different hash into
an existing iterator. It's not being done safely with
regard to generational GC.
* tests/010/hash.tl: New tests.
|