author | Kaz Kylheku <kaz@kylheku.com> | 2023-08-22 07:29:56 -0700 |
---|---|---|
committer | Kaz Kylheku <kaz@kylheku.com> | 2023-08-22 07:29:56 -0700 |
commit | f0426745961866fa181c9cafd604a18cab4eb143 (patch) | |
tree | 2c69af1ec0f621755163f7f97e8bdaea81c90868 /tests/017/flexstruct.tl | |
parent | 1a4bdbc566683f0e70412fd75af233526803d0d2 (diff) | |
doc: new hashing scheme for navigation, doc lookup.
This is not an easy change to make because it breaks the
validity of existing URLs in the wild which point to specific
sections of the TXR manual.
Some of my recent changes to capitalization of numerous
headings have already broken many URLs, so we might as well
bite the bullet and do this now.
The problem with the current scheme is that entire section
titles are hashed: all the words of a title, not just the
names of functions. Whenever we add a new function, macro or
variable which is documented together with related functions
in the same paragraph under the same heading, the heading
changes, and the hash changes. For instance, the hash for
the hash-map identifier is actually the hash of the string
"Function <tt>hash-map</tt>".
Under the new scheme, section titles are hashed in a more
complicated way that is robust against most edits. If a
title contains any symbols marked up with <tt>, then the
leftmost such symbol is taken as the title. Otherwise,
the whole title is mapped to lower case.
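As a rough illustration of the rule just described (not the genman.txr code itself), the following Python sketch picks the string that stands in for a title and hashes it. The names `title_key` and `title_crc` are hypothetical, the sample headings are made up, and the sketch assumes zlib's CRC-32 over UTF-8 bytes; the enumeration of duplicate titles is left out.

```python
import re
import zlib

# Matches the contents of the leftmost <tt>...</tt> element in a title.
TT_RE = re.compile(r"<tt>(.*?)</tt>")

def title_key(title_html: str) -> str:
    """Return the string that is hashed in place of the full title."""
    m = TT_RE.search(title_html)
    if m:
        # The leftmost marked-up symbol stands in for the whole title.
        return m.group(1)
    # No symbols in the title: use the whole title, lower-cased.
    return title_html.lower()

def title_crc(title_html: str) -> str:
    """CRC-32 of the title key as 8 hex digits (assumed formatting)."""
    return f"{zlib.crc32(title_key(title_html).encode('utf-8')):08X}"

# Adding a symbol to a heading leaves the hash alone, as long as the
# leftmost symbol stays the same (made-up headings):
before = "Function <tt>hash-map</tt>"
after = "Functions <tt>hash-map</tt> and <tt>hash-update</tt>"
same = title_crc(before) == title_crc(after)
```

Under this rule, a heading only gets a new hash when its leftmost symbol changes or, for symbol-free headings, when its wording changes beyond letter case.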
There is no longer a stdlib/doc-syms.tl file, and the
special disambiguated "D-<HEX>" codes are also gone.
Symbols are no longer associated with section hashes or
disambiguation section codes. The hash of a symbol is
a 32-bit CRC-32 checksum, expressed as S-<HEX> where
<HEX> is 8 hex digits. A section which defines symbols
has not only a <a name="..."> for its own hash but also
additional <a name="..."> elements for each of the symbols
that it defines.
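A minimal sketch of the symbol hash in Python, for illustration only. It assumes that the CRC-32 variant is the one implemented by zlib, that the symbol name is hashed as UTF-8 bytes, and that the hex digits are upper case; none of these details is stated in this message. `symbol_anchor` is a hypothetical name.

```python
import zlib

def symbol_anchor(symbol: str) -> str:
    """Format the CRC-32 of a symbol name as S-<HEX>, 8 hex digits."""
    crc = zlib.crc32(symbol.encode("utf-8")) & 0xFFFFFFFF
    return f"S-{crc:08X}"

# "123456789" is the standard CRC-32 check input; its checksum is CBF43926.
check = symbol_anchor("123456789")   # "S-CBF43926"
```

Because the anchor depends only on the symbol name, it can be recomputed at lookup time without a generated table such as the removed stdlib/doc-syms.tl.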
If a section defines an ambiguous symbol (one that is also
defined with a different meaning in a different section),
then that symbol is not linked to either section; it is
mapped to the generated disambiguating section.
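The split between directly linked and ambiguous symbols can be sketched as follows. This is a hypothetical Python illustration, not the genman.txr logic: `partition_symbols` and the section identifiers are invented, and plain names are used where the real tables hold hashes.

```python
from collections import defaultdict

def partition_symbols(definitions):
    """Split (section, symbol) pairs into direct and ambiguous symbols.

    Returns (direct, disamb):
      direct maps a section to the symbols defined only in that section;
      disamb maps each ambiguous symbol to the sections that define it.
    """
    sections_by_symbol = defaultdict(set)
    for section, symbol in definitions:
        sections_by_symbol[symbol].add(section)

    direct = defaultdict(list)
    disamb = {}
    for symbol, sections in sections_by_symbol.items():
        if len(sections) == 1:
            # Unambiguous: the symbol can navigate straight to its section.
            direct[next(iter(sections))].append(symbol)
        else:
            # Ambiguous: linked to neither section, only to a
            # disambiguation entry that lists the candidates.
            disamb[symbol] = sorted(sections)
    return dict(direct), disamb

direct, disamb = partition_symbols(
    [("sec-a", "foo"), ("sec-a", "bar"), ("sec-b", "bar")]
)
```

Here `foo` is reachable directly in `sec-a`, while `bar` is defined in two sections and so appears only in `disamb`.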
* genman.txr (dupes): Renamed to dupe-hashes for clarity.
(tagnum): Hash removed.
(direct): New hash. Tracks the association between section
hashes and the hashes of symbols that are defined only in
those sections (no ambiguity), so that the symbol hashes
can navigate directly to the sections. Serves as a
complement to the disamb hash.
(colli): There are no collisions now, so
initialize this to empty.
(hash-str): Function removed.
(hash-title): This function becomes more complicated.
If a title has at least one <tt>..</tt> item, then
that is taken in its place. Either way, the title
is transformed and enumerated against duplication and
hashed with crc32 instead of the original custom hashing
function.
(enumerate): Function removed: enumeration of titles is
done inside hash-title.
All manipulations of symhash using material from HTML now
use html-decode, so that we hash the original symbol
name like "str<" and not "str&lt;".
When filtering the BODY, we have a new case: whenever
we see a <a name="...">, we now check the new direct
hash to see if there is a list of symbol hashes for
the given section. If so, we generate additional
<a name="..."> definitions for all the symbol hashes.
At the end of the file, the "missing from image"
processing is condensed, and the generation of the
stdlib/doc-syms.tl file is removed.
* stdlib/doc-syms.tl: Removed.
* stdlib/doc-lookup.tl: Don't load doc-syms. Use crc32
plus formatting to convert a symbol to the hash that is
used in the document and try the lookup with that.
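The BODY-filtering case described in the genman.txr entry above can be pictured with a short Python sketch. It is an assumption-laden illustration: `expand_anchors` is a hypothetical name, the anchor identifiers are made up, and placing the extra anchors as empty elements just before the section's own anchor is a choice made for this sketch, not something the message specifies.

```python
import re

# Matches an opening <a name="..."> tag and captures the name.
ANCHOR_RE = re.compile(r'<a name="([^"]+)">')

def expand_anchors(line: str, direct: dict) -> str:
    """Add one extra anchor per symbol hash listed for a section."""
    def repl(match):
        symbol_hashes = direct.get(match.group(1), [])
        extra = "".join(f'<a name="{h}"></a>' for h in symbol_hashes)
        # Keep the original anchor; put the symbol anchors in front of it.
        return extra + match.group(0)
    return ANCHOR_RE.sub(repl, line)

# Hypothetical section "SEC1" that defines two unambiguous symbols.
table = {"SEC1": ["S-0000000A", "S-0000000B"]}
out = expand_anchors('<h3><a name="SEC1">Title</a></h3>', table)
```

With anchors like these in place, a lookup only has to compute the S-<HEX> code for a symbol and use it as the URL fragment.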