summaryrefslogtreecommitdiffstats
path: root/tests/017/flexstruct.tl
diff options
context:
space:
mode:
authorKaz Kylheku <kaz@kylheku.com>2023-08-22 07:29:56 -0700
committerKaz Kylheku <kaz@kylheku.com>2023-08-22 07:29:56 -0700
commitf0426745961866fa181c9cafd604a18cab4eb143 (patch)
tree2c69af1ec0f621755163f7f97e8bdaea81c90868 /tests/017/flexstruct.tl
parent1a4bdbc566683f0e70412fd75af233526803d0d2 (diff)
downloadtxr-f0426745961866fa181c9cafd604a18cab4eb143.tar.gz
txr-f0426745961866fa181c9cafd604a18cab4eb143.tar.bz2
txr-f0426745961866fa181c9cafd604a18cab4eb143.zip
doc: new hashing scheme for navigation, doc lookup.
This is not an easy change to make because it breaks the validity of existing URLs in the wild which point to specific sections of the TXR manual. Some of my recent changes to capitalization of numerous headings have already broken many URLs, so we might as well bite the bullet and do this now. The problem with the current scheme is that entire section titles are hashed: all the words of a title, not just the names of functions. Whenever we add a new function, macro or variable which is documented together with related functions in the same paragraph under the same heading, the heading changes, and the hash changes. For instance, the hash for the hash-map identifier is actually the hash of the string "Function <tt>hash-map</tt>". Under the new scheme, section titles are hashed in a more complicated way that is robust against most edits. If a title contains any symbols marked up with <tt>, then the leftmost such symbol is taken as the title. Otherwise, the whole title is mapped to lower case. There is no longer a stdlib/doc-syms.tl file, and the special disambiguated "D-<HEX>" codes are also gone. Symbols are no longer associated with section hashes or disambiguation section codes. The hash of a symbol is a 32 bit CRC-32 checksum, expressed as S-<HEX> where <HEX> is 8 hex digits. A section which defines symbols has not only a <a name="..."> for its own hash but also additional <a name="...>" elements for each of the symbols that it defines. If a section defines an ambiguous symbol (one that is also defined with a different meaning in a different section), then that symbol is not linked to either section; it is mapped to the generated disambiguating section. * genman.txr (dupes): Renamed to dupe-hashes for clarity. (tagnum): Hash removed. (direct): New hash. Tracks the assocation between sections hashes and hashes of symbols that are defined only in those symbols (no ambiguity) and thus the symbol hashes can navigate directly to the sections. Serves as a complement to the disamb hash. (colli): There are no collisions now, so initialize this to empty. (hash-str): Function removed. (hash-title): This function becomes more complicated. If a title has at least one <tt>..</tt> item, then that is taken in its place. Either way, the title is transformed and enumerated against duplication and hashed with crc32 instead of the original custom hashing function. (enumerate): Function removed: enumeration of titles is done inside hash-title. All manipulations of symhash using material from HTML now use html-decode, so that we hash the original symbol name like "str<" and not "str&lt;". When filtering the BODY, we have a new case: whenever we see a <a name="...">, we now check the new direct hash to see if there is a list of symbol hashes for the given section. If so, we generate additional <a name="..."> definitions for all the symbol hashes. At the end of the file, the "missing from image" processing is condensed, and the generation of the stdlib/doc-syms.tl file is removed. * stdlib/doc-syms.tl: Removed. * stdlib/doc-lookup.tl: Don't load doc-syms. Use crc32 plus formatting to conver a symbol to the hash that is used in the document and try the lookup with that.
Diffstat (limited to 'tests/017/flexstruct.tl')
0 files changed, 0 insertions, 0 deletions