author | Kaz Kylheku <kaz@kylheku.com> | 2021-04-09 06:53:47 -0700 |
---|---|---|
committer | Kaz Kylheku <kaz@kylheku.com> | 2021-04-09 06:53:47 -0700 |
commit | 430aefc7e00fc1347534e0287846bd1e1950f425 | |
tree | 7fc9a65e70ee2812a30a834e14a664585ab31f02 /txr.1 | |
parent | a69ee0c1bc34ef9d37cc837df6c206651d816513 | |
doc: more details in string literals section.
* txr.1: advise user that numeric escapes in string literals
are not byte-wise, but specify code points.
Diffstat (limited to 'txr.1')
-rw-r--r-- | txr.1 | 14 |
1 files changed, 14 insertions, 0 deletions
@@ -3011,6 +3011,20 @@ as a delimiter.
 Thus,
 represents
 .strn "!;" .
+Note: strings in \*(TX consist of Unicode code points, not UTF-8 bytes;
+therefore the elements of a string literal notation cannot specify individual
+bytes. Each instance of hexadecimal or octal escape specifies a code point,
+even if its value lies in the 8 bit range.
+However, when a \*(TX string is encoded to UTF-8,
+every code point in the range U+DC00 through U+DCFF is converted to a
+a single byte, by taking the low-order eight bits of its value. By manipulating
+code points in this special range, \*(TX programs can output arbitrary binary
+data into text streams. Also note that the
+.code \eu
+escape sequence for specifying code points found in some languages is
+unnecessary and absent. More detailed information is given in the section
+Character Handling and International Characters.
+
 If the line ends in the middle of a literal, it is an error, unless the
 last character is a backslash. This backslash is a special escape which
 does not denote a character; rather, it indicates that the string literal continues
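The encoding rule that this patch documents can be modeled in a few lines. The sketch below is written in Python, not TXR Lisp, and `encode_like_txr` is a hypothetical helper name. It only restates the rule as the added man page text gives it: ordinary code points become UTF-8, and U+DC00 through U+DCFF each become one byte taken from the low-order eight bits. It is not TXR's actual encoder. In particular, it does not model how TXR treats surrogate code points outside that range.

```python
def encode_like_txr(text: str) -> bytes:
    """Hypothetical model (not part of TXR) of the encoding rule described
    in the patch: UTF-8 for ordinary code points, but each code point in
    U+DC00..U+DCFF becomes the single byte given by its low-order 8 bits."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xDC00 <= cp <= 0xDCFF:
            # Special range: emit one raw byte, the low eight bits of cp.
            out.append(cp & 0xFF)
        else:
            # Ordinary code point: standard UTF-8. Any other lone surrogate
            # raises UnicodeEncodeError here. TXR's own behavior for those
            # is not modeled by this sketch.
            out += ch.encode("utf-8")
    return bytes(out)


# As in the patched text, the escape \xE9 denotes code point U+00E9, not a
# byte, so it encodes to two UTF-8 bytes.
print(encode_like_txr("\xe9").hex())   # c3a9

# U+DCE9 lies in the special range and comes out as the single byte 0xE9.
print(encode_like_txr("\udce9").hex())  # e9

# Arbitrary binary data, including NUL and 0xFF, can be carried in a string.
raw = bytes([0x00, 0x41, 0xFF])
smuggled = "".join(chr(0xDC00 + b) for b in raw)
print(encode_like_txr(smuggled) == raw)  # True
```

Python's built-in `surrogateescape` error handler uses a similar trick. However, it covers only U+DC80 through U+DCFF, so bytes below 0x80 cannot be carried that way. The range in the patch also includes U+DC00 through U+DC7F.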