Diffstat (limited to 'txr.1')
-rw-r--r-- txr.1 | 9
1 files changed, 8 insertions, 1 deletions
diff --git a/txr.1 b/txr.1
index dc692dd2..d69b8645 100644
--- a/txr.1
+++ b/txr.1
@@ -478,7 +478,7 @@ does not split the line into two; it's embedded into the line and
thus cannot match anything. However, @\en may be useful in the @(cat)
directive and in @(output).
-.SS International Characters
+.SS Character Handling and International Characters
.B TXR
represents text internally using wide characters, which are used to represent
@@ -519,6 +519,13 @@ mapping it to the Unicode character range U+DC00 through U+DCFF. The decoding
resumes afresh at the following byte, expecting that byte to be the start
of a UTF-8 code.
+Furthermore, because TXR internally uses a null-terminated character
+representation of strings which easily interoperates with C language
+interfaces, when a null character is read from a stream, TXR converts it to
+the code U+DC00. On output, this code converts back to a null byte,
+as explained in the previous paragraph. By means of this representational
+trick, TXR can handle textual data containing null bytes.
+
.SS Regular Expression Directives
In place of a piece of text (see section Text above), a regular expression
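
The byte mapping described in the second hunk can be illustrated with a minimal Python sketch. This is not TXR's decoder (TXR is written in C); the names `decode_bytes`, `encode_text`, and `_sequence_length` are hypothetical and exist only for this example. It assumes, as the man page text states, that a null byte and any byte that is not part of a valid UTF-8 sequence are mapped to U+DC00 plus the byte value, and that codes in U+DC00 through U+DCFF are written back out as the original single byte.

```python
# Illustrative sketch only: hypothetical helpers, not TXR's implementation.
ESCAPE_BASE = 0xDC00  # start of the U+DC00..U+DCFF range named in the man page


def _sequence_length(lead: int) -> int:
    """Return the UTF-8 sequence length implied by a lead byte, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    return 0  # continuation byte or out-of-range lead byte


def decode_bytes(data: bytes) -> str:
    """Decode UTF-8, mapping null bytes and undecodable bytes into U+DC00..U+DCFF."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        n = _sequence_length(b)
        chunk = data[i:i + n]
        if b != 0 and n and len(chunk) == n:
            try:
                # Strict decoding rejects overlong forms and encoded surrogates.
                out.append(chunk.decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # Null byte or invalid byte: escape it, then resume at the next byte.
        out.append(chr(ESCAPE_BASE + b))
        i += 1
    return "".join(out)


def encode_text(text: str) -> bytes:
    """Encode to UTF-8, turning U+DC00..U+DCFF back into the original raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if ESCAPE_BASE <= cp <= ESCAPE_BASE + 0xFF:
            out.append(cp - ESCAPE_BASE)
        else:
            # Assumption: other lone surrogates do not occur in the input;
            # str.encode would raise UnicodeEncodeError for them.
            out += ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    raw = b"a\x00b\xffc"
    text = decode_bytes(raw)
    print([hex(ord(c)) for c in text])
    print(encode_text(text) == raw)
```

Python's built-in `surrogateescape` error handler performs a similar escape for undecodable bytes 0x80 through 0xFF, but it does not cover the null byte, so the sketch does the mapping by hand.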