author Kaz Kylheku <kaz@kylheku.com> 2021-05-29 13:03:47 -0700
committer Kaz Kylheku <kaz@kylheku.com> 2021-05-29 13:03:47 -0700
commit ab99601555d48297af0897c022a8288283318100
tree 6ea988e2530fc51b731cb66df0283c51645e3aff /txr.1
parent 12800700f93639c259757f0f9def1546d215ee95
json: functions put-json and put-jsonl.
* eval.c (eval_init): Register put-json and put-jsonl intrinsics.

* lib.c (out_json_str): Do not output the U+DC01 to U+DCFF code points by masking them and using put_byte. This is unnecessary; if we just send them as-is to the text stream, the UTF-8 encoder does that for us.
(put_json, put_jsonl): New functions.

* lib.h (put_json, put_jsonl): Declared.

* txr.1: Documented. The bulk of tojson is moved under the descriptions of these new functions, and elsewhere where the document pointed to tojson for more information, it now points to put-json. More detailed description of character treatment is given.

* share/txr/stdlib/doc-syms.tl: Updated.
Diffstat (limited to 'txr.1')
-rw-r--r-- txr.1 114
1 files changed, 90 insertions, 24 deletions
diff --git a/txr.1 b/txr.1
index cf692fd5..1f764f6b 100644
--- a/txr.1
+++ b/txr.1
@@ -12420,7 +12420,7 @@ expression is evaluated.
The following remarks indicate special treatment and extensions in the
processing of JSON. Similar remarks regarding the production of JSON are
given under the
-.code tojson
+.code put-json
function.
When an invalid UTF-8 byte is encountered inside a JSON string, its value is
@@ -71843,16 +71843,25 @@ etc.
.SS* Data Interchange Support
-.coNP Function @ tojson
+.coNP Functions @ put-json and @ put-jsonl
.synb
-.mets (tojson < obj <> [ flat-p ])
+.mets (put-json < obj >> [ stream <> [ flat-p ]])
+.mets (put-jsonl < obj >> [ stream <> [ flat-p ]])
.syne
.desc
The
-.code tojson
+.code put-json
function converts
.meta obj
-into JSON notation, returned as a character string.
+into JSON notation, and writes that notation into
+.meta stream
+as a sequence of characters.
+
+If
+.meta stream
+is an external stream such as a file stream, then the JSON is
+rendered by conversion of the characters into UTF-8, in the usual
+manner characteristic of those streams.
The behavior is unspecified if
.meta obj
@@ -71891,37 +71900,94 @@ is produced, since RFC 8259 requires JSON object keys to be strings.
If the
.code flat-p
argument is present and has a true value, then the JSON is generated
-without any line breaks or indentation.
+without any line breaks or indentation. Otherwise, the JSON output is subject
+to such formatting.
-Otherwise, the JSON is potentially subject to such formatting.
-
-Even if the JSON data contains line breaks, it does not end in a line break.
+The difference between
+.code put-json
+and
+.code put-jsonl
+is that the latter emits a newline character after the JSON output.
-When a JSON string is output, any code points U+DC01 through U+DCFF occurring
-in that string are assumed to denote raw bytes to be output, without
-escaping. The code point U+DC00 produces the
-.code "\eu0000"
-escape syntax. This behavior is different from \*(TL literals, which, on
-output, simply render these code points using
-.code "\ex"
-escape sequences. Rationale: this is because JSON is considered an external format.
-The requirements are intended to reproduce the original byte sequence, if
-possible, rather than JSON syntax which will produce the same \*(TX object
-if read back by \*(TX.
+When a string object is output as JSON string syntax, the following rules apply:
+.RS
+.IP 1.
+The characters
+.code \e
+(backslash, reverse solidus) and
+.code \(dq
+(double quote)
+are preceded by a backslash escape.
+.IP 2.
+The characters U+0008 (BS), U+0009 (TAB), U+000A (LF), U+000C (FF) and
+U+000D (CR) are rendered as, respectively,
+.codn \eb ,
+.codn \et ,
+.codn \en ,
+.code \ef
+and
+.codn \er .
+.IP 3.
If the character sequence
.code "</"
-occurs in the string, the slash is escaped, such that the sequence
-is rendered as
+occurs in a string, then in the JSON representation the slash is escaped, such
+that the sequence is rendered as
.codn "<\e/" .
Instances of the
.code /
(forward slash, solidus) not preceded by
.code <
-(less than) are unescaped. Rationale: this allows for safe embedding
-of the resulting JSON into HTML
+(less than) are unescaped. Rationale: this is a feature of JSON which allows
+for safer embedding of the resulting JSON into HTML
.code script
tags.
+.IP 4.
+The code point U+DC00 (\*(TX's pseudo-null character) is translated into the
+.code "\eu0000"
+escape syntax.
+.IP 5.
+The code points U+DC01 through U+DCFF are sent to the stream as-is.
+If the stream performs UTF-8 encoding, these characters turn into individual
+bytes in the range 1 to 255.
+.IP 6.
+Control characters in the range U+0001 to U+001F, other than the ones subject
+to rule 2 above, are rendered as
+.code \eu
+escape sequences. Likewise, code points in the range U+0080 to U+00BF,
+the ranges U+D800 to U+DBFF and U+DD00 to U+DFFF, and the code points
+U+FFFE and U+FFFF are also encoded as
+.code \eu
+escape sequences.
+.IP 7.
+A character outside of the BMP (Basic Multilingual Plane) in the range
+U+10000 to U+10FFFF is encoded as a pair of consecutive
+.code \eu
+escape sequences, specifying the code points of a UTF-16 surrogate pair
+encoding that character. This representation is described in RFC 8259.
+.RE
+
+.coNP Function @ tojson
+.synb
+.mets (tojson < obj <> [ flat-p ])
+.syne
+.desc
+The
+.code tojson
+function converts
+.meta obj
+into JSON notation, returned as a character string.
+
+The function can be understood as constructing a string output stream,
+calling the
+.code put-json
+function to write the object into that stream,
+and then retrieving and returning the constructed string.
+
+The
+.meta flat-p
+argument is passed to
+.codn put-json .
.coNP Function @ get-json
.synb