doc: revised descrs of awk vars fs, ft and kfs.

* txr.1: fs and ft are described in one simplified section. The default behavior when they are both nil is described simply as token extraction, which is how it is now implemented. Some verbiage is reduced in the kfs description.
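The simplification rests on the old default (trim the record, produce no fields if it is empty, otherwise split on /[\n\t ]+/) giving the same fields as the new one (extract tokens matching /[^\n\t ]+/). The following is a minimal Python model, not TXR code, that illustrates this equivalence. It assumes that the trimming step removes only spaces, tabs and newlines; the real trim-str may treat other whitespace characters differently, in which case the two descriptions could diverge on such input.

```python
import re

# Assumption: trimming removes only space, tab and newline characters.
WS = " \t\n"

def old_default_fields(rec):
    """Old description: trim, drop empty records, then split on /[\\n\\t ]+/."""
    trimmed = rec.strip(WS)
    if trimmed == "":
        return []  # an empty (trimmed) record produces no fields
    return re.split(r"[\n\t ]+", trimmed)

def new_default_fields(rec):
    """New description: fields are the tokens matching /[^\\n\\t ]+/."""
    return re.findall(r"[^\n\t ]+", rec)

samples = ["", "   ", "a", "  a  b\tc\n", "x\ny z"]
for s in samples:
    # Both models should agree on every sample.
    assert old_default_fields(s) == new_default_fields(s)
    print(repr(s), new_default_fields(s))
```

Once leading and trailing separators are trimmed, splitting on runs of separator characters yields exactly the maximal runs of non-separator characters, which is why token extraction alone can describe the default.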
author: Kaz Kylheku <kaz@kylheku.com> 2016-09-21 06:28:30 -0700
committer: Kaz Kylheku <kaz@kylheku.com> 2016-09-21 06:28:30 -0700
commit: 8ef969502489d5b45ae8238800262d696b7aa54b (patch)
tree: eb6984772d69037df176ffacb4e5953498876e11 /txr.1
parent: 77f63c2a9e338207cb1fbe59b410958ce2ecda6d (diff)
download: txr-8ef969502489d5b45ae8238800262d696b7aa54b.tar.gz
txr-8ef969502489d5b45ae8238800262d696b7aa54b.tar.bz2
txr-8ef969502489d5b45ae8238800262d696b7aa54b.zip
1 files changed, 65 insertions, 121 deletions
diff --git a/txr.1 b/txr.1
index 3f682130..eac02f88 100644
--- a/txr.1
+++ b/txr.1
@@ -38064,140 +38064,94 @@ by the pattern in the
 .code rs
 variable is retained as part of the preceding record rather than removed.
 
-.coNP Variable @ fs
+.coNP Variables @ fs and @ ft
 .desc
 The awk variable
 .code fs
-specifies a string or regular expression which is used for
-delimiting records into fields.
-Another variable called
-.code fs
-also specifies a string or regular expression which is used for
-delimiting records into fields in a different way.
-It is an error for both of these variables to simultaneously
-have a value other than
-.codn nil .
-
-If
-.code fs
-is
-.code nil
-and the variable
-.code ft
-isn't, then delimiting is done using the tokenizing logic associated with the
-.code ft
-variable. The remaining description assumes that
+and
 .code ft
-is
-.codn nil .
+each specify a string or regular expression which is used for splitting each
+record that is stored in the
+.code rec
+variable into fields.
 
-The
-.code fs
-variable is initially
-.codn nil .
+Both variables are initialized to
+.codn nil ,
+in which case a default behavior is in effect, described below.
 
-If
-.code fs
-is
+Use of these variables is mutually exclusive; it is an error for both of these
+variables to simultaneously have a value other than
+.codn nil .
+The value stored in either variable must be
 .codn nil ,
-then, prior to field splitting, leading and trailing
-whitespace is trimmed from the value of
-.codn rec ,
-using the
-.code trim-str
-function. The subsequent field splitting operates on this
-trimmed value, which isn't stored back into
-.codn rec .
+a character string or a regular expression. If it contains a string or
+regex, it is said to contain a pattern. A string value effectively behaves
+as a fixed regular expression which matches the sequence of characters
+in the string verbatim, without treating any of them as regex operators.
 
-Regardless of the value of
-.codn fs ,
-a record which is empty (tested after the trimming described above, if that
-takes place) produces no fields:
-.code f
-is the empty list, and
-.code nf
-is zero. However, this behavior is altered by the
+The splitting of
+.code rec
+into fields is influenced by the Boolean
 .code kfs
-variable.
-
+("keep field separators")
+variable, whose effect is discussed in its description.
 If
-.code fs
-is nil, then the splitting into fields is performed as if
-the variable held the regular expression
-.codn "/[\en\et ]+/" .
-This means that, by default, fields are separated by one or more consecutive
-whitespace characters, which can be any mixture of spaces, tabs or newlines.
-Newlines are included because they can occur in a record when the value of the
-record separator
-.code rs
-is customized.
-
-If
-.code fs
-is not
-.codn nil ,
-it must specify a string, or a regular expression.
-A string value of
-.code fs
-denotes an exact match for that string; it isn't treated
-as a regular expression.
-
-When a record is not empty,
-matches for the
-.code fs
-pattern are identified in it, and those matching parts separate fields:
-the fields are the possibly empty non-matching parts between the matches.
-It is possible to keep the non-matching parts as fields also, by
-setting the
 .code kfs
-variable.
+is false, the splitting is carried out as follows.
 
 If
 .code fs
-is not found in the record, then the entire record is taken as a one
-field.
-
-If
+contains a pattern, then
+.code rec
+is treated specially when it is the empty string: in that case,
+the pattern in
 .code fs
-finds only an empty string match in the record, then it is considered
-to match each of the empty strings between the characters. Consequently,
-the record is split into its individual characters, each one becoming
-a field.
-
-.coNP Variable @ ft
-.desc
-The awk variable
+is ignored, and no fields are produced: the field list
+.code f
+is the empty list, and
+.code nf
+is zero.  A non-empty record is split by searching it for matches for the
 .code fs
-specifies a string or regular expression which is used for
-delimiting records into fields. Its initial value is
-.codn nil ,
-and in that state, it is not active. It is an error
-for both
+pattern. If a match does not occur, then the entire record is a field.
+If one match occurs, then the record is split into two fields, either of which,
+or both, might be empty. If two matches occur, the record is split into
+three fields, and so on. If
 .code fs
-and
-.code ft
-to both be set to a value which is not
-.codn nil .
+finds only an empty string match in the record, then it is considered
+to match each of the empty strings between two consecutive characters of the
+record. Consequently, the record is split into its individual characters, each
+one becoming a field. Note: all of these behaviors, except for the special
+treatment of the empty record, are accomplished by a call to the
+.code split-str
+function.
 
-The
+If the variable
 .code ft
-variable, if not
-.codn nil ,
-must be set to a regular expression value.
-
-It specifies a pattern which is used to positively recognize tokens within the
-input record, rather than to match separating material between them.
-
-Tokens do not have to be consecutive; non matching material between them
-is skipped. The skipped material can be be retained and turned into
-fields, by setting the
-.code kfs
-variable.
-
+("field tokenize") contains a pattern, that pattern is used to positively
+recognize tokens within the input record, rather than to match separating
+material between them. Those matching tokens then constitute the fields.
 The tokenizing is performed using the
 .code tok-str
 function.
 
+If
+.code fs
+and
+.code ft
+are both
+.codn nil ,
+as is initially the case, then the splitting into fields is performed
+as if the
+.code ft
+variable held the regular expression
+.codn "/[^\en\et ]+/" .
+This means that, by default, fields are sequences of consecutive characters
+which are not spaces, tabs or newlines.
+Newlines are excluded from fields (and thus separate them) because they can
+occur in a record when the value of the record separator
+.code rs
+is customized.
+
 .coNP Variable @ kfs
 .desc
 The awk variable
@@ -38213,16 +38167,6 @@ in which they were extracted from the record.
 
 When
 .code kfs
-is set, it prevents the behavior of an empty record
-automatically giving rise to zero fields. Empty records are
-still split or tokenized according to
-.code fs
-or
-.codn ft ,
-respectively.
-
-When
-.code kfs
 is set, and tokenization-style delimiting is in effect due to
 .code ft
 being set, there is always at least one field, even if the record is empty.
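The new fs splitting rules (with kfs false) can be modeled roughly in Python as below. This is a sketch under stated assumptions, not TXR's split-str: the function name split_fields is hypothetical, a plain string is treated as a verbatim pattern via re.escape, and a record in which the pattern finds a mixture of empty and non-empty matches is simplified by ignoring the empty ones, since the text does not specify that case.

```python
import re

def split_fields(rec, fs):
    """Hypothetical model of splitting rec by the fs pattern when kfs is false."""
    # Special case: an empty record yields no fields; the pattern is ignored.
    if rec == "":
        return []
    # A string is matched verbatim; a compiled regex is used as-is.
    pat = re.compile(re.escape(fs)) if isinstance(fs, str) else fs
    matches = list(pat.finditer(rec))
    # Only empty matches: split between consecutive characters.
    if matches and all(m.start() == m.end() for m in matches):
        return list(rec)
    fields, pos = [], 0
    for m in matches:
        if m.start() == m.end():
            continue  # simplification: ignore empty matches mixed with non-empty ones
        fields.append(rec[pos:m.start()])  # possibly empty field before the match
        pos = m.end()
    fields.append(rec[pos:])  # field after the last match, or the whole record
    return fields

print(split_fields("a:b::c", ":"))            # ['a', 'b', '', 'c']
print(split_fields("a.b", "."))               # ['a', 'b']: "." is literal
print(split_fields("abc", re.compile("x*")))  # ['a', 'b', 'c']: only empty matches
```

No match leaves the whole record as one field, and n matches give n + 1 possibly empty fields, mirroring the description of the fs pattern.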