* Makefile (%.ok: %.txr): Use unified diff for showing
differences between expected and actual test output.

* parser.l (yybadtoken): Handle new terminal symbol, SPACE.
New rule for producing SPACE token out of an extent of
tabs and spaces.

* parser.y (SPACE): New terminal symbol.
(o_var): New nonterminal. I noticed that the var rule was being
used for output elements, and the var rule refers to elem rather
than o_elem. A new o_var rule is a simplified duplicate of var.
(elem): Handle SPACE token. Transform to regex if it is a single
space, otherwise to literal text.
(o_elem): Handle SPACE token in output.

* tests/001/query-2.txr: This query depends on matching single
spaces and so needs to use escapes.

* tests/001/query-4.txr, test/001/query-4.expected: New test case,
based on query-2.txr. It produces the same output, but is simpler
thanks to the new semantics of space.

* txr.1: Documented.
author: Kaz Kylheku <kaz@kylheku.com> 2011-10-13 08:41:56 -0700
committer: Kaz Kylheku <kaz@kylheku.com> 2011-10-13 08:41:56 -0700
commit: 35eb4dbc80f857007f99278c48e22f8557e13b68 (patch)
tree: 615cedd436b39374d7160433734f91f53f735aa6 /txr.1
parent: 6e23d00099312495addb7639d7062aab71a9cbfe (diff)
download: txr-35eb4dbc80f857007f99278c48e22f8557e13b68.tar.gz
txr-35eb4dbc80f857007f99278c48e22f8557e13b68.tar.bz2
txr-35eb4dbc80f857007f99278c48e22f8557e13b68.zip
1 files changed, 26 insertions, 1 deletions
diff --git a/txr.1 b/txr.1
index b9dbe78b..e88dd7bf 100644
--- a/txr.1
+++ b/txr.1
@@ -32,7 +32,7 @@ txr \- text extractor (version 039)
 is a query tool for extracting pieces of text buried in one or more text
 file based on pattern matching.  A
 .B txr
-query specifies a pattern which matches (a prefix of) entire file, or
+query specifies a pattern which matches (a prefix of) an entire file, or
 multiple files. The pattern is matched against the material in the files, and
 free variables occurring in the pattern are bound to the pieces of text
 occurring in the corresponding positions. If the overall match is
@@ -312,6 +312,31 @@ However, if the hash bang line can use the -f option:
 Now, the name of the script is passed as an argument to the -f option,
 and txr will look for more options after that.
 
+.SS Whitespace
+
+Outside of directives, whitespace is significant in TXR queries, and represents
+a pattern match for whitespace in the input.  An extent of text consisting of
+an undivided mixture of tabs and spaces is a whitespace token.  
+
+Whitespace tokens match a precisely identical piece of whitespace in the input,
+with one exception: a whitespace token consisting of precisely one space has a
+special meaning. It is equivalent to the regular expression @/[ \t]+/: match
+one or more tabs or spaces.
+
+Thus, the query line "a b" (one space) matches texts like "a b", "a   b", et
+cetera (arbitrary number of tabs and spaces between a and b).  However "a  b"
+(two spaces) matches only "a  b" (two spaces).
+
+For matching a single space, the syntax @\ can be used (backslash-escaped
+space).
+
+It is more often necessary to match multiple spaces, than to exactly
+match one space, so this rule simplifies many queries and adds inconvenience
+to only few.
+
+In output clauses, string and character literals and quasiliterals, a space
+token denotes a space.
+
 .SS Text
 
 Query material which is not escaped by the special character @ is
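
The whitespace rule documented in the hunk above can be illustrated with a small model. The following is only a sketch in Python, not TXR's actual lexer or matcher: the helper names query_line_to_regex and matches are hypothetical, it handles only literal text, whitespace tokens, and the backslash-escaped space, and it assumes that a query line has to match a whole input line.

```python
import re

# Hypothetical helper, not part of TXR: models only the whitespace rule.
def query_line_to_regex(line: str) -> str:
    """Translate literal text and whitespace tokens into a Python regex."""
    parts = []
    # Tokens: escaped space "@\ ", a run of tabs/spaces, a run of other
    # characters, or a lone "@" (treated as literal text in this sketch).
    for tok in re.findall(r"@\\ |[ \t]+|[^ \t@]+|@", line):
        if tok == "@\\ ":
            parts.append(re.escape(" "))   # escaped space: exactly one space
        elif tok == " ":
            parts.append(r"[ \t]+")        # single space: one or more tabs/spaces
        else:
            parts.append(re.escape(tok))   # anything else matches itself exactly
    return "".join(parts)

# Hypothetical helper: assumes the query line must match the entire input line.
def matches(query_line: str, text: str) -> bool:
    return re.fullmatch(query_line_to_regex(query_line), text) is not None

if __name__ == "__main__":
    for query, text in [("a b", "a   b"), ("a  b", "a b"), ("a@\\ b", "a b")]:
        print(repr(query), repr(text), matches(query, text))
```

Under this model, the one-space query "a b" accepts any nonempty run of tabs and spaces between a and b, while the two-space query "a  b" and the escaped form accept only the exact whitespace written.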