author | Kaz Kylheku <kaz@kylheku.com> | 2016-08-31 06:50:34 -0700 |
---|---|---|
committer | Kaz Kylheku <kaz@kylheku.com> | 2016-08-31 06:50:34 -0700 |
commit | 5adc28c952bbfde879b4032d3ad72694378c9838 (patch) | |
tree | 2f520dd585b6bbda148df9b784e43957741e790b /txr.1 | |
parent | a9396d16884429b486e636a2e40ecc2e8c1a05a9 (diff) | |
doc: wording changes regarding usage of "query".
* txr.1: The term "query language" is retired; "pattern
language" is used everywhere. The script argument can be TXR
Lisp or TXR, so is referred to as "script-file" in all
contexts where it could be either. Clarifications are added in
a few places that the script could be Lisp or that some
wording only applies when the script is TXR. Removing
incorrect, obsolescent wording which specifies that the
leading exclamation mark convention is honored in a file name
argument.
Diffstat (limited to 'txr.1')
-rw-r--r-- | txr.1 | 105 |
1 files changed, 64 insertions, 41 deletions
@@ -349,14 +349,19 @@
 .cble
 .SH* DESCRIPTION
-\*(TX is a language oriented toward processing text from files or streams, using
-multiple programming paradigms.
-
-A \*(TX script is called a query, and it specifies a pattern which matches (a
-prefix of) an entire file, or multiple files. Patterns can consists of large
-chunks of multi-line free-form text, which is matched literally against
-material in the input sources. Free variables occurring in the pattern
-(denoted by the
+\*(TX is a language oriented toward processing text from files or streams,
+supporting multiple programming paradigms.
+It is a combination of two programming languages: an text scanning
+and extraction language referred to as the \*(TX pattern language, or
+sometimes just \*(TX when it is clear, and a general-purpose dialect of Lisp
+called \*(TL.
+
+A script written in the \*(TX pattern language is referred to in this
+document as a query, and it
+specifies a pattern which matches (a prefix of) an entire file, or multiple
+files. Patterns can consists of large chunks of multi-line free-form text,
+which is matched literally against material in the input sources. Free
+variables occurring in the pattern (denoted by the
 .code @
 symbol) are bound to the pieces of text occurring in the corresponding positions. If the overall match is successful, then
@@ -371,9 +376,8 @@ recursive. \*(TX patterns can work horizontally (characters within a line) or
 vertically (spanning multiple lines). Multiple lines can be treated as a single line.
-
 In addition to embedded variables which implicitly match text, the
-\*(TX query language supports a number of directives, for matching text using
+\*(TX pattern language supports a number of directives, for matching text using
 regular expressions, for continuing a match in another file, for searching through a file for the place where an entire sub-query matches, for collecting lists, and for combining sub-queries using logical conjunction, disjunction and
@@ -552,7 +556,9 @@ the dimension order is:
 .meIP -c < query
 Specifies the query in the form of a command line argument. If this option is
-used, the query-file argument is omitted. The first non-option argument,
+used, the
+.meta script-file
+argument is omitted. The first non-option argument,
 if there is one, now specifies the first input source rather than a query. Unlike queries read from a file, (non-empty) queries specified as arguments using -c do not have to properly end in a newline. Internally,
@@ -607,9 +613,9 @@ comment syntax can be used for better formatting:
 .cble
 .RE
-.meIP -f < query-file
+.meIP -f < script-file
 Specifies the file from which the query is to be read, instead of the
-.meta query-file
+.meta script-file
 argument. This is useful in
 .code #!
 ("hash bang") scripts. (See Hash Bang Support below).
@@ -617,13 +623,13 @@ argument. This is useful in
 .meIP -e < expression
 Evaluates a \*(TL expression for its side effects, without printing its value. Can be specified more than once. The
-.meta query-file
+.meta script-file
 argument becomes optional if
 .code -e
 is used at least once. If the evaluation of every
 .meta expression
 evaluated this way terminates normally, and there is no
-.meta query-file
+.meta script-file
 argument, then \*(TX terminates with a successful status.
 .meIP -p < expression
@@ -819,7 +825,9 @@ if another argument looks like an option, it is treated as a name.
 This special argument
 .code -
 means "read from standard input" instead of a file.
-The query file, or any of the data files, may be specified using this option.
+The
+.metn script-file ,
+or any of the data files, may be specified using this option.
 If two or more files are specified as
 .codn - ,
 the behavior is system-dependent.
@@ -828,34 +836,36 @@ then specify more input which is interpreted as the second file, and so forth.
 .PP
 After the options, the remaining arguments are files. The first file argument
-specifies the query, and is mandatory if the
+specifies the script file, and is mandatory if the
 .code -f
-option has not been specified. A file argument consisting of a single
+option has not been specified, and \*(TX isn't operating in interactive
+mode or evaluating expressions from the command line via
+.code -e
+or one of the related options. A file argument consisting of a single
 .code -
-means to read the standard input instead of opening a file. A file argument
-which begins with an exclamation symbol means that the rest of the argument is
-a shell command which is to be run as a coprocess, and its output read like a
-file.
+means to read the standard input instead of opening a file.
 .PP
-\*(TX begins by reading the query. The entire query is scanned, internalized
-and then begins executing, if it is free of syntax errors. The reading of
-data, on the other hand, is lazy. A file isn't opened until the query demands
-material from that file, and then the contents are read on demand, not all at
-once.
-
-The suffix of the query file is significant. If the query file name has no
-suffix, or if it has a
+\*(TX begins by reading the script. In the case of the \*(TX pattern language,
+the entire query is scanned, internalized and then begins executing, if it is
+free of syntax errors. (\*(TL is processed differently, form by form). On the
+other hand, the pattern language reads data files in a lazy manner. A file
+isn't opened until the query demands material from that file, and then the
+contents are read on demand, not all at once.
+
+The suffix of the
+.meta script-file
+is significant. If the name has no suffix, or if it has a
 .str .txr
-suffix, then it is assumed to be in the \*(TX query language. If it has
+suffix, then it is assumed to be in the \*(TX pattern language. If it has
 the
 .str .tl
 suffix, then it is assumed to be \*(TL. The
 .code --lisp
-option changes the treatment of unsuffixed query file names, causing them
+option changes the treatment of unsuffixed script file names, causing them
 to be interpreted as \*(TL .
-If an unsuffixed query file name is specified, and cannot be opened, then
+If an unsuffixed script file name is specified, and cannot be opened, then
 \*(TX will add the
 .str .txr
 suffix and try again. If that fails, it will be tried with the
@@ -875,8 +885,8 @@ the \*(TX process or throw an exception, and there are no syntax errors, then
 are encountered in a form, then \*(TX terminates unsuccessfully. \*(TL is documented in the section TXR LISP.
-If no files arguments are specified on the command line, it is up to the
-query to open a file, pipe or standard input via the
+If a query file is specified, but no file arguments,
+it is up to the query to open a file, pipe or standard input via the
 .code @(next)
 directive prior to attempting to make a match. If a query attempts to match text,
@@ -892,8 +902,13 @@ bindings with
 or
 .codn -a .
-If the command line arguments are incorrect, or the query has a malformed
-syntax, \*(TX issues an error diagnostic and terminates with a failed status.
+If the command line arguments are incorrect, \*(TX issues an error diagnostic
+and terminates with a failed status.
+
+If the
+.meta script-file
+specifies a query, and the query has a malformed syntax, \*(TX likewise
+issues error diagnostics and terminates with a failed status.
 If the query fails due to a mismatch, \*(TX terminates with a failed status. No diagnostics are issued.
@@ -916,6 +931,14 @@ if the query fails, and exits with a failed termination status.
 If the query succeeds, the variable bindings, if any, are output on standard output.
+If the
+.meta script-file
+is \*(TL, then it is processed form by form. Each top-level Lisp form
+is evaluated after it is read. If any form is syntactically malformed,
+\*(TX issues diagnostics and terminates unsuccessfully. This is somewhat
+different from how the pattern language is treated: a script in the pattern
+language is parsed in its entirety before being executed.
+
 .SH* BASIC TXR SYNTAX
 .SS* Comments
 A query may contain comments which are delimited by the sequence
@@ -1347,8 +1370,8 @@ in
 .SS* Character Handling and International Characters
 \*(TX represents text internally using wide characters, which are used to
-represent Unicode code points. The query language, as well as all data sources,
-are assumed to be in the UTF-8 encoding. In the query language, extended
+represent Unicode code points. Script source code, as well as all data sources,
+are assumed to be in the UTF-8 encoding. In \*(TX and \*(TL source, extended
 characters can be used directly in comments, literal text, string literals, quasiliterals and regular expressions. Extended characters can also be expressed indirectly using hexadecimal or octal escapes.
@@ -42122,7 +42145,7 @@ If
 .meta target
 has a
 .str .txr
-suffix, it is assumed to be a \*(TX query language file, and
+suffix, it is assumed to be a \*(TX pattern language file, and
 an exception of type
 .code eval-error
 is thrown, since this is not supported.
@@ -43819,7 +43842,7 @@ In \*(TX 124 and earlier versions, the
 .code @(next)
 directive didn't evaluate the
 .meta source
-argument as a Lisp expression, but as a \*(TX Pattern Language
+argument as a Lisp expression, but as a \*(TX pattern language
 expression. Lisp expressions thus had to be delimited by
 .codn @ .
 The current behavior is that the argument is treated as Lisp.