author Kaz Kylheku <kaz@kylheku.com> 2016-09-22 21:21:23 -0700
committer Kaz Kylheku <kaz@kylheku.com> 2016-09-22 21:21:23 -0700
commit 0bd7556d869f0038a5bf58fff41ccbed891d150d
tree c41f9211e32ae4e51aae97a41223008ea552e4ea
parent e9db3a4cf7a6a292e74a8d77868eb356107591ca
Fix match-regex not conforming to documentation.

The documentation says that match-regex returns the length.
Actually, it returns the position after the last character
matched. This makes a difference when the match doesn't begin
at character zero.

The actual behavior is that of the match_regex C function
which has behaved that way since the dawn of TXR, and
internals depend on it behaving that way. So the internal
function is being retained, and a new function is being
registered as the match-regex intrinsic. The choice of
binding for match-regex is subject to the compatibility
option.

The behavior of match-regst is also being fixed since its
return value is incorrect due to this issue. Since its
return value makes no sense at all (does not represent the
matched text), it is not subject to the compatibility
option; it is just fixed to conform with the documentation.

* regex.c (match_regex_len): New function.
(match_regst): Keep using match_regex, but use its return
value properly. This simplifies the range extraction code,
which is why match_regex works that way in the first place.
(regex_init): Register match-regex to match_regex_len,
unless compatibility <= 150 is requested; then register
to match_regex.

* regex.h (match_regex_len): Declared.

* txr.1: Compatibility notes added.
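The distinction between "end position" and "length" described in the message can be sketched in standalone C. This is only an illustration, not TXR code: the digit matcher below is a hypothetical stand-in for the regex engine, and the names match_digits_end, match_digits_len and match_digits_str are invented for this example. It assumes plain C strings rather than TXR's val objects.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a regex match: matches one or more decimal
   digits anchored at pos. Returns the index one past the last matched
   character (analogous to the internal match_regex), or -1 if nothing
   matches at pos. */
long match_digits_end(const char *str, size_t pos)
{
    size_t i = pos;
    assert(pos <= strlen(str));
    while (isdigit((unsigned char) str[i]))
        i++;
    return (i > pos) ? (long) i : -1;
}

/* Analogous to match_regex_len: the length of the match, obtained by
   subtracting the starting position from the end position. */
long match_digits_len(const char *str, size_t pos)
{
    long end = match_digits_end(str, pos);
    return (end < 0) ? -1 : end - (long) pos;
}

/* Analogous to the repaired match_regst: the matched text is the range
   [pos, end), not [pos, pos + end). Copies it into out (capacity cap,
   including the terminating NUL) and returns the number of characters
   copied, or -1 on a failed match or insufficient capacity. */
long match_digits_str(const char *str, size_t pos, char *out, size_t cap)
{
    long end = match_digits_end(str, pos);
    size_t n;
    if (end < 0)
        return -1;
    n = (size_t) end - pos;
    if (n + 1 > cap)
        return -1;
    memcpy(out, str + pos, n);
    out[n] = '\0';
    return (long) n;
}
```

For the string "ab123cd" with a starting position of 2, the end position is 5 while the length is 3. With "123cd" and a starting position of 0, both values are 3, which is why the discrepancy only shows up for matches that begin inside the string.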
-rw-r--r-- regex.c 18
-rw-r--r-- regex.h 1
-rw-r--r-- txr.1 13
3 files changed, 29 insertions, 3 deletions
diff --git a/regex.c b/regex.c
index ba63132c..8368a8b8 100644
--- a/regex.c
+++ b/regex.c
@@ -2496,6 +2496,16 @@ val match_regex(val str, val reg, val pos)
   return nil;
 }
 
+val match_regex_len(val str, val regex, val pos)
+{
+  if (null_or_missing_p(pos)) {
+    return match_regex(str, regex, pos);
+  } else {
+    val new_pos = match_regex(str, regex, pos);
+    return if2(new_pos, minus(new_pos, pos));
+  }
+}
+
 val match_regex_right(val str, val regex, val end)
 {
   val pos = zero;
@@ -2580,8 +2590,8 @@ val search_regst(val haystack, val needle_regex, val start_num, val from_end)
 val match_regst(val str, val regex, val pos_in)
 {
   val pos = default_arg(pos_in, zero);
-  val len = match_regex(str, regex, pos);
-  return if2(len, sub_str(str, pos, plus(pos, len)));
+  val new_pos = match_regex(str, regex, pos);
+  return if2(new_pos, sub_str(str, pos, new_pos));
 }
 
 val match_regst_right(val str, val regex, val end)
@@ -2751,7 +2761,9 @@ void regex_init(void)
   reg_fun(intern(lit("search-regex"), user_package), func_n4o(search_regex, 2));
   reg_fun(intern(lit("range-regex"), user_package), func_n4o(range_regex, 2));
   reg_fun(intern(lit("search-regst"), user_package), func_n4o(search_regst, 2));
-  reg_fun(intern(lit("match-regex"), user_package), func_n3o(match_regex, 2));
+  reg_fun(intern(lit("match-regex"), user_package),
+          func_n3o((opt_compat && opt_compat <= 150) ?
+                   match_regex : match_regex_len, 2));
   reg_fun(intern(lit("match-regst"), user_package), func_n3o(match_regst, 2));
   reg_fun(intern(lit("match-regex-right"), user_package),
           func_n3o(match_regex_right, 2));
diff --git a/regex.h b/regex.h
index 599e985e..80494860 100644
--- a/regex.h
+++ b/regex.h
@@ -34,6 +34,7 @@ val regexp(val);
 val search_regex(val haystack, val needle_regex, val start_num, val from_end);
 val range_regex(val haystack, val needle_regex, val start_num, val from_end);
 val match_regex(val str, val regex, val pos);
+val match_regex_len(val str, val regex, val pos);
 val match_regex_right(val str, val regex, val end);
 val search_regst(val haystack, val needle_regex, val start_num, val from_end);
 val match_regst(val str, val regex, val pos);
diff --git a/txr.1 b/txr.1
index c0c2758d..60cbee60 100644
--- a/txr.1
+++ b/txr.1
@@ -45702,6 +45702,19 @@ of these version values, the described behaviors are provided if
 is given an argument which is equal or lower. For instance
 .code "-C 103"
 selects the behaviors described below for version 105, but not those for 102.
+.IP 150
+Until version 150, the
+.code match-regex
+function behaved in a different way from what was documented. Rather
+than returning the length of the match, it returned the index one
+past the last matching character. In the case when the starting position
+is zero, these values coincide; they are different if the match begins
+at some position inside the string. Compatibility with 150 restores
+the behavior. The
+.code match-regst
+function was also affected by this issue; however, since it returned nonsense
+result not corresponding to the matching text, it was repaired without
+backward compatibility.
 .IP 148
 Up until version 148, the
 .code :postinit