summaryrefslogtreecommitdiffstats
path: root/regex.c
Commit message (Collapse)AuthorAgeFilesLines
* Copyright year bump 2021.Kaz Kylheku2021-01-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * METALICENSE: 2020 copyrights bumped to 2021. Added note about SHA-256 routines from Colin Percival. * LICENSE, LICENSE-CYG, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lex.yy.c.shipped, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/asm.tl, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/compiler.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/copy-file.tl, share/txr/stdlib/debugger.tl, share/txr/stdlib/defset.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/each-prod.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/param.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/quips.tl, share/txr/stdlib/save-exe.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/trace.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/vm-param.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr, y.tab.c.shipped: Copyright year bumped to 2021.
* New functions trim-left and trim-right.Kaz Kylheku2020-10-051-0/+28
| | | | | | | | | * regex.c (trim_left, trim_right): New static functions. (regex_init): New intrinsics registered. * tests/015/trim.tl, tests/015/trim.expected: New files. * txr.1: Documented.
* c_num: now takes self argument.Kaz Kylheku2020-06-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The c_num and c_unum functions now take a self argument for identifying the calling function. This requires changes in a large number of places. In a few places, additional functions acquire a self argument. The ffi module has the most extensive example of this. Some functions mention their name in a larger string, or have scattered literals giving their name; with the introduction of the self local variable, these are replaced by references to self. In the following changelog, the notation TS stands for "take self argument", meaning that the functions acquires a new "val self" argument. The notation DS means "define self": the functions in question defines a self variable, which they pass down. The notation PS means that the functions pass down an existing self variable to functions that now require it. * args.h (args_count): TS. * arith.c (c_unum, c_num): TS. (toint, exptv): DS. * buf.c (buf_check_len, buf_check_alloc_size, buf_check_index, buf_do_set_len, replace_buf, buf_put_buf, buf_put_i8, buf_put_u8, buf_put_char, buf_put_uchar, buf_get_bytes, buf_get_i8, buf_get_u8, buf_get_cptr, buf_strm_get_byte_callback, buf_strm_unget_byte, buf_swap32, str_buf, buf_int, buf_uint, int_buf, uint_buf): PS. (make_duplicate_buf, buf_shrink, sub_buf, buf_print, buf_pprint): DS. * chskum.c (sha256_stream_impl, sha256_buf, crc32_buf, md5_stream_impl, md5_buf): TS. (chksum_ensure_buf, sha256_stream, sha256, sha256_hash, md5_stream, md5, md5_hash): PS. (crc32_stream): DS. * combi.c (perm_while_fun, perm_gen_fun_common, perm_str_gen_fun, rperm_gen_fun, comb_vec_gen_fun, comb_str_gen_fun, rcomb_vec_gen_fun, rcomb_str_gen_fun): DS. * diff.c (dbg_clear, dbg_set, dbg_restore): DS. * eval.c (do_eval, gather_free_refs, maprodv, maprendv, maprodo, do_args_apf, do_args_ipf): DS. (op_dwim, me_op, map_common): PS. (prod_common): TS. * ffi.c (struct txr_ffi_type): release member TS. (make_ffi_type_pointer): PS and release argument TS. (ffi_varray_dynsize, ffi_array_in, ffi_array_put_common, ffi_array_get_common, ffi_varray_in, ffi_varray_null_term): PS. (ffi_simple_release, ffi_ptr_in_release, ffi_struct_release, ffi_wchar_array_get, ffi_array_release_common, ffi_array_release, ffi_varray_release): TS. (ffi_float_put, double_put, ffi_be_i16_put, ffi_be_u16_put, ffi_le_i16_put, ffi_le_u16_put, ffi_be_i32_put, ffi_be_u32_put, ffi_le_i32_put, ffi_sbit_put, ffi_ubit_put, ffi_buf_d_put, make_ffi_type_array, make_ffi_type_enum, ffi_type_compile, make_ffi_type_desc, ffi_make_call_desc, ffi_call_wrap, ffi_closure_dispatch_save, ffi_put_into, ffi_in, ffi_get, ffi_put, carray_set_length, carray_blank, carray_buf, carray_buf_sync, carray_cptr, carray_refset, carray_sub, carray_replace, carray_uint, carray_int): PS. (carray_vec, carray_list): DS. * filter.c (url_encode, url_decode, base64_stream_enc_impl): DS. * ftw.c (ftw_callback, ftw_wrap): DS. * gc.c (mark_obj, gc_set_delta): DS. * glob.c (glob_wrap): DS. * hash.c (equal_hash, eql_hash, eq_hash, do_make_hash, hash_equal, set_hash_traversal_limit, gen_hash_seed): DS. * itypes.c (c_i8, c_u8, c_i16, c_u16, c_i32, c_u32, c_i64, c_u64, c_short, c_ushort, c_int, c_uint, c_long, c_ulong): PS. * lib.c (seq_iter_rewind): TS and becomes internal. (seq_iter_init_with_info, seq_setpos, replace_str, less, replace_vec, diff, isec, obj_print_impl): PS. (nthcdr, equal, mkstring, mkustring, upcase_str, downcase_str, search_str, sub_str, cat_str, scat2, scat3, fmt_join, split_str_keep, split_str_set, trim_str, int_str, chr_int, chr_str, chr_str_set, vector, vecref, vecref_l, list_vec, copy_vec, sub_vec, cat_vec, lazy_str_put, lazy_str_gt, length_str_ge, length_str_lt, length_str_le, cptr_size_hint, cptr_int, out_lazy_str, out_quasi_str, time_string_local_time, time_string_utc, time_fields_local_time, time_fields_utc, time_struct_local, time_struct_utc, make_time, time_meth, time_parse_meth): DS. (init_str, cat_str_init, cat_str_measure, cat_str_append, vscat, time_fields_to_tm, time_struct_to_tm, make_time_impl): TS. * lib.h (seq_iter_rewind): Declaration removed. (c_num, c_unum, init_str): Declarations updated. * match.c (LOG_MISMATCH, LOG_MATCH): PS. (h_skip, h_coll, do_output_line, do_output, v_skip, v_fuzz, v_collect): DS. * parser.c (parser, circ_backpatch, report_security_problem, hist_save, repl, lino_fileno, lino_getch, lineno_getl, lineno_gets, lineno_open): DS. (parser_set_lineno, lisp_parse_impl): PS. * parser.l (YY_INPUT): PS. * rand.c (make_random_state): PS. * regex.c (print_rec): DS. (search_regex): PS. * signal.c (kill_wrap, raise_wrap, get_sig_handler, getitimer_wrap, setitimer_wrap): DS. * socket.c (addrinfo_in, sockaddr_pack, fd_timeout, to_connect, open_sockfd, sock_mark_connected, sock_timeout): TS. (getaddrinfo_wrap, dgram_set_sock_peer, sock_bind, sock_connect, sock_listen, sock_accept, sock_shutdown, sock_send_timeout, sock_recv_timeout, socketpair_wrap): DS. * stream.c (generic_fill_buf, errno_to_string, stdio_truncate, string_out_put_string, open_fileno, open_command, base_name, dir-name): DS. (unget_byte, put_buf, fill_buf, fill_buf_adjust, get_line_as_buf, formatv, put_byte, test_set_indent_mode, test_neq_set_indent_mode, set_indent_mode, set_indent, inc_indent, set_max_length, set_max_depth, open_subprocess, run ): PS. (fds_subst, fds_swizzle): TS. * struct.c (make_struct_type, super, umethod_args_fun): PS. (method_args_fun): DS. * strudel.c (strudel_put_buf, strudel_fill_buf): DS. * sysif.c (errno_wrap, exit_wrap, usleep_wrap, mkdir_wrap, ensure_dir, makedev_wrap, minor_wrap, major_wrap, mknod_wrap, mkfifo_wrap, wait_wrap, wifexited, wexitstatus, wifsignaled, wtermsig, wcoredump, wifstopped, wstopsig, wifcontinued, dup_wrap, close_wrap, exit_star_wrap, umask_wrap, setuid_wrap, seteuid_wrap, setgid_wrap, setegid_wrap, simulate_setuid_setgid, getpwuid_wrap, fnmatch_wrap, dlopen_wrap): DS. (chmod_wrap, do_chown, flock_pack, do_utimes, poll_wrap, setgroups_wrap, setresuid_wrap, setresgid_wrap, getgrgid_wrap): PS. (c_time): TS. * sysif.h (c_time): Declaration updated. * syslog.c (openlog_wrap, syslog_wrap): DS. * termios.c (termios_pack): TS. (tcgetattr_wrap, tcsetattr_wrap, tcsendbreak_wrap, tcdrain_wrap, tcflush_wrap, tcflow_rap, encode_speeds, decode_speeds): DS. * txr.c (compato, array_dim, gc_delta): DS. * unwind.c (uw_find_frames_by_mask): DS. * vm.c (vm_make_desc): PS. (vm_make_closure, vm_swtch): DS.
* regex: duplicate character range crash fix.Kaz Kylheku2020-04-171-6/+12
| | | | | | | | | | * regex.c (L1_fill_range, L2_fill_range, L3_fill_range): Bug: when the arguments ch0 and ch1 indicate that a block is to be filled entirely, we assume that the pointer is either null or else a pointer to an allocated block, either of which we can free and replace with a -1 pointer to indicate a full block. However the pointer may already be -1, in which case we wrongly pass that to free(). We must check for it.
* unicode: wide character upkeep 3.Kaz Kylheku2020-04-171-0/+7
| | | | | | * regex.c (create_wide_cs): Add some Emoji ranges from Plane 1, loosely following the Unicode 13.0 data given in https://en.wikipedia.org/wiki/Emoji.
* unicode: wide character upkeep 2.Kaz Kylheku2020-04-171-2/+2
| | | | | | * regex.c (create_wide_cs): Extend over the entire supplementary ideographic plane U+2XXXX and the tertiary one: U+3XXXX.
* unicode: character width upkeep.Kaz Kylheku2020-04-171-5/+4
| | | | | | | | | | | | | | | | | Updating the regex for matching code points corresponding to wide and full width characters, with regard to the old 1998 document: http://www.unicode.org/reports/tr11-2/ More to follow. I neglected to comment where the original data came from, and neglected to comment it. In some cases it has more coverage than the 1998 document; in some cases less. * regex.c (create_wide_cs): Extending the 1100-115F range to 11F9 to cover all of Korean Hangeul. Replace two occurrences of 3000-303E with one 3000-303F. Merge 3250-32FE with 3300-4DB5, and extend to 4DBF. Add private use range E000-E757.
* internals: rename misnamed curry_* functions.Kaz Kyheku2020-03-171-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The various curry_xx_yy functions perform partial application, not currying. The curry prefix is being renamed to pa (partially apply). * lib.c (remq_lazy, remql_lazy, remqual_lazy, tree_Find): Updated. (do_curry_12_1, do_curry_12_1_v, do_curry_12_2, do_curry_123_1, do_curry_123_23, do_curry_123_2, do_curry_123_3, do_curry_1234_1, do_curry_1234_34): Renamed to do_pa_12_1, do_pa_12_1_v, do_pa_12_2, do_pa_123_1, do_pa_123_23, do_pa_123_2, do_pa_123_3, do_pa_1234_1, do_pa_1234_34. (curry_12_1, curry_12_1_v, curry_12_2, curry_123_1, curry_123_23, curry_123_2, curry_123_3, curry_1234_1, curry_1234_34): Renamed to pa_12_1, pa_12_1_v, pa_12_2, pa_123_1, pa_123_23, pa_123_2, pa_123_3, pa_1234_1, pa_1234_34. (transposev, do_juxt): Updated. * lib.h: Declarations renamed. * eval.c (subst_vars, qquote_init, expand_catch, weavev): Updated. * filter.c (get_filter, build_filter_from_list, filter_string_tree, filter_init): Updated. * match.c (tx_subst_vars, do_txeval, v_freeform, v_bind, v_throw, v_deffilter, v_assert, h_assert): Updated. * parser.y (gather_clause): Updated. * regex.c (regex_range_full_fun, regex_range_left_fun, regex_range_right_fun, regex_range_search_fun): Updated. * stream.c (open_files, open_files_star): Updated. * txr.c (txr_main): Updated. * unwind.c (me_defex): Updated.
* Copyright year bump 2020.Kaz Kylheku2019-12-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/asm.tl, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/compiler.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/debugger.tl, share/txr/stdlib/defset.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/param.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/save-exe.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/trace.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/vm-param.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr: Extended copyright notices to 2020.
* safety: fix type tests that code can subvert.Kaz Kylheku2019-09-301-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes numerous instances of a safety hole which involves the type of a COBJ object being tested to be of a given class using logic that can be subverted by the definition of a like-named struct. Specifically logic like (typeof(obj) == hash_s) is broken, because if a struct type called hash is defined, then the test will yield true for instances of that struct type. Those instances can then be passed into code that only works on COBJ hashes, and relies on this test to reject invalid objects. * ffi.c (make_carray): Replace fragile test with strong one, using new cobjclassp function. * hash.c (hashp): Likewise. * lib.c (class_check): The expression used here for the type test moves into the new function cobjclassp and so is replaced by a call to that function. (cobjclassp): New function. * lib.h (cobjclassp): Declared. * rand.c (random_state_p): Replace fragile test using cobjclassp. * regex.c (char_set_compile): Replace fragile typeof tests for character type with is_chr. (reg_derivative, regexp): Replace fragile test with cobjclassp. * struct.c (struct_type_p): Replace fragile test with cobjclassp.
* Replace lt(x, zero) pattern.Kaz Kylheku2019-06-151-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | This slight inefficiency occurs in some 37 places in the code. In most places we replace lt(x, zero) with minusp(x). In a few places, !plusp(x) is used and surrounding logic is simplified. In one case, the silly pattern lt(x, zero) ? t : nil is replaced with just minusp(x). * buf.c (sub_buf, replace_buf): Replace lt. * combi.c (perm, rperm, comb, rcomb): Likewise. * eval.c (do_format_field): Likewise. * lib.c (listref, sub_list, replace_list, split_func, split_star_func, match_str, lazy_sub-str, sub_str, replace_str, sub_vec, replace_vec): Likewise. * match.c (weird_merge): Likewise. * regex.c (match_regex, match_regex_right_old, match_regex_right, regex_prefix_match, regex_range_left, regex_range_right): Likewise.
* scan-until-match, count-until-match: new functions.Kaz Kylheku2019-02-161-7/+35
| | | | | | | | | | | | | * regex.c (scan_until_common): New static function, made from read_until_match. (read_until_match): Now just wrapper for scan_until_common. (scan_until_match, count_until_match): New functions. (regex_init): Registered new intrinsics scan-until-match and count-until-match. * regex.h (read_until_match, scan_until_match): Declared. * txr.1: Documented.
* Copyright year bump 2019.Kaz Kylheku2019-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/asm.tl, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/compiler.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/trace.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/vm-param.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr: Extended Copyright line to 2018.
* Eliminate ALLOCA_H.Kaz Kylheku2018-12-311-1/+1
| | | | | | | | | | | | | * configure: Instead of generating a definition of ALLOCA_H, generate the variable HAVE_ALLOCA_<name> with a value of 1, where <name> is one of stdlib, alloca or malloc. * alloca.h: New header. * args.c, eval.c, ffi.c ffi.c, ftw.c, hash.c, lib.c, match.c, parser.c, parser.y, regex.c, socket.c, stream.c, struct.c, sysif.c, syslog.c, termios.c, unwind.c, vm.c: Include "alloca.h" instead of ALLOCA_H.
* Drastically reduce inclusion of <dirent.h>.Kaz Kylheku2018-12-111-1/+0
| | | | | | | | | | | | | | | | | | | The <dirent.h> header is included all over the place because it is needed by a single declaration in stream.h. That declaration is for a function that is only called within stream.c, so we make it internal. Now only stream.c has to include <dirent.h>. * buf.c, debug.c, eval.c, ffi.c, filter.c, gc.c, gencadr.txr, hash.c, lib.c, lisplib.c, match.c, parser.c, regex.c, socket.c, struct.c, strudel.c, sysif.c, syslog.c, termios.c, txr.c, unwind.c, vm.c: Remove #include <dirent.h>. * cadr.c: Regenerated. * stream.c (make_dir_stream): Make external function static. * stream.h (make_dir_stream): Declaration updated.
* regex: relocate unlikely case after other tests.Kaz Kylheku2018-11-121-3/+3
| | | | | | * regex.c (reg_derivative): When classifying the regex's operator, don't check for vanishingly unlikely internal error cases first; that should be done at the end.
* Better identify functions that misuse COBJ-s and hashes.Kaz Kylheku2018-11-071-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In this patch, the cobj_handle, cobj_ops and variants of gethash get an additional argument to identify the caller. Many functions are updated to pass this down. * buf.c (buf_strm): Pass self name to cobj_handle. * eval.c (env_fbind, env_vbind, rt_defvarl, me_case): Pass self name to gethash_c or gethash_e. (load): Pass self name to read_eval_stream and read_compiled_file. (reg_symacro): Pass situation-identifying string to gethash_c. * ffi.c (ffi_type_struct_checked, ffi_closure_struct_checked, ffi_call_desc_checked, uni_struct_checked): Take self name parameter, and pass down to cobj_handle. (ffi_get_type, ffi_get_lisp_type): Take self name and pass down to ffi_type_struct_checked. (union_get_ptr): Take self name and pass to uni_struct_checked. (ffi_union_in, ffi_union_put): Pass self name to union_get_ptr. (ffi_type_compile): Pass self name to ffi_get_lisp_type. (ffi_make_call_desc): Pass self name to ffi_type_struct_checked, ffi_get_type and ffi_call_desc_checked. (ffi_make_closure): Pass self name to ffi_call_desc_checked. (ffi_closure_get_fptr): Take self name, pass to ffi_closure_struct_checked. (ffi_typedef, ffi_size, ffi_alignof, ffi_offsetof, ffi_arraysize, ffi_elemsize, ffi_elemtype, ffi_put_into, ffi_put, ffi_in, ffi_get, ffi_out, make_carray): Pass self name to ffi_closure_struct_checked. (carray_struct_checked): Take self name, pass to cobj_handle. (carray_set_length, carray_dup, carray_own, carray_free, carray_type, length_carray, copy_carray, carray_ptr, buf_carray, vec_carray, list_carray, carray_ref, carray_refset, carray_sub, carray_replace, carray_get_common, carray_put_common, unum_carray, num_carray, put_carray, fill_carray): Pass self name to carray_struct_checked. (carray_blank, carray_buf, carray_cptr): Pass self name ffi_type_struct_checked. (carray_pun): Pass self name to carray_struct_checked and ffi_type_struct_checked. (make_union): Pass self name to ffi_type_struct_checked. (union_members, union_get, union_put, union_in, union_out): Pass self name to uni_struct_checked. (make_zstruct, zero_fill, put_obj, get_obj, fill_obj): Pass self-name to ffi_type_struct_checked. * ffi.h (ffi_closure_get_fptr, union_get_ptr): Declarations updated. * filter.c (trie_add): Pass self-name to gethash_l. * hash.c (make_similar_hash, copy_hash, hash_count, get_hash_userdata, set_hash_userdata, hash_begin, hash_next, hash_uni, hash_diff, hash_isec): Pass self name to cobj_handle. (gethash_c, gethash_e): Take self name parameter and pass down to cobj_handle. (gethash_f): Take self parameter and pass down to gethash_e. (gethash, inhash, gethash_n, sethash, pushhash, remhash, clearhash, hash_update_1): Pass self name to gethash_e or gethash_c. * hash.h (gethash_c, gethash_e, gethash_f): Declarations updated. (gethash_l): Take self name, and pass down to gethash_c. * lib.c (class_check): Take self name parameter and use in type mismatch diagnostic. (use_sym, unuse_sym, symbol_needs_prefix, find_symbol, intern, unintern, intern_fallback, unique, in, sel, obj_print_impl, populate_obj_hash, obj_hash_merge): Pass self name to gethash_f or gethash_l. (symbol_visible, obj_init): Pass situation-identifying string to gethash_e. (cobj_handle, cobj_ops): Take self name parameter and pass down to class_check. * lib.h (class_check, cobj_handle, cobj_ops): Declarations updated. * match.c (v_load): Pass self name to read_compiled_file and read_eval_stream. * parser.c (get_parser_impl): Take self name and pass to cobj_handle. (ensure_parser): Pass situation-identifying string to gethash_c. (parser_circ_def): Pass self-name to gethash_c. (lisp_parser_impl): Pass self name to get_parser_impl and class_check. (lisp_parse, nread, iread): Pass self-name to lisp_parser_impl. (read_file_common): Take self name parameter and pass down to get_parser_impl. (read_eval_stream, read_compiled_file): Take self name and pass down to read_file_common. (load_rcfile): Pass situation-identifying string to read_eval_streem. (get_visible_syms): Pass situation-identifying string to gethash_c. (parser_errors, parser_eof): Pass self name to cobj_handle. * parser.h (read_eval_stream, read_compiled_file): Declarations updated. * parser.y (rlset): Pass self name to gethash_c. * rand.c (make_random_state, random_state_get_vec,l random_fixnum, random_float): Pass self name to cobj_handle. * regex.c (regex_source, regex_print, regex_run): Pass self-name to cobj_handle. (regex_machine_init): Take self name param and pass to cobj_handle. (search_regex, match_regex, match_regex_right, regex_prefix_match, read_until_match): Pass self-name to regex_machine_init. * stream.c (stdio_get_fd): Pass self name to cobj_handle. (generic_get_line): Get COBJ operations via unsafe, diret object access rather than cobj_ops. (set_mode_props): Get object handle via unsafe, direct object access. (stream_fd, sock_family, sock_type, sock_peer, set_sock_peer, get_string_from_stream, get_list_from_stream, stream_set_prop, stream_get_prop, close_stream, get_error, get_error_str, clear_error, get_line, get_char, get_byte, unget_char, unget_byte, put_buf, fill_buf, put_string, put_char, put_byte, flush_stream, seek_stream, truncate_stream, get_indent_mode, test_set_indent_mode, set_indent_mode, get_indent, set_indent, inc_indent, width_check, force_break, get_set_ctx, get_ctx): Pass self name to cobj_ops. (make_delegate_stream): Take self name parameter, pass down to cobj_ops. (record_adapter): Pass self name down to make_delegate_stream. (format): Pass self name to class_check. * struct.c (stype_handle): Pass self name to cobj_handle. (make_struct_type): Pass self name to class_check. * txr.c (read_eval_stream_noerr): Take self name parameter, pass to read_eval_stream. (txr_main): Pass istuation-identifying string to read_compiled_file and read_eval_stream_noerr. * unwind.c (revive_cont): Pass self-name to cobj_handle. * vm.c (vm_desc_struct): Take self name parameter, pass to cobj_handle. (vm_desc_nlevels, vm_desc_nregs, vm_desc_bytecode, vm_desc_datavec, vm_desc_symvec, vm_execute_toplevel, vm_execute_closure, vm_closure_entry): Pass self name to vm_desc_struct. (vm_closure_struct): Take self name parameter, pass to cobj_handle.
* regex: fix double free corruption bug.Kaz Kylheku2018-04-041-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, the nfa_state_free function doesn't check the static flag on a character set and just calls chr_set_destroy. So when one of the static character sets is planted into the NFA graph, when that graph is garbage-collected, it blows away the static character set. Then when that happens twice for the same set, boom! We make an alteration to make the destruction more defensive. Callers of char_set_destroy are no longer saddled with the responsibility of honoring the static flag buried in the object. Instead, that function itself check the static flag. An argument is provided to force the deletion in spite of the static flag; that is needed for the global cleanup of the static states. (Only occurs if txr is run with --free-all and cleanly exited.) * regex.c (char_set_destroy): Take extra argument, force. If the set is marked static, then do nothing, unless force is nonzero. (char_set_cobj_destroy): Don't check the static flag, just call char_set_destroy, force zero. (nfa_state_free): Add force zero argument to char_set_destroy call. The double free bug is thereby fixed here; static sets are protected. (regex_free_all): Force all the char_set_destroy calls here.
* regex: read/print bug: escaped double quote.Kaz Kylheku2018-04-041-2/+2
| | | | | | | | | | | | | | | | | | | Because the regex printer wrongly uses out_str_char (for the sake of borrowing its semicolon-notation processing) when a regex prints, all characters that require escaping in a string literal get escaped, which includes the " character. Unfortunately the \" sequence which results is rejected by the regex parser. * lib.c (out_str_char): Kludge: add extra argument to distinguish regex use versus string use, and treat the double quote accordingly. (out_str_readable): Give 0 arg to new param of out_str_char. * lib.h (out_str_char): Declaration updated. * regex.c (print_class_char, print_rec): Pass 1 to new param of out_str_char.
* Copyright year bump 2018.Kaz Kylheku2018-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, win/cleansvg.txr: Extended Copyright line to 2018.
* regex: bugfix: squash duplicates in move set.Kaz Kylheku2017-09-131-2/+1
| | | | | | | | * regex.c (nfa_move_closure): The move set calculation is wrongly assuming that all of the states are new and not testing their visited color. This could result in the same state being added twice. Though harmless, it wastefully inflates the set size.
* regex: factor out repeated visit-coloring pattern.Kaz Kylheku2017-09-131-13/+15
| | | | | | | * regex.c (nfa_test_set_visited): New inline function. (nfa_map_states, nfa_thread_epsilons, nfa_closure, nfa_move_closure): Use function instead of coding pattern which tests the state and sets the visited member.
* regex: re-introduce nfa_accept states.Kaz Kylheku2017-09-131-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The nfa_accept state label is re-introduced. This state type has the same representation as nfa_empty; essentially, this replaces the flag. This makes the state type smaller, and we don't have to access the flag to tell if a state is an acceptance state. * regex.c (nfa_kind_t): New enum label, nfa_accept. (struct nfa_state_empty): Member accept removed. (nfa_accept_state_p): Macro tests only for nfa_accept type. (nfa_empty_state_p): New macro. (nfa_state_accept): Set type of new state to nfa_accept; do not set accept flag. (nfa_state_empty): Do not set accept flag. (nfa_state_empty_convert): Do not clear accept flag. (nfa_map_states): Handle nfa_accept in switch, in the same case as nfa_empty. (nfa_thread_epsilons): Don't test for accept state in nfa_empty case; it would be always false now. Add nfa_accept case to switch which only arranges for a traversal of the two transitions. (Though these are expected to be null at the stage of the graph when this function is applied). (nfa_fold_accept): Switch type to nfa_accept rather than setting accept flag. (nfa_closure, nfa_move_closure): Use new macro for testing whether a state is empty.
* regex: retain unoptimized form for printing.Kaz Kylheku2017-09-121-5/+1
| | | | | | | regex.c (regex_compile): Take the source code to be the original code, rather than the version with AST-level optimizations and expansions related to the nongreedy operator.
* regex: bug printing #/abc(def|ghi)/Kaz Kylheku2017-09-121-1/+1
| | | | | | | | | | | | This was broken by the July 16 commit "regex: don't print superfluous parens around classes", 2411f779f47c441659720ad0ddcabf91df1d2529. * regex.c (print_rec): If an (or ...) appears as a compound element, it must be rendered in parentheses; or_s must be handled here just like and_s. Prior to the faulty commit, this was implicitly true because the logic was inverted and wasn't ruling out or_s.
* regex: accept-folding optimization.Kaz Kylheku2017-09-121-0/+23
| | | | | | | | | | | | In this optimization, we identify places in the NFA graph where empty states transition to accept states. We eliminate these transitions and turn the empty states into accept states. Accept states which thereby become unreachable from the start state are pruned away. * regex.c (nfa_fold_accept): New static function. (nfa_optimize): Add a pass which applies nfa_fold_accept to all states.
* regex: eliminate nfa_accept state type.Kaz Kylheku2017-09-121-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | | Acceptance states are instead represented by adding an accept flag to nfa_empty states. This will support an optimization which eliminates accept states that are referenced only by empty states. * regex.c (nfa_kind_t): Enum member nfa_accept removed. (struct nfa_state_accept): Renamed to nfa_state_any. (struct nfa_state_empty): New member, accept. (union nfa_state): Member a changes from struct nfa_state_accept to struct nfa_state_any. (nfa_accept_state_p): New macro. (nfa_state_accept): Now makes an nfe_empty type state, with no transitions out and the accept flag set. (nfa_state_empty): Initialize accept flag to zero. (nfa_state_empty_convert): Set the accept flag to zero. (nfa_state_merge): Use new macro in assertion. (nfa_map_states): Remove nfa_accept switch case. (nfa_thread_epsilons): We must not eliminate an empty state which is an acceptance state, even if it meets the other conditions. (nfa_closure): Use a local variable to eliminate repetition of set[i] expression. Test for accept state using new macro. (nfa_move_closure): Test for accept state using new macro.
* regex: epsilon-threading optimization on NFA graph.Kaz Kylheku2017-09-121-1/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is something done for the sake of faster NFA simulation. I think it's glossed over in regex literature because it makes no difference in a NFA to DFA transformation. Fewer states in the graph means less work in the nfa_move_closure function at regex run time. In the NFA graph, there can occur empty states which are useless. These are states which hold only one epsilon transition. Those states can be replaced by whatever state their epsilon transition points to. The terminology is inspired by "jump threading" in code optimization, whereby if a branch takes place to an instruction which holds an unconditional branch, the original branch can be retargetted to go to the ultimate target. I.e. we turn: x e S0 ---> S1 ----> S2 where e denotes an epsilon transition, and x represents any kind of transition to: x S0 ---> S2 We can also eliminate empty states which have no transition; those can be replaced by a null pointer. I suspect these do not actually occur. * regex.c (nfa_thread_epsilons, nfa_noop, nfa_optimize): New static functions. (regex_compile): Invoke nfa_optimize on the results of nfa_compile_regex.
* regex: elide needless increments of visited counter.Kaz Kylheku2017-09-121-5/+5
| | | | | | | | | | | | | * regex.c (nfa_run): Start with the existing value of the visited counter and pre-increment it when calling nfa_closure and nfa_move_closure. Thus it is incremented only as many times as those functions are called: one fewer than before. (regex_machine_reset): regm->n.visited is already incremented, since 1 was added to the prior value; there is no need to increment it again. (nfa_handle_wraparound): More prudent approach here. If we get within 8 increments of wraparound, we fast forward to UINT_MAX and then 0.
* regex: bugfix: incorrect use of nfa_move_closure.Kaz Kylheku2017-09-121-1/+1
| | | | | | | | | | | | | | This goes back to November 2, 2009 commit, "Start of implementation for freestyle matching", 6191fbb2ca7a9ac339dd3994bdea8273ceb0a24d It is exposed if we perform an epsilon threading optimization on the NFA graph. * regex.c (regex_machine_feed): We must pass the new, incremented value of the visited counter, to nfa_move_closure. States have already been tagged with the old value before the call to regex_machine_feed, so we risk failing to visit some states and include them in the closure.
* regex: new function, regex-prefix-match.Kaz Kylheku2017-09-111-0/+48
| | | | | | | | | | | | | | | | | This new function allows a program to determine whether a given string is the prefix of any of the strings denoted by a regular expression; or, in alternative words, whether a given string is the prefix of a possibly longer string which matches a regular expression. * regex.c (regex_machine_infer_init_state): New static function. (regex_prefix_match): New function. (regex_init): regex-prefix-match intrinsic registered. regex.h (regex_prefix_match): Declared. * txr.1: Documented.
* regex: remove nfa_reject representation.Kaz Kylheku2017-09-111-40/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reject states are no longer explicitly represented in the NFA graph. Any transition that previously would have led to a reject state is just a null pointer now: no transition. This just means we have to check one place for a null pointer. This change fixes a flaw in nfa_closure: the function was propagating nfa_reject states to the closure set. When the input set consisted only of reject states (or rather exactly one reject state in that special case), it would add it to the output set and return 1, making it seem as if the regex is one that can potentially match something. * regex.c (nfa_kind_t): Enum member nfa_reject removed. (nfa_state_reject): Static function removed. (nfa_compile_regex): Compile a t regex (match nothing) to a NFA graph with no transition at all into any start state; effectively an empty graph with no state nodes. (nfa_map_states): Remove nfa_reject case from switch. Also, stylistic change; state pointers are tested as Booleans elsewhere rather than by comparison to null pointer. (nfa_count_states, nfa_handle_wraparound): Handle null state pointer. (nfa_move_closure): Test the mov transition for null; anything can potentially now have a null transition, not only epsilon nodes. (nfa_run, regex_machine_reset): Handle nfa.start being null indicating an empty NFA graph. In that case we don't have to calculate the closure; we just have an empty set of states and set nfa.nclos to zero. The visited disambiguation counter is irrelevant since there are no states to visit. (regex_machine_feed): Don't try to propagate visited counter stored in regex machine to empty NFA graph which has no states.
* regex: don't print superfluous parens around classes.Kaz Kylheku2017-07-161-2/+2
| | | | | | | | * regex.c (print_rec): Switching from negative tests to positive: print parens around elements of a compound which have a lower precedence. The previously incorrectly omitted set and cset remain unmentioned, so under this inversion of logic, they print without parentheses.
* cobj: rename poorly named default operation.Kaz Kylheku2017-05-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Renaming cobj_hash_op to cobj_eq_hash_op. This function is only appropriate to use with COBJ objects which use eq as their equal funtion. I've spotted one instance of an inappropriate use which have to be addressed by a different commit: the equal function is other than eq, but cobj_hash_op is used for the equal hash. * lib.h (cobj_hash_op): Declaration renamed to cobj_eq_hash_op. * hash.c (cobj_hash_op): Renamed to cobj_eq_hash_op. (hash_iter_ops): Refer to renamed cobj_hash_eq_op. * ffi.c (ffi_type_builtin_ops, ffi_type_struct_ops, ffi_type_ptr_ops, ffi-closure_ops, ffi_call_desc_ops): Likewise. * lib.c (cptr_ops): Likewise. * parser.c (parser_ops): Likewise. * rand.c (random_state_ops): Likewise. * regex.c (char_set_ops, regex_obj_ops): Likewise. * socket.c (dgram_strm_ops): Likewise. * stream.c (null_ops, stdio_ops, tail_ops, pipe_ops, dir_ops, string_in_ops, byte_in_ops, strlist_in_ops, string_out_ops, strlist_out_ops, cat_stream_ops, record_adapter_ops): Likewise. * struct.c (struct_type_ops): Likewise. * sysif.c (cptr_dl_ops): Likewise. * syslog.c (syslog_strm_ops): Likewise. * unwind.c (cont_ops): Likewise.
* Rename badly named default_bool_argKaz Kylheku2017-03-171-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * lib.h (default_bool_arg): Inline function renamed to default_null_arg. * eval.c (if_fun, pad, ginterate, giterate, range_star, range, constantp, macroexpand_1, macro_form_p, expand_with_free_refs, do_expand, eval_intrinsic, func_get_name, make_env_intrinsic): Follow rename. * arith.c (lognot): Likewise. * gc.c (gc_finalize): Likewise. * glob.c (glob_wrap): Likewise. * hash.c (group_reduce, gethash_n): Likewise. * lib.c (print, multi_sort, lazy_str, vector, iff, tok_str, split_str_keep, search_str, remove_if, val): Likewise. * match.c (match_fun): Likewise. * parser.c (lisp_parse_impl, regex_parse): Likewise. * rand.c (make_random_state): Likewise. * regex.c (read_until_match, search_regex, regex_compile): Likewise. * socket.c (sock_accept, sock_connect): Likewise. * stream.c (open_files_star, open_files, run, open_process, open_tail, get_string, record_adapter): Likewise. * struct.c (static_slot_ensure, static_slot_ens_rec, clear_struct, make_struct_type): Likewise. * sysif.c (exec_wrap, errno_wrap, cobj_ops_init): Likewise. * unwind.c (uw_capture_cont, uw_find_frames_impl): Likewise.
* Fix missing nao terminator in formatted printing.Kaz Kylheku2017-03-131-1/+1
| | | | | | | | | | | | | | * arith.c (trunc1, trunc, floorf, ceili): Add missing nao terminator to uw_throwf calls. * debug.c (debug): Missing nao terminator in format call. * eval.c (expand_opt_params_rec, me_equot): Missing nao terminator in eval_error call. * lib.c (use_sym): Missing nao in uw_throw call. * regex.c (reg_derivative): Missing nao in uw_throwf.
* Bump copyright year to 2017.Kaz Kylheku2017-01-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl: Add 2017 to all copyright headers and strings.
* Fix inconsistency in regex-source.Kaz Kylheku2016-12-251-2/+8
| | | | | | | | | | | | | | | | If we compile the regex expression (compound "str*"), calling regex-source on the compiled regex object yields "str*". That, of course, is treated as regex character syntax if fed back to regex-compile, and the * becomes an operator. We want the source to be (compound "str*"). This happens because the AST optimizer reduces (compound X) -> X. * regex.c (regex_compile): If the optimized expression is just a character string atom S, then for the purposes of maintaining the source code, convert it to (compound S).
* Adding functions fr^$, fr^, fr$ and frr.Kaz Kylheku2016-12-011-0/+28
| | | | | | | | * regex.c (regex_range_full_fun, regex_range_left_fun, regex_range_right_fun, regex_range_search_fun): New functions. (regex_init): Register fr^$, fr^, fr$ and frr intrinsics. * txr.1: Documented.
* Add stream printing context.Kaz Kylheku2016-10-201-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is some infrastructure which will support *print-circle*. * lib.h (struct strm_ctx): Forward declared. (struct cobj_ops): Add context parameter to print function pointer. (cobj_print_op, obj_print_impl): Add context parameter to declarations. * hash.c (hash_print_op): Take context argument and pass it down in obj_print_impl calls. * lib.c (cobj_print_op, out_quasi_str): Likewise (obj_print_impl): Likewise, and also pass to COBJ print method. (obj_print, obj_pprint): Pass null pointer as context argument to obj_print_impl. * regex.c (regex_print): Take context parameter and ignore it. * socket.c (dgram_print): Likewise. * stream.h (struct strm_ctx): New struct type. (struct strm_base): New ctx member, pointer to struct strm_ctx. (stream_print_op): Add context parameter to declaration. (get_set_ctx, get_ctx): Declared. * stream.c (strm_base_init): Add null pointer to initializer. (strm_base_cleanup): Add assertion against context pointer being non-null: that indicates that some stream operation installed a context pointer and neglected to restore it to null before returning, which is bad because context will be stack allocated. (stream_print_op, stdio_stream_print, cat_stream_print): Take context parameter and ignore it. (get_set_ctx, get_ctx): New functions. * struct.c (struct_type_print): Take context parameter and ignore it. (struct_inst_print): Take context parameter and pass down to obj_print_impl.
* Support n-ary and and or operators in regex.Kaz Kylheku2016-10-101-1/+63
| | | | | | | | | | | Since much regex code assumes these are binary, the easiest and briefest approach is to implement a code transformation pass which rewrites n-ary forms into binary. * regex.c (reg_nary_unfold, reg_nary_to_bin): New functions. (regex_compile): Put raw sexp through reg_nary_to_bin to expand the nary syntax.
* Simplify some regex tree walking code.Kaz Kylheku2016-10-101-18/+10
| | | | | | | | | | * regex.c (reg_expand_nongreedy, reg_compile_csets): Generalize the compound_s case slightly by referring to sym rather than hard-coded compound_s. Then handle most of the regex operators under this same case. Their semantics are not relevant to the expansions being performed in these functions: all their arguments are regexes to be recursed over.
* New function rra.Kaz Kylheku2016-10-031-0/+50
| | | | | | | | | * regex.c (range_regex_all, regex_range_all): New functions. (regex_init): Register rra intrinsic function. * regex.c (range_regex_all, regex_range_all): Declared. * txr.1: Documented rra.
* New rr function.Kaz Kylheku2016-10-031-0/+12
| | | | | | | | | | * regex.c (regex_range_search): New function. (regex_init): Register regex_range_search as rr intrinsic. * regex.h (regex_range_search): Declared. * txr.1: Documented rr, and added reference to it in description of regex-range.
* search-regex improvement: negative start and more.Kaz Kylheku2016-10-031-40/+52
| | | | | | | | | | | | * regex.c (search_regex): Handle negative starting positions according to the convention elsewhere and fail excessively negative ones. Consistently fail on starting positions exceeding the length of the string. Handle zero length matches by reporting them against the start position or position one past the last character, based on the value of from-end. * txr.1: search-regex documentation updated.
* Synchronize license comments with LICENSE.Kaz Kylheku2016-10-011-16/+17
| | | | | | | | | | | | | | | | | | | | * Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Revert to verbatim 2-Clause BSD.
* Flurry of regex bugfixes.Kaz Kylheku2016-09-251-11/+27
| | | | | | | | | | | | | | | | | | | | | | | | | * regex.c (match_regex): Bail if pos is too positive, beyond length of string. (match_regex_right): Include the pos == end case in the iteration, so we can match an empty suffix of the string. The inner loop guard takes care of not feeding any characters from the string into the regex machine in this case; we just feed the terminating zero to get the final state. (match_regst): Normalize a negative pos, otherwise the sub_str calculation will be junk, since match_regex returns a normalized position. After normalizing, check that if the position is still negative, the match must fail. (match_regst_right_old, match_regst_right): Use zero rather than t as the range end in sub_str. That way if len is zero and neg(len) produces zero, an empty string will be sliced out. For negative values, the zero serves as one position beyond the last char, just like t. (do_match_full_offs, regex_match_full, regex_range_full, regex_range_left): Fail match if normalized starting pos is negative. (regex_range_right): Fix completely bogus calculation of the returne range in the case when the end position defaults to the string length.
* regex.c: code formatting.Kaz Kylheku2016-09-251-1/+1
| | | | * regex.c (puts_clear_flag): Fix bad indentation.
* New function: regex-source.Kaz Kylheku2016-09-251-0/+7
| | | | | | | | | * regex.c (regex_source): New function. (regex_init): regex-source intrinsic registered. * regex.h (regex_source): Declared. * txr.1: Documented.
* Bugfix in regex printing: & operator.Kaz Kylheku2016-09-251-1/+1
| | | | | * regex.c (print_rec): Fix checking arg1 for consp but accessing arg2.