summaryrefslogtreecommitdiffstats
path: root/utf8.c
Commit message (Collapse)AuthorAgeFilesLines
* First cut at signal handling support.Kaz Kylheku2013-12-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Makefile (OBJS-y): Include signal.o if have_posix_sigs is "y". * configure (have_posix_sigs): New variable, set by detecting POSIX signal stuff. * dep.mk: Regenerated. * arith.c, debug.c, eval.c, filter.c, hash.c, match.c, parser.y, parser.l, rand.c, regex.c, syslog.c, txr.c, utf8.c: Include new signal.h header, now required by unwind, and the <signal.h> system header. * eval.c (exit_wrap): New function. (eval_init): New functions registered as intrinsics: exit_wrap, set_sig_handler, get_sig_handler, sig_check. * gc.c (release): Unused functions removed. * gc.h (release): Declaration removed. * lib.c (init): Call sig_init. * stream.c (set_putc, se_getc, se_fflush): New static functions. (stdio_put_char_callback, stdio_get_char_callback, stdio_put_byte, stdio_flush, stdio_get_byte): Use new functions to enable signals when blocked on I/O. (tail_strategy): Allow signals across sleep. (pipev_close): Allow signals across waitpid. (se_pclose): New static function. (pipe_close): Use new function to enable signals across pclose. * unwind.c (uw_unwind_to_exit_point): use extended_longjmp instead of longjmp. * unwind.h (struct uw_block, struct uw_catch): jb member changes from jmp_buf to extended_jmp_buf. (uw_block_begin, uw_simple_catch_begin, uw_catch_begin): Use extended_setjmp instead of setjmp. * signal.c: New file. * signal.h: New file.
* Bumping copyrights to 2014 and expressing them as year ranges.Kaz Kylheku2013-12-101-1/+1
| | | | Fixing some errors in copyright comments.
* * stream.c (tail_strategy): Execute the strategy code alsoKaz Kylheku2013-12-021-1/+1
| | | | | | | | | | | | | | in the case that the stream's FILE * handle is null. This handles opening the file for the first time. (make_stdio_stream_common): Do not use the FILE * handle if it is null. (open_tail): Do not open the file immediately and error out. This is undesirable because log files might not exist at the time open_tail is called on them. Instead, produce a stream which contains a null file handle, and use tail_strategy to poll for the file to come into existence. * utf8.c (w_freopen): Just in case freopen doesn't like a null pointer for the existing stream, use fopen instead if that is the case.
* * stream.c (struct stdio_handle): New member, mode.Kaz Kylheku2013-11-281-0/+10
| | | | | | | | | | | | | (stdio_stream_mark): Mark the new member during gc. (stdio_seek): When we seek, we should reset the utf8 machine. (tail_strategy): New function. (tail_get_line, tail_get_char, tail_get_byte): Use tail_strategy for polling the file at EOF. (open_tail): Store the mode in the file handle. * utf8.c (w_freopen): New function. * utf8.h (w_freopen): Declared.
* * filter.c, utf8.c: Tabs changed to spaces. For some reason, filter.cKaz Kylheku2013-08-091-61/+61
| | | | used 4 space tabs and utf8.c used 8 space tabs, inconsistently.
* * utf8.c (w_fopen, w_popen): Removing unnecessary casts ofKaz Kylheku2012-05-181-4/+4
| | | | return values of ut8_dup_to.
* * arith.c: Updated copyright year.Kaz Kylheku2012-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * arith.h: Likewise. * debug.c: Added copyright header. * debug.h: Updated copyright year. * eval.c: Likewise. * eval.h: Likewise. * filter.c: Likewise. * filter.h: Likewise. * gc.c: Likewise. * gc.h: Likewise. * hash.c: Likewise. * hash.h: Likewise. * lib.c: Likewise. * lib.h: Likewise. * match.c: Likewise. * match.h: Likewise. * parser.h: Likewise. * regex.c: Likewise. * regex.h: Likewise. * stream.c: Likewise. * stream.h: Likewise. * txr.c: Likewise, and e-mail address. * txr.h: Updated copyright year. * unwind.c: Likewise. * unwind.h: Likewise.
* * utf8.c (utf8_from_uc, utf8_decode): Some cascaded if tests convertedKaz Kylheku2012-02-051-16/+34
| | | | | | | to a switch on the upper nybble value. This also fixes an unfortunate bug. The test for the two byte case was written as ch >= 0xc2 && ch <= 0xE0. That should have been ch < 0xE0. Versions of TXR up to 55 have been incorrectly decoding some UTF-8.
* * utf8.c (utf8_from_uc): Bugfix: incorrect condition in characterKaz Kylheku2012-02-041-3/+5
| | | | | | | range check (less than minimum *and* U+DCxx, rather than *or*). Also, we must check for out of range characters. UTF-8 sequences beginning with F4 can code beyond 0x10FFFF. (utf8_decode): Check for characters beyond 0x10FFFF.
* * utf8.c (utf8_from_uc, utf8_decode): Use upper case for hex constants.Kaz Kylheku2012-02-021-25/+29
| | | | | | | If bytes decode to U+DCxx, treat this sequence as invalid. This way we can't be fooled by an attacker into accepting some U+DCxx which on output we will then convert to byte xx. (utf8_to_uc): Use upper case for hex constants.
* * utf8.c (utf8_to_uc, utf8_encode): Do not encode surrogate codeKaz Kylheku2012-02-021-8/+18
| | | | | | | points (U+DC00 to U+DCFF) as multi-byte UTF8 sequences. We use that range for invalid bytes on input, so on output the best thing to do is to reproduce the original bytes. E.g the code U+DCA0 will produce the byte A0.
* * utf8.c (utf8_from_uc, utf8_decode): Impose a minium value on theKaz Kylheku2012-02-021-7/+25
| | | | | | | decoded character based on which UTF-8 case it is from. This rejects overlong forms. * utf8.h (struct utf8_decoder): New member, wch_min.
* Improved support for broken unicode.Kaz Kylheku2011-10-101-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | Regex support for extra-large character sets not compiled in if wchar_t is not wide enough for it. The utf-8 properly throws exceptions when encountering characters that it cannot represent, instead of silently ignoring the situation and continuing with incorrectly computed data. * regex.c (FULL_UNICODE): New macro. (CHAR_SET_L3, CHAR_SET_L2_LO, CHAR_SET_L2_HI): Only defined if full unicde is available. (CHSET_XLARGE, cset_L3_t, struct xlarge_char_set, L2_full, L3_fill_range, L3_contains): Ditto. (unon char_set): Member x1 present only under FULL_UNICODE. (char_set_destroy, char_set_add, char_set_add_range, char_set_contains): CHSET_XLARGE cases only available on FULL_UNICODE. (char_set_compile): Default cst variable to CHSET_LARGE. * utf8.c (FULL_UNICODE): New macro. (conversion_error): New function. (utf8_from_uc): Throw error if not FULL_UNICODE and character is outside the BMP. (utf8_decode): Likewise.
* * LICENSE, Makefile, configure, filter.c, filter.h, gc.c, gc.h, hash.c,Kaz Kylheku2011-10-041-1/+1
| | | | | | hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated e-mail address.
* * LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c,Kaz Kylheku2011-09-231-1/+1
| | | | | | lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year.
* Bump copyrights to 2010.Kaz Kylheku2010-10-051-1/+1
|
* More void * to mem_t * conversion.Kaz Kylheku2009-12-051-2/+2
|
* utf8.c (utf8_from_uc): Fix bug introduced several commits ago (portingtxr-025Kaz Kylheku2009-11-241-0/+1
| | | | to C++). Caught by regression test suite. Found using git bisect.
* Changes to make the code portable to C++ compilers, whichKaz Kylheku2009-11-241-3/+4
| | | | can be taken advantage of for better diagnostics.
* Improving portability. It is no longer assumed that pointersKaz Kylheku2009-11-231-0/+1
| | | | | | | | can be converted to a type long and vice versa. The configure script tries to detect the appropriate type to use. Also, some run-time checking is performed in the streams module to detect which conversions specifier strings to use for printing numbers.
* Use unsigned char * as allocator return value.Kaz Kylheku2009-11-191-1/+1
|
* Big round of changes to switch the code base to use the streamKaz Kylheku2009-11-161-4/+4
| | | | | | | | | | | | | | | | | abstraction instead of directly using C standard I/O, to eliminate most uses of C formatted I/O, and fix numerous bugs, such variadic argument lists which lack a terminating ``nao'' sentinel. Bug 28033 is addressed by this patch, since streams no longer provide printf-compatible formatting. The native formatter is extended with some additional capabilities to take over. The work on literal objects is expanded and they are now used throughout the code base. Fixed bad realloc in string output stream: reallocating by number of wide chars rather than bytes.
* Version 021 preparation.txr-021Kaz Kylheku2009-11-151-2/+2
| | | | Bumped version numbers, and cleaned up trailing whitespace from some files.
* Provide both char * and unsigned char * interfaces in UTF-8 module.Kaz Kylheku2009-11-141-6/+32
| | | | Fix unsigned and plan char * mixing.
* Fixed broken utf8_from.Kaz Kylheku2009-11-121-21/+123
| | | | Added utf8_encode, utf8_decoder_init, utf8_decode.
* Big conversion to wide characters and UTF-8 support.Kaz Kylheku2009-11-111-0/+168
This is incomplete. There are too many dependencies on wide character support from the C stream I/O library, and implicit use of some encoding which may not be UTF-8. The regex code does not handle wide characters properly. Character type is still int in some places, rather than wchar_t. Test suite passes though.