txr - TXR: A data munging language.

	Commit message (Collapse)	Author	Age	Files	Lines
*	utf8: fix backtracking bugs in buffer decoder.	Kaz Kylheku	2021-04-07	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* utf8.c (utf8_from_buffer): Fix incorrect backtracking logic for handling bad UTF-8 bytes. Firstly, we are not backtracking to the correct byte. Because src is incremented at the top of the loop, the backtrack pointer must be set to src - 1 to point to the possibly bad byte. Secondly, when we backtrack, we are neglecting to rewinding nbytes! Thus after backtracking, we will not scan the entire input. Let's avoid using nbytes, and guard the loop based on whether we hit the end of the buffer; then we don't have any nbytes state to backtrack. * tests/017/ffi-misc.tl: New test case converting a three-byte UTF-8 encoding of U+DC01: an invalid character in the surrogate range. We test that the buffer decoder turns this into three characters, exactly like the stream decoder. Another test case for invalid bytes following a valid sequence start.
*	ffi: fix broken char handling in undimensioned arrays.	Kaz Kylheku	2020-01-17	1	-0/+11
	The undimensioned (array <type>) and (zarray <type>) types are not doing UTF-8 conversion when <type> is char or zchar, or doing what they are supposed to with the FFI character types, which is inconsistent from their dimensioned counterparts. * ffi.c (ffi_varray_dynsize): if the element type is marked for character conversion, then do the size calculation for char and zchar by measuring the UTF-8 coded size. (ffi_varray_alloc): Call ffi_varray_dynsize to get the size, to benefit from the char handling. Thus when FFI allocates buffers for a variable length array, it will allocate correct size required for the UTF-8 encoded string. (ffi_varray_put, ffi_varray_in): Here we must call ffi_varray_dynsize and divide by the element type to get the proper numer of elements. Then we must check for character conversion and handle the cases. (ffi_varray_null_term_in): Check for character conversion cases and route those through ffi_varray_in, which handles null-terminated strings. * tests/017/ffi-misc.tl: New file. * tests/017/ffi-misc.expected: New file.