|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* utf8.c (utf8_from_buffer): Fix incorrect backtracking logic
for handling bad UTF-8 bytes. Firstly, we are not backtracking
to the correct byte. Because src is incremented at the top of
the loop, the backtrack pointer must be set to src - 1 to
point to the possibly bad byte. Secondly, when we backtrack,
we are neglecting to rewinding nbytes! Thus after
backtracking, we will not scan the entire input. Let's avoid
using nbytes, and guard the loop based on whether we hit the
end of the buffer; then we don't have any nbytes state to
backtrack.
* tests/017/ffi-misc.tl: New test case converting a three-byte
UTF-8 encoding of U+DC01: an invalid character in the
surrogate range. We test that the buffer decoder turns this
into three characters, exactly like the stream decoder.
Another test case for invalid bytes following a valid
sequence start.
|
|
The undimensioned (array <type>) and (zarray <type>) types are
not doing UTF-8 conversion when <type> is char or zchar,
or doing what they are supposed to with the FFI character
types, which is inconsistent from their dimensioned
counterparts.
* ffi.c (ffi_varray_dynsize): if the element type is marked
for character conversion, then do the size calculation for
char and zchar by measuring the UTF-8 coded size.
(ffi_varray_alloc): Call ffi_varray_dynsize to get the size,
to benefit from the char handling. Thus when FFI allocates
buffers for a variable length array, it will allocate correct
size required for the UTF-8 encoded string.
(ffi_varray_put, ffi_varray_in): Here we must call
ffi_varray_dynsize and divide by the element type to get the
proper numer of elements. Then we must check for character
conversion and handle the cases.
(ffi_varray_null_term_in): Check for character conversion
cases and route those through ffi_varray_in, which handles
null-terminated strings.
* tests/017/ffi-misc.tl: New file.
* tests/017/ffi-misc.expected: New file.
|