author | Kaz Kylheku <kaz@kylheku.com> | 2016-03-31 20:53:03 -0700 |
---|---|---|
committer | Kaz Kylheku <kaz@kylheku.com> | 2016-03-31 20:53:03 -0700 |
commit | c27f83bdae5eb00206a478f7764df4fdaa48fc76 (patch) | |
tree | 3fdcd29807e120c1836a7ba59de6098a0460b636 /parser.l | |
parent | 98b26ff13eeb8a9f730801720c4cba30eba9e61d (diff) | |
UTF-8 API overhaul: security, and other concerns.
The main aim here is to pave the way for conversion between
arbitrary buffers of bytes (that may include embedded NUL
characters) and a wide string.
Also, a potential security hole is closed. When we convert a
TXR string to UTF-8 for use with some C library API, any
embedded pnul characters (U+DC00) turn into NUL
bytes which effectively cut the UTF-8 string short, and
silently so. The C library function receives a shortened
string. This could be exploitable in some situations.
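
The truncation described above can be illustrated with a minimal sketch. The helper names below are hypothetical and are not part of TXR; the sketch only assumes that a C library function stops reading at the first NUL byte, as strlen does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of TXR: reports how many bytes of an
   encoded buffer a C library function would see, given that such
   functions stop at the first NUL byte. */
static size_t bytes_seen_by_c_api(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Hypothetical helper: nonzero if a buffer holding len payload bytes
   is cut short by an embedded NUL. The buffer must also carry a
   terminating NUL at buf[len]. */
static int is_silently_truncated(const char *buf, size_t len)
{
    return bytes_seen_by_c_api(buf) != len;
}
```

For a 13-byte payload such as "safe.txt", a NUL byte, then ".exe", a C API sees only the first 8 bytes, and nothing signals that the remaining bytes were dropped.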
* lib.c (int_str): Use utf8_dup_to_buf instead of
utf8_dup_to_uc. Pass 1 to have the buffer null-terminated,
since mp_read_radix depends on it.
* stream.c (make_string_byte_input_stream): Use
utf8_dup_to_buf. This gives us the size, so we don't have to
call strlen. The buffer is no longer null-terminated, but the
byte input stream implementation never relied on this.
* utf8.c (utf8_from_buf): Replacement for utf8_from_uc
which doesn't assume that the buffer of bytes is
null-terminated. It can produce a wide string containing
U+DC00 characters corresponding to embedded nulls in the
original buffer.
(utf8_from): Calculate length of null-terminated string and use
utf8_from_buf.
(utf8_to_buf): Replacement for utf8_to_uc. Can produce
a buffer which is or is not null-terminated, based on new
argument.
(utf8_to): Use utf8_to_buf, and ask it to null-terminate,
thus preserving behavior.
(utf8_dup_from_uc): This function was not used anywhere
and is removed.
(utf8_dup_to_buf): Replacement for utf8_dup_to_uc which
takes an extra argument specifying whether to null-terminate
or not.
(utf8_dup_to): Apply security check here: is the resulting
string as long as utf8_to says it should be? If not,
it contains embedded nulls. Throw an exception.
* utf8.h (utf8_from_uc, utf8_to_uc, utf8_dup_from_uc,
utf8_dup_to_uc): Declarations removed.
(utf8_from_buf, utf8_to_buf, utf8_dup_to_buf): Declared.
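
The optional null termination and the length check described in the entries above can be sketched roughly as follows. This is a simplified illustration, not the code from utf8.c: the names sketch_dup_to_buf and sketch_dup_to_checked, their signatures, and the NULL return in place of a thrown exception are all assumptions. The sketch also assumes valid code points and treats only U+DC00 specially.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define PNUL 0xDC00UL /* pseudo-null code point named in the commit message */

/* Hypothetical: number of bytes one code point occupies when encoded. */
static size_t enc_size(unsigned long c)
{
    if (c == PNUL || c < 0x80) return 1;
    if (c < 0x800) return 2;
    if (c < 0x10000) return 3;
    return 4;
}

/* Hypothetical stand-in for an encoder with a null-termination flag.
   Stores the payload size in *psize. One spare byte is always
   allocated, but the terminator is written only when null_term is
   nonzero. */
static unsigned char *sketch_dup_to_buf(const wchar_t *ws, size_t *psize,
                                        int null_term)
{
    size_t n = 0, i;
    unsigned char *buf, *p;

    assert(ws != NULL && psize != NULL);
    for (i = 0; ws[i] != 0; i++)
        n += enc_size((unsigned long) ws[i]);

    buf = malloc(n + 1);
    if (buf == NULL)
        return NULL;

    p = buf;
    for (i = 0; ws[i] != 0; i++) {
        unsigned long c = (unsigned long) ws[i];
        if (c == PNUL) {
            *p++ = 0; /* pseudo-null becomes a real NUL byte */
        } else if (c < 0x80) {
            *p++ = (unsigned char) c;
        } else if (c < 0x800) {
            *p++ = (unsigned char) (0xC0 | (c >> 6));
            *p++ = (unsigned char) (0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            *p++ = (unsigned char) (0xE0 | (c >> 12));
            *p++ = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char) (0x80 | (c & 0x3F));
        } else {
            *p++ = (unsigned char) (0xF0 | (c >> 18));
            *p++ = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
            *p++ = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    if (null_term)
        *p = 0;
    *psize = n;
    return buf;
}

/* Hypothetical checked variant: if strlen disagrees with the encoded
   size, the string contains embedded NULs. Returns NULL in that case,
   where the commit message says an exception is thrown. */
static char *sketch_dup_to_checked(const wchar_t *ws)
{
    size_t size = 0;
    unsigned char *buf = sketch_dup_to_buf(ws, &size, 1);

    if (buf == NULL)
        return NULL;
    if (strlen((char *) buf) != size) {
        free(buf);
        return NULL;
    }
    return (char *) buf;
}
```

A caller that needs the raw bytes, such as a byte input stream, would pass 0 for the flag and rely on the returned size. A caller that hands the result to a C library function would use the checked variant.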