From 3b64319b10196425401d4d71f7ee1273e3bffe32 Mon Sep 17 00:00:00 2001
From: Kaz Kylheku
Date: Sat, 15 Feb 2014 00:19:15 -0800
Subject: A trivial change in the UTF-8 decoder allows TXR to handle null bytes in text.

* utf8.h (UTF8_ADMIT_NUL): New preprocessor symbol.
(utf8_decoder): New member, flags.

* utf8.c (utf8_decoder_init): Initialize flags to 0.
(utf8_decode): If a null byte is encountered in the input, then convert
it to 0xDC00, rather than keeping it as zero, unless flags contains
UTF8_ADMIT_NUL.

* txr.1: Document handling of null bytes.
---
 utf8.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'utf8.h')

diff --git a/utf8.h b/utf8.h
index c4915488..67dee69a 100644
--- a/utf8.h
+++ b/utf8.h
@@ -35,8 +35,11 @@ unsigned char *utf8_dup_to_uc(const wchar_t *);
 
 enum utf8_state { utf8_init, utf8_more1, utf8_more2, utf8_more3 };
 
+#define UTF8_ADMIT_NUL 1
+
 typedef struct utf8_decoder {
   enum utf8_state state;
+  int flags;
   wchar_t wch, wch_min;
   int head, tail, back;
   int buf[8];
-- 
cgit v1.2.3
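
The utf8.c half of this commit is not part of the diff shown here, which is limited to utf8.h. As a rough illustration of the behavior the commit message describes, the sketch below maps a null input byte to 0xDC00 unless UTF8_ADMIT_NUL is set in flags. It is not TXR's utf8_decode: the names sketch_decoder and sketch_decode_ascii are hypothetical, the struct is cut down to the one new member, and only the single-byte case is handled.

```c
#include <wchar.h>

#define UTF8_ADMIT_NUL 1  /* same value as in the utf8.h hunk */

/* Hypothetical, cut-down stand-in for struct utf8_decoder; only the
   new flags member matters for this illustration. */
typedef struct sketch_decoder {
  int flags;
} sketch_decoder;

/* Hypothetical helper (not the real utf8_decode): converts one
   single-byte, ASCII-range input byte to a wide character. */
static wchar_t sketch_decode_ascii(const sketch_decoder *ud, int byte)
{
  /* A null byte is converted to 0xDC00 unless the caller opted in
     to raw null characters via UTF8_ADMIT_NUL. */
  if (byte == 0 && (ud->flags & UTF8_ADMIT_NUL) == 0)
    return (wchar_t) 0xDC00;

  return (wchar_t) byte;
}
```

With flags left at 0, as utf8_decoder_init is said to initialize it, a null byte decodes to 0xDC00; a caller that sets flags to UTF8_ADMIT_NUL gets the zero character back unchanged.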