
Bug#305831: mbrtowc() fails for vi_VN.tcvn



At Fri, 22 Apr 2005 12:14:53 +0100,
Ross Paterson wrote:
> According to the spec, mbrtowc(&wc, buf, 1, &st) should either return 1
> and set wc, or return 0, (size_t)-1 or (size_t)-2.  In this locale it
> returns either 0 or 1, but doesn't always set wc in the latter case,
> as the following test program shows.  I believe it should be returning
> (size_t)-2 (incomplete encoding) for (most) letters, and setting wc in
> all the other cases (except \0).

The test works correctly if I change its source as follows.

--- test.c~     2005-04-23 11:45:25.000000000 +0900
+++ test.c      2005-04-23 12:17:11.000000000 +0900
@@ -7,7 +7,7 @@
 main() {
         int c;
         mbstate_t st;
-        char buf[1];
+        char buf[2];
         size_t size;
         wchar_t wc;
 
@@ -15,8 +15,9 @@
         for (c = 0; c <= 0xff; c++) {
                 wc = 0xbaad;
                 buf[0] = c;
+                buf[1] = '\0';
                 memset(&st, 0, sizeof(st));
-                size = mbrtowc(&wc, buf, 1, &st);
+                size = mbrtowc(&wc, buf, 2, &st);
                 printf("c = 0x%02x, size = %d, wc = U+%04X\n", c, size, wc);
         }
         return 0;
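
For reference, the four outcomes that the report lists for mbrtowc() can
be told apart as in the sketch below.  The names mb_result,
classify_mbrtowc and decode_one are hypothetical and do not come from
the test program; only mbrtowc() itself is the real library call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical names for the outcomes mbrtowc() can report. */
enum mb_result { MB_NUL, MB_COMPLETE, MB_INCOMPLETE, MB_INVALID };

/* Map the raw return value of mbrtowc() to one of the outcomes. */
static enum mb_result classify_mbrtowc(size_t r)
{
        if (r == (size_t)-1)
                return MB_INVALID;      /* encoding error */
        if (r == (size_t)-2)
                return MB_INCOMPLETE;   /* more bytes are needed */
        if (r == 0)
                return MB_NUL;          /* decoded the null wide character */
        return MB_COMPLETE;             /* r bytes consumed, *wc was set */
}

/* Decode one character from buf[0..n) starting in the initial state. */
static enum mb_result decode_one(const char *buf, size_t n, wchar_t *wc)
{
        mbstate_t st;

        assert(buf != NULL && wc != NULL);
        memset(&st, 0, sizeof(st));
        return classify_mbrtowc(mbrtowc(wc, buf, n, &st));
}
```

A caller should read *wc only for MB_COMPLETE and MB_NUL; in the other
two cases its previous value says nothing about the input.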

> (In iconvdata/tcvn5712-1.c, this decoding is treated as stateful, but
> I don't think it should be.)

The encoding has five combining characters:

    http://www.informatik.uni-leipzig.de/~duc/software/misc/tcvn.txt

TCVN5712:1993 is a very weird encoding, because 0xb0..0xb4 are
postposed combining characters.  This means that even after we read
the first character, we cannot decide the output character until we
read the second one.  Historically it was designed to be stateless;
however, in practice it is completely stateful, so that design does
not make any difference.

If we detect that the input sequence has ended in an intermediate
state, that is, while a character between U+0041 and U+01B0 is being
held, we should output that character to the stream.  However, the
current implementation does not handle this correctly: it just stores
the character in its internal buffer.  I'll investigate this bug.
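
The buffering described above can be sketched with a toy decoder.  This
is not the code in iconvdata/tcvn5712-1.c: every name below is
hypothetical, only ASCII bytes are mapped, and the single composition
entry (byte 0xb3 treated as an acute accent on 'a') is an assumption
made for illustration.  The point is only that a base character must be
held back until the next byte arrives, and released by an explicit flush
at end of input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_UNKNOWN 0xfffdu     /* placeholder for bytes the toy does not map */

/* Toy decoder state: at most one base character is held back. */
struct toy_decoder {
        uint32_t pending;       /* held base character, or 0 if none */
};

/* Bytes 0xb0..0xb4 are the postposed combining characters. */
static int is_combining_byte(unsigned char b)
{
        return b >= 0xb0 && b <= 0xb4;
}

/* Characters in U+0041..U+01B0 may still combine with the next byte. */
static int may_combine(uint32_t wc)
{
        return wc >= 0x0041 && wc <= 0x01b0;
}

/* ASSUMED one-entry table: 'a' followed by 0xb3 gives U+00E1. */
static uint32_t toy_compose(uint32_t base, unsigned char mark)
{
        return (base == 0x0061 && mark == 0xb3) ? 0x00e1 : 0;
}

/* Feed one byte; stores 0, 1 or 2 code points in out, returns the count. */
static size_t toy_feed(struct toy_decoder *d, unsigned char b, uint32_t out[2])
{
        size_t n = 0;
        uint32_t wc;

        assert(d != NULL && out != NULL);
        if (is_combining_byte(b)) {
                uint32_t c = d->pending ? toy_compose(d->pending, b) : 0;
                if (c != 0) {
                        out[n++] = c;           /* base + mark composed */
                } else {
                        if (d->pending != 0)
                                out[n++] = d->pending;
                        out[n++] = TOY_UNKNOWN; /* mark not in the toy table */
                }
                d->pending = 0;
                return n;
        }
        if (d->pending != 0) {                  /* next byte is not a mark */
                out[n++] = d->pending;
                d->pending = 0;
        }
        wc = (b < 0x80) ? b : TOY_UNKNOWN;
        if (may_combine(wc))
                d->pending = wc;                /* hold until the next byte */
        else
                out[n++] = wc;
        return n;
}

/* End of input: release the held character instead of keeping it. */
static size_t toy_flush(struct toy_decoder *d, uint32_t out[1])
{
        assert(d != NULL && out != NULL);
        if (d->pending == 0)
                return 0;
        out[0] = d->pending;
        d->pending = 0;
        return 1;
}
```

Feeding a lone 'b' to toy_feed() produces no output; only the following
byte or toy_flush() releases it.  This is the same shape as the reported
symptom, where a one-byte call consumes a letter without setting wc.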

Regards,
-- gotom


