Bug#157086: libc6: mbrtowc bug with incomplete wide characters

To: "Debian Bug Tracking System" <submit@bugs.debian.org>
Subject: Bug#157086: libc6: mbrtowc bug with incomplete wide characters
From: "Daniel Jacobowitz" <dan@debian.org>
Date: Sat, 17 Aug 2002 16:26:29 -0400
Message-id: <E17gA9W-0002kl-00@nevyn.them.org>
Reply-to: "Daniel Jacobowitz" <dan@debian.org>, 157086@bugs.debian.org

Package: libc6
Version: 2.2.5-13
Severity: normal
Tags: upstream

Some characters (in the Thai character set?) cannot be resumed if they are
partially parsed.  The problem can be reproduced with the following program
(with LC_ALL set to en_US.UTF-8):

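#include <locale.h>
#include <string.h>
#include <wchar.h>

/* UTF-8 encoding of U+0E31 */
char *str4 = "\xe0\xb8\xb1";

int bar(char *str)
{
  mbstate_t ps;
  wchar_t wc;
  int j;
  memset (&ps, 0, sizeof(ps));
  ps.__value.__wch = 3584;            /* glibc-internal field; 3584 == 0x0E00 */
  j = mbrtowc (&wc, str, 1, &ps);     /* first byte only: returns -2 (incomplete) */
  j = mbrtowc (&wc, str+1, 2, &ps);   /* remaining two bytes: returns -1 (the bug) */
  return j;
}

int main(int argc, char **argv, char **env)
{
  setlocale(LC_ALL, "");

  bar(str4);
  return 0;
}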
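
For reference, the three bytes in str4 can be decoded by hand, independent of
any locale.  The sketch below is only an illustration (the function name
decode3 is hypothetical and not part of libc); it shows that the string is the
UTF-8 encoding of U+0E31, and that the first two bytes alone contribute
0x0E00, which is 3584 in decimal.

```c
#include <stdint.h>

/* Decode one 3-byte UTF-8 sequence of the form 1110xxxx 10xxxxxx 10xxxxxx.
   Returns the code point, or 0xFFFFFFFF if the bytes do not have that shape.
   Illustration only: overlong and surrogate encodings are not rejected. */
uint32_t decode3(const unsigned char *s)
{
  if ((s[0] & 0xF0) != 0xE0 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
    return 0xFFFFFFFFu;
  return ((uint32_t)(s[0] & 0x0F) << 12)   /* 0xE0 -> 0x0000 */
       | ((uint32_t)(s[1] & 0x3F) << 6)    /* 0xB8 -> 0x0E00 */
       |  (uint32_t)(s[2] & 0x3F);         /* 0xB1 -> 0x0031 */
}
```

Calling decode3((const unsigned char *) "\xe0\xb8\xb1") yields 0x0E31.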

The character parses correctly from that shift state if the whole string is
given at once, but not if it is split across two calls:

(gdb) p ps
$8 = {__count = 0, __value = {__wch = 3584, __wchb = "\0\016\0"}}
(gdb) p mbrtowc(&wc, str, 3, &ps)
$9 = 3
(gdb) p ps
$10 = {__count = 0, __value = {__wch = 3584, __wchb = "\0\016\0"}}
(gdb) p mbrtowc(&wc, str, 1, &ps)
$11 = -2
(gdb) p mbrtowc(&wc, str+1, 2, &ps)
$12 = -1
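
For comparison, here is a minimal sketch of the same split parse that does not
touch the internals of mbstate_t and starts from a zeroed (initial) state.
The helper name split_parse_ok and the locale names it tries are assumptions
for illustration, and the final check assumes that wchar_t holds Unicode code
points, as it does on glibc.  Under those assumptions the first call should
return (size_t)-2 and the second should return 2, the number of bytes
consumed to complete the character.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Feed the 3-byte UTF-8 sequence for U+0E31 to mbrtowc in two pieces.
   Returns 1 if the split parse behaves as expected, 0 if it does not,
   and -1 if no UTF-8 locale could be selected. */
int split_parse_ok(void)
{
  static const char str[] = "\xe0\xb8\xb1";
  mbstate_t ps;
  wchar_t wc = 0;
  size_t r1, r2;

  /* Locale names are assumptions; adjust to what the system provides. */
  if (!setlocale(LC_CTYPE, "en_US.UTF-8") && !setlocale(LC_CTYPE, "C.UTF-8"))
    return -1;

  memset(&ps, 0, sizeof ps);             /* initial conversion state */
  r1 = mbrtowc(&wc, str, 1, &ps);        /* incomplete character */
  if (r1 != (size_t)-2)
    return 0;
  r2 = mbrtowc(&wc, str + 1, 2, &ps);    /* completes the character */
  return r2 == 2 && wc == 0x0E31;
}
```

A return value of 0 from split_parse_ok() would indicate that resuming a
partially parsed character fails even from the initial state.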


I don't know which input sequence leads to that shift state, but it occurs in
Markus Kuhn's UTF-8-demo.txt file, a standard UTF-8 test.

-- System Information:
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux nevyn 2.4.19-pre10-ac2-drow #4 SMP Sun Jun 16 12:01:20 EDT 2002 i686
Locale: LANG=en_US, LC_CTYPE=

-- no debconf information
