Bug#305831: mbrtowc() fails for vi_VN.tcvn
On Mon, Apr 25, 2005 at 01:07:54AM +0900, GOTO Masanori wrote:
> At Fri, 22 Apr 2005 12:14:53 +0100,
> Ross Paterson wrote:
> > According to the spec, mbrtowc(&wc, buf, 1, &st) should either return 1
> > and set wc, or return 0, (size_t)-1 or (size_t)-2. In this locale it
> > returns either 0 or 1, but doesn't always set wc in the latter case,
>
> It worked OK when I changed the source as follows.
Sorry, the subject line was a bit broad -- I didn't mean to imply any
more than a failure in this specific usage pattern.
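
For reference, a minimal sketch of the usage pattern in question: call
mbrtowc() on a single byte from the initial state and, when it returns 1,
check whether wc was actually written. The helper name check_byte and the
two-sentinel trick are illustrative assumptions, not code from the bug
report or from glibc.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the bug report): decode one byte twice,
   each time from a fresh initial state and with a different sentinel
   preloaded into the output variable.
   Returns  1 if mbrtowc() returned 1 and stored a wide character,
            0 if it returned 1 but left the output untouched,
           -1 if it returned anything other than 1. */
static int check_byte(unsigned char byte)
{
    char buf[1];
    wchar_t wc1 = (wchar_t)1;   /* first sentinel */
    wchar_t wc2 = (wchar_t)2;   /* second, different sentinel */
    mbstate_t st1, st2;
    size_t r1, r2;

    buf[0] = (char)byte;
    memset(&st1, 0, sizeof st1);    /* initial conversion state */
    memset(&st2, 0, sizeof st2);

    r1 = mbrtowc(&wc1, buf, 1, &st1);
    r2 = mbrtowc(&wc2, buf, 1, &st2);

    if (r1 != 1 || r2 != 1)
        return -1;                  /* 0, (size_t)-1 or (size_t)-2 */

    /* If wc was written, both calls stored the same value; if not,
       the two sentinels still differ. */
    return wc1 == wc2;
}
```

To reproduce the report one would presumably call setlocale(LC_ALL, ...)
with the affected locale first (the exact installed locale name is an
assumption here) and then loop over the byte values, looking for any that
yield 0.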
> > (In iconvdata/tcvn5712-1.c, this decoding is treated as stateful, but
> > I don't think it should be.)
>
> It has five combining characters:
>
> http://www.informatik.uni-leipzig.de/~duc/software/misc/tcvn.txt
>
> TCVN5712:1993 is a very weird encoding, because 0xb0..0xb4 are
> combining characters that follow their base character. This means
> that even after we read the first character, we cannot decide the
> output character until we read the second one.
I know -- I just thought that one could have, e.g.,

    mbrtowc(&wc, "a ", 1, &st) returns (size_t)-2
    mbrtowc(&wc, "a ", 2, &st) returns 1

(though getwc would have to push the extra byte back onto the stream,
I guess)
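
As a rough sketch of what that would mean for callers: a loop that feeds
mbrtowc() one byte at a time has to treat (size_t)-2 as "need more input"
and keep going with the same mbstate_t. The function name decode_bytewise
and its error convention are assumptions for illustration only; whether the
TCVN converter could report the base character this way is the open
question here.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode len bytes of s, one byte per mbrtowc()
   call, storing at most cap wide characters in out.
   Returns the number of wide characters stored, or (size_t)-1 on an
   invalid sequence or if out is too small.  An incomplete sequence at
   the very end of the input is silently dropped in this sketch. */
static size_t decode_bytewise(const char *s, size_t len,
                              wchar_t *out, size_t cap)
{
    mbstate_t st;
    size_t n = 0;
    size_t i;

    memset(&st, 0, sizeof st);      /* initial conversion state */

    for (i = 0; i < len; i++) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s + i, 1, &st);

        if (r == (size_t)-2)
            continue;               /* incomplete: byte absorbed into st */
        if (r == (size_t)-1)
            return (size_t)-1;      /* invalid sequence */

        /* r is 0 (wc is L'\0') or 1: one character is complete. */
        if (n == cap)
            return (size_t)-1;      /* output buffer full */
        out[n++] = wc;
    }
    return n;
}
```

A caller that passes the whole buffer at once would instead advance by the
returned byte count; the byte-at-a-time form is shown because it is the
pattern that triggers the problem.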
Just a wishlist item, of course. Stateless encodings are easier to work
with, and this is the only stateful one in Debian.