
Bug#305831: mbrtowc() fails for vi_VN.tcvn



On Mon, Apr 25, 2005 at 01:07:54AM +0900, GOTO Masanori wrote:
> At Fri, 22 Apr 2005 12:14:53 +0100,
> Ross Paterson wrote:
> > According to the spec, mbrtowc(&wc, buf, 1, &st) should either return 1
> > and set wc, or return 0, (size_t)-1 or (size_t)-2.  In this locale it
> > returns either 0 or 1, but doesn't always set wc in the latter case,
> 
> It works OK when I changed this source as follows.

Sorry, the subject line was a bit broad -- I didn't mean to imply any
more than a failure in this specific usage pattern.

> > (In iconvdata/tcvn5712-1.c, this decoding is treated as stateful, but
> > I don't think it should be.)
> 
> It has five combining characters:
> 
>     http://www.informatik.uni-leipzig.de/~duc/software/misc/tcvn.txt
> 
> TCVN5712:1993 is a very weird encoding, because 0xb0..0xb4 are
> postposed combining characters.  This means that even after we read the
> first character, we cannot decide on the output character until we read
> the 2nd character.

I know -- I just thought that one could have, e.g.

	mbrtowc(&wc, "a ", 1, &st)  return  (size_t)-2
	mbrtowc(&wc, "a ", 2, &st)  return  1

(though getwc would have to push the extra byte back onto the stream,
I guess)
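
For reference, here is a rough sketch of the byte-at-a-time usage pattern
I have in mind, with each return value handled the way the spec describes.
The function name decode_bytewise and its parameters are made up for this
illustration; it is not code from glibc or from the original test case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode `len` bytes from `buf`, feeding mbrtowc()
   one byte per call.  Stores at most `cap` wide characters in `out`.
   Returns the number stored, or (size_t)-1 on an invalid sequence. */
size_t decode_bytewise(const char *buf, size_t len, wchar_t *out, size_t cap)
{
    mbstate_t st;
    size_t n = 0;

    assert(buf != NULL && out != NULL);
    memset(&st, 0, sizeof st);          /* start in the initial state */

    for (size_t i = 0; i < len; i++) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, buf + i, 1, &st);

        if (r == (size_t)-1)
            return (size_t)-1;          /* invalid sequence */
        if (r == (size_t)-2)
            continue;                   /* incomplete: byte kept in st, wc not set */

        /* r is 0 or 1 here, so the caller expects wc to have been set */
        if (n < cap)
            out[n++] = wc;
    }
    return n;
}
```

The point is the last branch: a caller written this way relies on wc being
set whenever the return value is 0 or 1.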

Just a wishlist item, of course: stateless encodings are easier to work
with, and this is the only stateful one in Debian.


