Re: UTF-8 locales
At Thu, 16 Nov 2000 09:40:26 +0000,
Edmund GRIMLEY EVANS <firstname.lastname@example.org> wrote:
> > You are right... the i18n in Linux is not coming well, everybody seems to
> > implement their own scheme...
> > Besides, GNU having chosen a sizeof(wchar_t)==4 doesn't help to encourage
> > using libc's locale support... =/
Memory consumption is less important than whether I can use my
daily encodings (EUC-JP, ISO-2022-JP, and so on) at all.
I had not thought of developers who hesitate to use wchar_t because of
its memory consumption. I can hardly believe it, since memory
consumption is a trifling problem compared with the question of whether
a user can use the software at all.
I would agree with developers who dare to hard-code UTF-8 instead of
using wchar_t, if they also dropped support for 8-bit (and 7-bit)
encodings in the software they develop. I mean, if they (European-language
speakers, in most cases) still need their own daily 7-bit and 8-bit
encodings and therefore keep supporting them, why do they choose not to
support our daily encodings?
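
To illustrate what I mean by using the locale support instead of
hard-coding one encoding, here is a minimal sketch. The function name
count_chars() is my own invention for this example. It counts the
characters in a multibyte string using mbrtowc(), so it follows whatever
encoding the current LC_CTYPE locale selects and never looks at the
value of the wchar_t itself.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters of a multibyte string in
 * the encoding of the current LC_CTYPE locale.  Returns (size_t)-1 if
 * the string contains an invalid or incomplete sequence. */
size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (n == 0)
            break;                      /* null character reached */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

A program would call setlocale(LC_CTYPE, "") once at startup so that
the user's locale is honored. Without that call the "C" locale is in
effect, and what mbrtowc() does with bytes outside plain ASCII depends
on the implementation.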
> If you are suggesting that sizeof(wchar_t) could be 2, then please
> explain what you think mbtowc(&wc, "\360\220\200\200", 4) should do in
> a UTF-8 locale, and why you think that would be easier for
We cannot assume anything about the concrete values stored in wchar_t
variables. If a system uses UCS-2 as the internal representation of
wchar_t, that mbtowc() call will fail. However, there could be a system
whose sizeof(wchar_t) is 2 and whose internal representation of wchar_t
is not UCS-2, and on such a system the mbtowc() call need not fail.
# OK, such a system is not likely to exist. I wanted to say that
# UCS is not the only candidate for the internal representation of
# wchar_t. For example, there could well be a system whose wchar_t is
# a Mule-like code, i.e., some bits specifying a coded character
# set and the other bits giving the code point within that set.
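
A rough sketch of what such a Mule-like 16-bit code could look like
follows. The layout, the type name wide16_t, and the character set IDs
are all assumptions made up for this example. They do not describe Mule
or any real libc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit wide-character layout (not any real system's):
 *   bits 15..14  ID of the coded character set
 *   bits 13..0   code point within that set, as two 7-bit bytes
 */
typedef uint16_t wide16_t;

/* Illustrative character set IDs; the numbering is arbitrary. */
enum charset_id { CS_ASCII = 0, CS_JISX0201_KANA = 1, CS_JISX0208 = 2 };

/* Pack a character set ID and two 7-bit bytes into one 16-bit value. */
wide16_t wide16_pack(enum charset_id cs, unsigned hi7, unsigned lo7)
{
    assert((unsigned)cs <= 3 && hi7 < 0x80 && lo7 < 0x80);
    return (wide16_t)(((unsigned)cs << 14) | (hi7 << 7) | lo7);
}

/* Extract the individual fields again. */
enum charset_id wide16_charset(wide16_t wc) { return (enum charset_id)(wc >> 14); }
unsigned wide16_hi7(wide16_t wc) { return (wc >> 7) & 0x7F; }
unsigned wide16_lo7(wide16_t wc) { return wc & 0x7F; }
```

With this layout, the JIS X 0208 character 0x2422 (HIRAGANA LETTER A)
would be stored as wide16_pack(CS_JISX0208, 0x24, 0x22). The resulting
number has nothing to do with the UCS code point of that character,
which is why a program must not interpret wchar_t values by itself.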
FYI: "\360\220\200\200" in UTF-8 means U+10000.
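
For anyone who wants to check this by hand, here is a small sketch of a
UTF-8 decoder. utf8_decode() and fits_in_ucs2() are hypothetical helpers
written for this mail, not libc functions, and the decoder assumes the
U+0000..U+10FFFF range with sequences of at most four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence of at most n bytes
 * into *cp.  Returns the number of bytes consumed, or -1 if the input
 * is truncated or malformed. */
int utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    uint32_t c, min;
    int len, i;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return -1;

    c = s[0];
    if (c < 0x80) { *cp = c; return 1; }                 /* 0xxxxxxx */
    else if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; min = 0x10000; }
    else return -1;                                      /* bad lead byte */

    if ((size_t)len > n)
        return -1;                                       /* truncated */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;                                   /* not 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return -1;
    *cp = c;
    return len;
}

/* A UCS-2 wchar_t only has room for U+0000..U+FFFF. */
int fits_in_ucs2(uint32_t cp) { return cp <= 0xFFFF; }
```

Feeding it the four bytes above yields 0x10000, and fits_in_ucs2()
returns 0 for that value, which is the reason a UCS-2 based mbtowc()
has to fail on this input.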
Tomohiro KUBOTA <email@example.com>