Re: UTF-8 locales

To: debian-i18n@lists.debian.org
Cc: debian-devel@lists.debian.org
Subject: Re: UTF-8 locales
From: Tomohiro KUBOTA <tkubota@riken.go.jp>
Date: Thu, 16 Nov 2000 20:21:26 +0900
Message-id: <871ywc2mdl.wl@surfchem0.riken.go.jp>
In-reply-to: In your message of "Thu, 16 Nov 2000 09:40:26 +0000" <20001116094026.A12204@daisy.vocalis.com>
References: <87r94gqd2e.wl@surfchem0.riken.go.jp> <200011131854.DAA16802@smtp5.dti.ne.jp> <87u29928rd.wl@surfchem0.riken.go.jp> <20001116004510.A3138@debian.org> <20001116094026.A12204@daisy.vocalis.com>

Hi,

At Thu, 16 Nov 2000 09:40:26 +0000,
Edmund GRIMLEY EVANS <edmundo@rano.org> wrote:

> >  You are right... the i18n in Linux is not coming well, everybody seems to
> > implement their own scheme...
> >  Besides, GNU having chosen a sizeof(wchar_t)==4 doesn't help to encourage
> > using libc's locale support... =/

Memory consumption is less important than whether I can use my
everyday encodings (EUC-JP, ISO-2022-JP, and so on) at all.

I had not considered that developers might hesitate to use wchar_t
because of its memory consumption.  I find that hard to believe,
since memory consumption is a trivial problem compared with whether
a user can use the software at all.

I would agree with developers who choose to hard-code UTF-8 instead
of using wchar_t, provided that they also drop support for 7-bit and
8-bit encodings in the software they develop.  In other words, if
they (speakers of European languages, in most cases) still need their
own everyday 7-bit and 8-bit encodings and therefore keep supporting
them, why do they choose not to support our everyday encodings?

> If you are suggesting that sizeof(wchar_t) could be 2, then please
> explain what you think mbtowc(&wc, "\360\220\200\200", 4) should do in
> a UTF-8 locale, and why you think that would be easier for

We cannot assume anything about the concrete values of wchar_t
variables.  If a system uses UCS-2 as the internal representation of
wchar_t, that call to mbtowc() will fail.  However, there could be a
system whose sizeof(wchar_t) is 2 and whose internal representation
of wchar_t is not UCS-2, on which such a call to mbtowc() does not
fail.
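
To make the mbtowc() question concrete, here is a minimal sketch of
how one could try the call on a given system.  The helper name
convert_in_locale() and its -2 return code are my own inventions for
illustration, and the locale name you pass (for example "C.UTF-8") is
an assumption: it has to be a UTF-8 locale that is actually installed.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): switch LC_CTYPE to
   locale_name and convert the first multibyte character of bytes.
   Returns mbtowc()'s result, or -2 if the locale is not available.
   Note that it changes the process-wide locale as a side effect. */
int convert_in_locale(const char *locale_name, const char *bytes,
                      size_t n, wchar_t *out)
{
    assert(bytes != NULL && out != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;
    /* Reset any shift state left over from earlier calls. */
    (void)mbtowc(NULL, NULL, 0);
    return mbtowc(out, bytes, n);
}
```

Calling convert_in_locale("C.UTF-8", "\360\220\200\200", 4, &wc) would
be expected to return 4 where wchar_t can hold the character, and -1
where it cannot, but the result depends entirely on the system's
wchar_t representation, which is the point of the paragraph above.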

# OK, such a system is unlikely to exist.  My point is that UCS is
# not the only candidate for the internal representation of wchar_t.
# For example, a system might plausibly use a Mule-like code for
# wchar_t, i.e., some bits specify a coded character set and the
# remaining bits give the code point within that set.
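
As a rough illustration of such a "Mule-like" layout, the sketch below
packs a character set identifier and a two-byte code point into 16
bits.  The split (2 bits for the set, 7 bits for each byte of a 94x94
set) and all function names are hypothetical; this is not Mule's
actual internal code, only an example of a 16-bit wchar_t that is not
UCS-2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit layout: [2 bits charset][7 bits byte1][7 bits byte2]. */
uint16_t pack_mule_like(unsigned charset, unsigned byte1, unsigned byte2)
{
    assert(charset < 4 && byte1 < 0x80 && byte2 < 0x80);
    return (uint16_t)((charset << 14) | (byte1 << 7) | byte2);
}

/* Which coded character set the value belongs to (0..3). */
unsigned mule_like_charset(uint16_t wc) { return (unsigned)(wc >> 14) & 0x3; }

/* First and second 7-bit bytes of the code point within that set. */
unsigned mule_like_byte1(uint16_t wc) { return (unsigned)(wc >> 7) & 0x7F; }
unsigned mule_like_byte2(uint16_t wc) { return (unsigned)wc & 0x7F; }
```

With this layout, a character from set number 1 with bytes 0x30 0x21
would be stored as 0x5821, and the three accessors recover the
original parts.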

FYI: "\360\220\200\200" in UTF-8 encodes U+10000.
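
The arithmetic can be checked by hand with a small locale-independent
decoder.  This is only a sketch: decode_utf8_4() and fits_in_16_bits()
are names I made up for illustration, and the decoder handles nothing
but four-byte sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one four-byte UTF-8 sequence.
   Returns the code point, or -1 if the bytes are not a well-formed
   four-byte sequence in the range U+10000..U+10FFFF. */
long decode_utf8_4(const unsigned char *s)
{
    long cp;
    int i;

    assert(s != NULL);
    /* The lead byte must look like 11110xxx. */
    if ((s[0] & 0xF8) != 0xF0)
        return -1;
    /* Each of the next three bytes must look like 10xxxxxx. */
    for (i = 1; i < 4; i++)
        if ((s[i] & 0xC0) != 0x80)
            return -1;
    /* 3 payload bits from the lead byte, 6 from each continuation byte. */
    cp = ((long)(s[0] & 0x07) << 18) | ((long)(s[1] & 0x3F) << 12)
       | ((long)(s[2] & 0x3F) << 6)  |  (long)(s[3] & 0x3F);
    /* Reject overlong forms and values beyond U+10FFFF. */
    if (cp < 0x10000L || cp > 0x10FFFFL)
        return -1;
    return cp;
}

/* Nonzero if the code point can be stored directly in 16 bits. */
int fits_in_16_bits(long cp)
{
    return cp >= 0 && cp <= 0xFFFFL;
}
```

For the bytes above, the only nonzero payload bits come from \220
(0x90), which contributes 0x10 shifted left by 12, giving 0x10000.
That value does not fit in 16 bits, which is why a UCS-2 wchar_t
cannot represent it directly.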

---
Tomohiro KUBOTA <kubota@debian.org>
http://surfchem0.riken.go.jp/~kubota/

Follow-Ups:
- Re: UTF-8 locales
  - From: David Starner <dvdeug@x8b4e516e.dhcp.okstate.edu>

References:
- UTF-8 locales
  - From: Tomohiro KUBOTA <tkubota@riken.go.jp>
- Re: UTF-8 locales
  - From: GOTO Masanori <gotom@debian.or.jp>
- Re: UTF-8 locales
  - From: Tomohiro KUBOTA <tkubota@riken.go.jp>
- Re: UTF-8 locales
  - From: Nicolás Lichtmaier <nick@debian.org>
- Re: UTF-8 locales
  - From: Edmund GRIMLEY EVANS <edmundo@rano.org>
