Re: UTF-8 locales

To: debian-i18n@lists.debian.org
Cc: debian-devel@lists.debian.org
Subject: Re: UTF-8 locales
From: Tomohiro KUBOTA <tkubota@riken.go.jp>
Date: Mon, 20 Nov 2000 11:15:57 +0900
Message-id: <8766ljfkwy.wl@surfchem0.riken.go.jp>
In-reply-to: In your message of "Sun, 19 Nov 2000 22:50:54 +0100" <20001119225054.A14582@lina.inka.de>
References: <87r94gqd2e.wl@surfchem0.riken.go.jp> <200011131854.DAA16802@smtp5.dti.ne.jp> <87u29928rd.wl@surfchem0.riken.go.jp> <20001116004510.A3138@debian.org> <20001116094026.A12204@daisy.vocalis.com> <20001118225558.A1180@debian.org> <20001118200111.A12372@x8b4e516e.dhcp.okstate.edu> <20001119225054.A14582@lina.inka.de>

Hi,

At Sun, 19 Nov 2000 22:50:54 +0100,
Bernd Eckenfels <lists@lina.inka.de> wrote:

> Afaik UTF8 is not able to encode 32bit unicode?

Strictly speaking, there is no 32-bit Unicode.  The UCS-4 character set
has a 31-bit code space, not a 32-bit one, and UTF-8 (using sequences
of up to six bytes) can encode the whole of UCS-4.
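The UTF-8 definition of the time (RFC 2279) used one to six bytes per
character, which is what lets it cover the full 31-bit UCS-4 space.  A
minimal Python sketch of that scheme follows.  The function name
`utf8_encode_ucs4` is hypothetical, and the encoder is written by hand
because Python strings cannot hold code points above U+10FFFF, so the
built-in `str.encode` cannot produce the five- and six-byte forms.

```python
def utf8_encode_ucs4(cp: int) -> bytes:
    """Encode one UCS-4 code point with the six-byte UTF-8 scheme.

    Assumption: this follows the RFC 2279 form of UTF-8, which covers
    the whole 31-bit range 0x0..0x7FFFFFFF.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("UCS-4 has a 31-bit code space")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII: one byte, 0xxxxxxx
    # (exclusive upper limit, sequence length, lead-byte marker)
    for limit, n, lead in ((0x800, 2, 0xC0),        # 110xxxxx
                           (0x10000, 3, 0xE0),      # 1110xxxx
                           (0x200000, 4, 0xF0),     # 11110xxx
                           (0x4000000, 5, 0xF8),    # 111110xx
                           (0x80000000, 6, 0xFC)):  # 1111110x
        if cp < limit:
            break
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    out.append(lead | cp)  # remaining high bits go into the lead byte
    return bytes(reversed(out))
```

For example, `utf8_encode_ucs4(0x7FFFFFFF)`, the largest UCS-4 value,
yields the six bytes `FD BF BF BF BF BF`.  Continuation bytes are
filled from the low end and then reversed, so the lead byte, which
also announces the sequence length, comes first.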

> I thought this is because
> the "living" languages are all restricted to 16bit? Hmm... i might be wrong.

Taiwan's CNS 11643 character set has about 47,000 ideographs.
Japan has also recently adopted a new standard, JIS X 0213.  I hope
an effort is being made to include these characters in Unicode, but
they won't be included in the BMP, which has only about 28,000
ideographs.
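To illustrate why the BMP limit matters for a purely 16-bit view of
Unicode, here is a small Python sketch (the helper name `utf16_units`
is hypothetical) that computes UTF-16 code units.  A code point up to
U+FFFF fits in one 16-bit unit, while anything beyond the BMP has to be
split into a surrogate pair.  U+20000 is used only as an arbitrary
code point outside the BMP, not as a specific CNS 11643 or JIS X 0213
character.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point (at most U+10FFFF)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("UTF-16 cannot represent this code point")
    if cp <= 0xFFFF:
        # Inside the BMP: a single 16-bit unit suffices.
        return [cp]
    # Outside the BMP: 20 bits remain after subtracting 0x10000; the top
    # 10 go into the high surrogate, the bottom 10 into the low one.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


for cp in (0x6F22, 0x20000):
    units = utf16_units(cp)
    print(f"U+{cp:04X}: {len(units)} unit(s):", " ".join(f"{u:04X}" for u in units))
# U+6F22: 1 unit(s): 6F22
# U+20000: 2 unit(s): D840 DC00
```

Software that assumes one 16-bit unit per character will therefore
miscount or split such characters.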

> As I understand it, all living languages are contained in the "not-extended"
> 16bit set. No?

No.

Although everyday Japanese text does not need that many ideographs,
proper nouns, such as personal and place names, must be written with
the correct characters.  This is why Japanese users need a large
character set.

I don't know the situation for Chinese and Korean.

---
Tomohiro KUBOTA <kubota@debian.org>
http://surfchem0.riken.go.jp/~kubota/
