Re: UTF-8 locales


At Sun, 19 Nov 2000 22:50:54 +0100,
Bernd Eckenfels <lists@lina.inka.de> wrote:

> Afaik UTF8 is not able to encode 32bit unicode?

Strictly speaking, there is no 32-bit Unicode.  The UCS-4 character set
has a 31-bit code space, not a 32-bit one, and UTF-8 can encode all of UCS-4.
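To make the 31-bit point concrete, here is a minimal sketch of the original (RFC 2279) UTF-8 scheme. It uses sequences of up to 6 bytes, and the 6-byte form carries exactly 31 payload bits (1 in the lead byte plus 6 in each of 5 continuation bytes). The function name `utf8_encode_ucs4` is made up for this illustration. Python's built-in codec follows the later RFC 3629 limit of U+10FFFF, so the full-range encoding is done by hand here:

```python
def utf8_encode_ucs4(cp: int) -> bytes:
    """Encode a UCS-4 code point (0..0x7FFFFFFF) using the original
    RFC 2279 UTF-8 scheme (1 to 6 bytes)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("UCS-4 code points are 31-bit: 0..0x7FFFFFFF")
    if cp < 0x80:
        return bytes([cp])  # ASCII passes through unchanged
    # (sequence length, exclusive upper bound, lead-byte marker)
    for nbytes, limit, lead in ((2, 0x800, 0xC0), (3, 0x10000, 0xE0),
                                (4, 0x200000, 0xF0), (5, 0x4000000, 0xF8),
                                (6, 0x80000000, 0xFC)):
        if cp < limit:
            break
    out = []
    # Emit continuation bytes (10xxxxxx) from the low 6 bits upward.
    for _ in range(nbytes - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    out.append(lead | cp)  # remaining high bits go into the lead byte
    return bytes(reversed(out))

print(utf8_encode_ucs4(0x7FFFFFFF).hex(" "))  # fd bf bf bf bf bf
```

For code points up to U+10FFFF (excluding surrogates) the output matches `chr(cp).encode("utf-8")`. Above that, only the old 5- and 6-byte forms apply.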

> I thought this is because
> the "living" languages are all restricted to 16bit? Hmm... i might be wrong.

Taiwan's CNS 11643 character set has about 47,000 ideograms.
Japan also recently adopted a new standard, JIS X 0213.  Although I hope
an effort is being made to add these characters to Unicode, they will not
be included in the BMP, which has room for only about 28,000 ideograms.
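As a rough illustration of what "outside the BMP" means in practice, the sketch below compares a BMP ideogram with a code point above U+FFFF. The helper `encoding_sizes` is hypothetical. The two code points are examples I chose, not characters taken from CNS 11643 or JIS X 0213: U+4E00 is in the BMP, and U+20000 is the first code point of CJK Unified Ideographs Extension B, a block later assigned outside the BMP.

```python
def encoding_sizes(cp: int) -> tuple[bool, int, int]:
    """Return (in_bmp, utf16_code_units, utf8_bytes) for a code point."""
    ch = chr(cp)
    in_bmp = cp <= 0xFFFF
    # Outside the BMP, UTF-16 needs a surrogate pair (two 16-bit units).
    utf16_units = len(ch.encode("utf-16-be")) // 2
    utf8_bytes = len(ch.encode("utf-8"))
    return in_bmp, utf16_units, utf8_bytes

print(encoding_sizes(0x4E00))   # (True, 1, 3)
print(encoding_sizes(0x20000))  # (False, 2, 4)
```

A 16-bit-only implementation cannot represent the second case at all. UTF-8 handles it with a longer byte sequence.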

> As I understand it, all living languages are contained in the "not-extended"
> 16bit set. No?


Although everyday Japanese text does not need so many ideograms,
proper nouns such as personal and place names must be written with the
correct characters.  This is why Japanese users need a large character set.

I don't know the situation for Chinese and Korean.

Tomohiro KUBOTA <kubota@debian.org>

