
Re: UTF-8 locales

Bernd Eckenfels writes:
> Afaik UTF8 is not able to encode 32bit unicode? I thought this is because
> the "living" languages are all restricted to 16bit? Hmm... i might be wrong.
> Does that mean Java does not support asian languages with its 16bit Unicode?

UTF-8 can be used to encode all of UCS-4.
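
To make that concrete, here is a rough Python sketch of the encoding
as originally defined (RFC 2279), which covers the full 31-bit UCS-4
range in up to six bytes. The function name utf8_encode is my own,
purely for illustration, and the sketch does no validation beyond a
range check. Note that the later RFC 3629 restricts UTF-8 to four
bytes (U+10FFFF and below), so the five- and six-byte forms here are
historical.

```python
def utf8_encode(cp):
    """Encode one code point with the original (RFC 2279) UTF-8 scheme.

    Illustrative sketch only: accepts any value in the 31-bit UCS-4
    range and performs no other validation.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])  # ASCII passes through as a single byte

    # (exclusive upper limit, total bytes, lead-byte prefix)
    forms = (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    )
    for limit, length, prefix in forms:
        if cp < limit:
            break

    out = []
    # Peel off six bits at a time for the continuation bytes (10xxxxxx).
    for _ in range(length - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(prefix | cp)
    return bytes(reversed(out))


# U+4E2D fits in 16 bits and takes three bytes.
print(utf8_encode(0x4E2D).hex())
# U+20000 lies beyond 16 bits and takes four bytes.
print(utf8_encode(0x20000).hex())
```

For code points up to U+10FFFF the result should match what Python's
built-in chr(cp).encode("utf-8") produces (surrogates excepted, which
the built-in codec rejects).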

> As I understand it, all living languages are contained in the "not-extended"
> 16bit set. No?

Not at all. Ideographic Extension Block B (which will be part of the
upcoming Unicode 3.1/ISO 10646-2 release) contains Han characters that
are used in Hong Kong, Taiwan, and other locales. For example, the
Hong Kong Supplementary Character Set (HKSCS) adds several thousand
characters to Big Five and Unicode. HKSCS defines mappings into the
Big Five EUDC (end-user-defined character) area and the Unicode PUA
(Private Use Area). Ideographic Extension Block A (added in Unicode
3.0) includes some of the HKSCS characters, but not all. So you end
up with separate mapping tables for Unicode 2.x and 3.0, because the
two tables contain different PUA mappings.

Once Extension Block B is released, all of HKSCS can be encoded in
Unicode/ISO 10646 without resorting to the PUA.


Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"
