
Re: UTF-8 locales

On Mon, Nov 20, 2000 at 11:15:57AM +0900, Tomohiro KUBOTA wrote:
> > I thought this is because
> > the "living" languages are all restricted to 16bit? Hmm... i might be wrong.
> Taiwan CNS 11643 character set has about 47000 ideograms.
> Recently, Japan came to have a new standard JIS X 0213.  Though I hope
> an effort is being made to include them in Unicode, they won't be
> included in BMP.  (BMP has about 28000 ideograms).

To add to that list, China has the new GB18030-2000 standard (locale
zh_CN.GB18030), which also contains many characters beyond 16-bit Unicode.

> > As I understand it, all living languages are contained in the "not-extended"
> > 16bit set. No?
> No.
> Though daily text in Japanese language does not need so many ideograms, 
> proper nouns for person and place need to be expressed in correct 
> characters.  This is why Japanese people need large character set.
> I don't know about Chinese and Korean.

Very much so in Chinese.  In fact, the Chinese government has gone so far as
to ban, starting in 2001, the sale of any Chinese software that supports only
Unicode.  All new Chinese software must support the GB18030-2000 character
set.  And yes, Microsoft will have to comply too; their current Unicode-only
solution won't work.  (Ho ho ho!)  Apparently, the Chinese government is
somewhat displeased to have the Chinese language controlled and *limited*
by an International Consortium like Unicode.  There are *so* many Chinese
characters that aren't in the 16-bit Unicode that it would create lots of
trouble if Unicode were to become the de-facto standard in China. 
GB18030-2000 is compatible with ISO-10646 AFAIK.
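
To make the 16-bit limit concrete, here is a minimal Python sketch.  It
assumes Python 3 and its standard "gb18030" codec; the helper name describe()
and the sample character U+20000 (an ideograph outside the BMP) are
illustrative choices only, not something taken from the standards themselves.

```python
# Sketch: a CJK ideograph outside the 16-bit BMP, seen through three encodings.
# U+20000 is an arbitrary sample from plane 2 (assumption: any non-BMP
# ideograph would illustrate the same point).
ch = "\U00020000"


def describe(char):
    """Return the code point and encoded byte lengths for one character."""
    return {
        "code_point": hex(ord(char)),
        # True only if the code point fits in a single 16-bit unit.
        "fits_in_16_bits": ord(char) <= 0xFFFF,
        # UTF-16 needs two 16-bit units (a surrogate pair) beyond the BMP.
        "utf16_bytes": len(char.encode("utf-16-be")),
        "utf8_bytes": len(char.encode("utf-8")),
        # GB18030 uses its four-byte form for such characters.
        "gb18030_bytes": len(char.encode("gb18030")),
    }


info = describe(ch)
print(info)

# The character survives a round trip through GB18030.
assert ch.encode("gb18030").decode("gb18030") == ch
```

A character like this cannot be stored in one 16-bit unit, which is the
limitation being discussed; an encoding with room beyond 16 bits can carry it.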

Similar concerns exist in Taiwan, and indeed many characters are found only
in CNS11643 (and ISO-10646) but not in Unicode.

Of course, this is mostly hearsay.  I don't know the details, as I was
originally from Hong Kong, and I have been living in Canada for over 10
years.  But speaking of Hong Kong, there are quite a few Chinese characters
added by the HKSAR government that won't be in Unicode either.  So yeah,
though I am bemused, I am kind of glad that the Chinese government takes such
a strong stance to force software to support the new GB18030-2000 standard,
which, like ISO-10646, has space for millions of characters.  :-)



Anthony Fok Tung-Ling                Civil and Environmental Engineering
foka@ualberta.ca, foka@debian.org    University of Alberta, Canada
   Debian GNU/Linux Chinese Project -- http://www.debian.org/zh/
Come visit Our Lady of Victory Camp -- http://www.olvc.ab.ca/
