Re: UTF-8 locales
At Mon, 20 Nov 2000 01:11:02 -0700,
Anthony Fok <firstname.lastname@example.org> wrote:
> To add to that list, China has the new GB18030-2000 standard
> (locale zh_CN.GB18030) which also contains many characters beyond Unicode.
Interesting. I will have to mention it in my "Introduction to I18N"
document in the Debian Documentation Project (now undergoing a major rewrite).
Please check http://www.debian.org/doc/manuals/intro-i18n/
BTW, I think GB18030 would be a _character set_, not an _encoding_.
If so, we won't have a zh_CN.GB18030 locale, since the part of a locale
name after the dot names an encoding.
For Japanese:
JIS X 0201, JIS X 0208, JIS X 0212, and JIS X 0213 are _character sets_.
EUC-JP, Shift-JIS, and ISO-2022-JP are _encodings_.
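To make the distinction concrete for the Japanese case, here is a minimal Python sketch. It encodes the same three JIS X 0208 characters with three encodings and prints the resulting bytes. The codec names (euc_jp, shift_jis, iso2022_jp) are Python's own spellings and are assumed here only for illustration.

```python
# One character set (JIS X 0208), three encodings of it.
# The codec names are Python's spellings, used here as an assumption.
TEXT = "日本語"

def encode_with(text, codecs=("euc_jp", "shift_jis", "iso2022_jp")):
    """Return {codec name: encoded bytes} for the same characters."""
    return {name: text.encode(name) for name in codecs}

encoded = encode_with(TEXT)
for name, data in encoded.items():
    # Same characters, different byte sequences.
    print(f"{name:11s} {data.hex(' ')}")
```

The characters are identical in all three cases; only the byte representation differs, which is what I mean by _encoding_.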
For simplified Chinese:
GB 2312, GB 7589, GB 7590, GB 8565, GB 12052, and GBK are _character sets_.
CN-GB (aka EUC-CN), GBK, and ISO-2022-CN are _encodings_.
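GBK appears in both lists above. A small Python sketch may show why: it names a larger repertoire than GB 2312 and also a byte layout that keeps the EUC-CN bytes for the GB 2312 part. The helper try_encode is hypothetical, and the codec names gb2312 and gbk are Python's.

```python
def try_encode(ch, codec):
    """Return the encoded bytes, or None if the codec's character set lacks ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

# "中" is in GB 2312; the traditional form "國" is only in the larger GBK repertoire.
for ch in ("中", "國"):
    for codec in ("gb2312", "gbk"):
        data = try_encode(ch, codec)
        print(ch, codec, data.hex() if data else "not in this character set")
```

For a character that is in GB 2312, both codecs give the same two bytes; for one that is not, only GBK can represent it.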
For traditional Chinese:
BIG5 and CNS 11643 are _character sets_.
ISO-2022-CN, ISO-2022-CN-EXT, EUC-TW, and BIG5 are _encodings_.
Codes which are not ISO 2022-compliant (such as GBK and BIG5 above)
tend not to separate _character set_ and _encoding_.
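The separation in ISO 2022-compliant codes can be sketched as follows. A character set assigns each character a row and cell number in a 94x94 table, and each encoding is a simple rule for turning that pair into bytes. The function names are hypothetical; the offsets 0x20 and 0xA0 are the usual ones for the 7-bit and EUC forms.

```python
def rowcell_to_7bit(row, cell):
    """7-bit ISO 2022 form: row and cell are each offset by 0x20."""
    return bytes((0x20 + row, 0x20 + cell))

def rowcell_to_euc(row, cell):
    """EUC form: the same two bytes with the high bit set (offset 0xA0)."""
    return bytes((0xA0 + row, 0xA0 + cell))

# Hiragana "あ" sits at row 4, cell 2 of JIS X 0208.
jis7 = rowcell_to_7bit(4, 2)
euc = rowcell_to_euc(4, 2)
# ISO-2022-JP: escape sequence selecting JIS X 0208, the 7-bit bytes, back to ASCII.
iso2022jp = b"\x1b$B" + jis7 + b"\x1b(B"
print(euc.decode("euc_jp"), iso2022jp.decode("iso2022_jp"))

# The same arithmetic gives EUC-CN from a GB 2312 position ("中": row 54, cell 48).
print(rowcell_to_euc(54, 48).decode("gb2312"))

# Shift-JIS is not ISO 2022-compliant: its bytes are not a fixed offset from row/cell.
print("あ".encode("shift_jis").hex())
```

With Shift-JIS, BIG5, or GBK there is no such clean split between the table and the byte rule, so the name of the code ends up standing for both.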
> Very much so in Chinese. In fact, the Chinese government has gone as far as
> to ban the sale of any Chinese software that only supports Unicode starting
> in 2001. All new Chinese software must support the GB18030-2000 character
> set. And yes, Microsoft will have to comply too; their current Unicode-only
> solution won't work. (Ho ho ho!) Apparently, the Chinese government is
> somewhat displeased to have the Chinese language controlled and *limited*
> by an International Consortium like Unicode. There are *so* many Chinese
> characters that aren't in the 16-bit Unicode that it would create lots of
> trouble if Unicode were to become the de-facto standard in China.
> GB18030-2000 is compatible with ISO-10646 AFAIK.
How severe! Can a government have such a right?
However, this sounds nice for Japanese people, too. Software on
POSIX systems will use locales and wide characters instead of depending
directly on Unicode and UTF-8, since this is the easiest way to support
both GB18030 and UTF-8. And UNIX vendors will work hard to support locale
mechanisms. The use of locales and wide characters then leads naturally to
support for encodings such as EUC-JP, ISO-2022-JP, Shift-JIS, and so on.
I will be right only _if GB18030 is not included in Unicode_. However, I
think GB18030 will be included in Unicode in the future, if GB18030 is a
character set, not an encoding.
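The locale and wide-character approach described above can be sketched roughly like this. It is only an analogy in Python: str stands in for a wide-character string, and decode/encode stand in for mbstowcs()/wcstombs(). The helper names are hypothetical, and the explicit encoding argument only simulates users with different locales.

```python
import locale

def to_internal(data, encoding=None):
    """Decode external bytes to the internal (wide-character-like) form.

    With encoding=None the locale's preferred encoding is used, so the
    program itself never names a particular encoding.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return data.decode(encoding)

def to_external(text, encoding=None):
    """Encode internal text back to bytes for the user's locale."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding)

# Simulate four users with different locale encodings. The processing
# step (here just len()) is identical for all of them.
results = {}
for enc in ("euc_jp", "shift_jis", "utf_8", "gb18030"):
    raw = to_external("日本語", enc)
    text = to_internal(raw, enc)
    results[enc] = (len(raw), len(text))
    print(enc, len(raw), "bytes ->", len(text), "characters")
```

The point is that code written this way picks up support for EUC-JP or Shift-JIS for free, because nothing in it assumes one particular encoding.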
> Similar concerns are in Taiwan, and indeed many characters are only in
> CNS11643 (and ISO-10646) but not in Unicode.
> Of course, these are mostly hearsay. I don't know the details, as I was
> originally from Hong Kong, and I have been living in Canada for over 10
> years. But speaking of Hong Kong, there are quite a few Chinese characters
> added by the HKSAR government that won't be in Unicode either. So yeah,
> though I am bemused, I am kind of glad that the Chinese government takes such
> a strong stance to force software to support the new GB18030-2000 standard,
> which, like ISO-10646, has space for millions of characters. :-)
ISO-10646 and Unicode share exactly the same character set and will
continue to do so in the future, though the width of the code space differs
(ISO-10646: 31bit, Unicode: 0x000000 - 0x10ffff [a bit more than
I suppose you are under the impression that Unicode is 16-bit. That is no
longer so, though it is true that Unicode (1.0) _was_ 16-bit.
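The size of the Unicode code space is easy to check with a little arithmetic. The sketch below uses Python only as a calculator; U+20000 is chosen merely as an example of a code point that does not fit in 16 bits.

```python
UNICODE_MAX = 0x10FFFF

code_points = UNICODE_MAX + 1          # size of the range 0x000000 - 0x10ffff
bits_needed = UNICODE_MAX.bit_length() # bits required for the largest value
print(code_points, code_points // 65536, bits_needed)

# A code point beyond the 16-bit range: four bytes in UTF-8,
# a surrogate pair (two 16-bit units) in UTF-16.
ch = chr(0x20000)
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
print(utf8.hex(" "), "/", utf16.hex(" "))

# Anything above 0x10FFFF is outside the Unicode code space.
try:
    chr(UNICODE_MAX + 1)
    beyond_ok = True
except ValueError:
    beyond_ok = False
print("0x110000 accepted:", beyond_ok)
```

So the range amounts to 17 blocks of 65536 code points, which no 16-bit value can cover.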
Tomohiro KUBOTA <email@example.com>