
Re: UTF-8 locales



On Mon, Nov 20, 2000 at 07:25:11PM +0900, Tomohiro KUBOTA wrote:
> 
> BTW, I think GB18030 would be a _character set_, not _encoding_.
> If so, we won't have zh_CN.GB18030 locale.

In fact it is both, AFAICT: GB18030 defines the set of characters and
the way to encode them, just like GBK.
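
A rough way to see this, as a sketch only: the snippet below assumes
Python 3 and its built-in "gbk" and "gb18030" codecs (the helper name
encode_or_none is made up for illustration).  Each codec fixes both a
repertoire and a byte representation, so a character outside GBK's
repertoire simply cannot be encoded as GBK, while GB18030 falls back to
its four-byte form.

```python
# Sketch: GBK and GB18030 each define a repertoire AND a byte encoding.
# Assumes Python 3 with its standard "gbk" and "gb18030" codecs.

def encode_or_none(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

common = "\u4e2d"      # a character present in GB 2312, GBK and GB18030
beyond = "\U00020000"  # a CJK Extension B ideograph, outside GBK

for label, ch in (("common", common), ("beyond GBK", beyond)):
    for codec in ("gbk", "gb18030"):
        data = encode_or_none(ch, codec)
        shown = data.hex() if data is not None else "not encodable"
        print(f"{label:10s} {codec:8s} {shown}")
```

For the common character both codecs should give the same two bytes;
for the other one only gb18030 yields a result.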

> Examples (Japanese):
>    JIS X 0201, JIS X 0208, JIS X 0212, JIS X 0213 are _character set_.
>    EUC-JP, Shift-JIS, ISO-2022-JP are _encoding_.
> For simplified Chinese:
>    GB 2312, GB 7589, GB 7590, GB 8565, GB 12052, GBK, are _character set_.
>    CN-GB (aka EUC-CN), GBK, ISO-2022-CN, are _encoding_.
> For traditional Chinese:
>    BIG5, CNS 11643, are _character set_.
>    ISO-2022-CN, ISO-2022-CN-EXT, EUC-TW, BIG5, are _encoding_.
> 
> Codes which are not ISO2022-compliant tend not to separate
> _character set_ and _encoding_.
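
To make the quoted distinction concrete, here is a small sketch.  It
assumes Python 3 and its built-in "euc_jp", "shift_jis" and
"iso2022_jp" codecs, and the sample character is just an arbitrary
pick from JIS X 0208.  One character from one character set comes out
as three different byte sequences; for the two ISO 2022 style
encodings the relation to the underlying JIS code is visible directly.

```python
# Sketch: one JIS X 0208 character, three encodings of it.
# Assumes Python 3 with its standard Japanese codecs.

ch = "\u65e5"  # a kanji from JIS X 0208, chosen only as an example

encoded = {name: ch.encode(name)
           for name in ("euc_jp", "shift_jis", "iso2022_jp")}

for name, data in encoded.items():
    print(f"{name:11s} {data.hex(' ')}")

# ISO-2022-JP wraps the two 7-bit JIS X 0208 bytes in escape sequences:
# ESC $ B <byte1> <byte2> ESC ( B
iso = encoded["iso2022_jp"]
jis_code = iso[3:5]

# EUC-JP carries the same two bytes with the high bit set.
euc_from_jis = bytes(b | 0x80 for b in jis_code)
print("EUC-JP derived from JIS code:", euc_from_jis.hex(" "))
```

Shift-JIS uses an arithmetic transformation of the same code rather
than a plain high-bit shift, which is why its bytes look unrelated.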

You might want to add HKSCS to that list :p  It defines both the set of
characters to be used in Hong Kong, and the way to encode them in both
Big5 and ISO-10646.  (Though as others have pointed out, a whole bunch
of characters in HKSCS are currently mapped to both the PUA and plane 2
of ISO-10646 version 2, including some of the expletives widely used in
Hong Kong ... :p )

[ regarding PRC Govt's ban of non-GB18030 compliant software ]
> How severe!  Can a government have such a right?

Yup, and the deadline's only a bit more than a month away ...

-- 
  Roger So                                            telnet://e-fever.org
  spacehunt at e-fever dot org                          SysOp, e-Fever BBS
  GnuPG  1024D/98FAA0AD  F2C3 4136 8FB1 7502 0C0C 01B1 0E59 37AC 98FA A0AD


