Re: UTF-8 locales
On Mon, Nov 20, 2000 at 07:25:11PM +0900, Tomohiro KUBOTA wrote:
>
> BTW, I think GB18030 would be a _character set_, not _encoding_.
> If so, we won't have zh_CN.GB18030 locale.
In fact it is both, AFAICT; GB18030 defines the set of characters, and
the way to encode them. Just like GBK.
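
A quick way to see the "both" part, as a rough sketch only: I am assuming
Python 3 and its bundled "gb2312", "gbk" and "gb18030" codecs as stand-ins
for the standards (nothing in this thread uses them), and the two sample
characters are arbitrary picks of mine. The codec name alone fixes both
which characters are allowed and what bytes they become.

```python
# Sketch only: Python's bundled codecs stand in for the standards here,
# and the sample characters are arbitrary choices.

CODECS = ("gb2312", "gbk", "gb18030")


def encodable(ch, codec):
    """Return True if `ch` is in the repertoire covered by `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# U+4E2D is already in GB 2312, so all three codecs give the same bytes.
zhong = "\u4e2d"
same_bytes = {codec: zhong.encode(codec) for codec in CODECS}

# U+20000 (plane 2) is outside the GBK repertoire; GB18030 covers it
# with a four-byte sequence.
ext_b = "\U00020000"
coverage = {codec: encodable(ext_b, codec) for codec in CODECS}
four_byte = ext_b.encode("gb18030")

for codec in CODECS:
    print(codec, same_bytes[codec].hex(), coverage[codec])
```

The repertoire and the byte layout come as one package here, which is
why a locale name like zh_CN.GB18030 is at least meaningful.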
> Examples (Japanese):
> JIS X 0201, JIS X 0208, JIS X 0212, JIS X 0213 are _character set_.
> EUC-JP, Shift-JIS, ISO-2022-JP are _encoding_.
> For simplified Chinese:
> GB 2312, GB 7589, GB 7590, GB 8565, GB 12052, GBK, are _character set_.
> CN-GB (aka EUC-CN), GBK, ISO-2022-CN, are _encoding_.
> For traditional Chinese:
> BIG5, CNS 11643, are _character set_.
> ISO-2022-CN, ISO-2022-CN-EXT, EUC-TW, BIG5, are _encoding_.
>
> Codes which are not ISO2022-compliant tend not to separate
> _character set_ and _encoding_.
You might want to add HKSCS to that list :p It defines both the set of
characters to be used in Hong Kong and the way to encode them in both
Big5 and ISO-10646. (Though as others have pointed out, a whole bunch of
characters in HKSCS are currently mapped to both the PUA and plane 2 of
ISO-10646 version 2, including some of the expletives widely used in
Hong Kong ... :p )
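
For contrast, going back to the Japanese examples quoted above, here is
a rough sketch of the ISO 2022 style, where one character set is carried
by several encodings. Again I am assuming Python 3 and its "euc_jp",
"shift_jis" and "iso2022_jp" codecs, and HIRAGANA LETTER A (U+3042) is
just an arbitrary JIS X 0208 character I picked.

```python
# Sketch only: Python's bundled codecs are stand-ins for the standards,
# and U+3042 is an arbitrary JIS X 0208 character.

ESC_TO_JISX0208 = b"\x1b$B"  # ISO-2022-JP escape: switch to JIS X 0208
ESC_TO_ASCII = b"\x1b(B"     # ISO-2022-JP escape: switch back to ASCII

ch = "\u3042"
euc = ch.encode("euc_jp")
sjis = ch.encode("shift_jis")
iso = ch.encode("iso2022_jp")

# EUC-JP stores the two JIS X 0208 code bytes with the high bit set,
# so clearing that bit recovers the character set's own code.
jis_from_euc = bytes(b & 0x7F for b in euc)

# ISO-2022-JP stores the same two bytes unchanged, wrapped in escapes.
jis_from_iso = iso[len(ESC_TO_JISX0208):-len(ESC_TO_ASCII)]

for name, raw in (("EUC-JP", euc), ("Shift-JIS", sjis), ("ISO-2022-JP", iso)):
    print(f"{name:12} {raw.hex()}")
print("JIS X 0208 code:", jis_from_euc.hex(), jis_from_iso.hex())
```

The JIS X 0208 code recovered from EUC-JP and from ISO-2022-JP is the
same; only the wrapping differs. Shift-JIS rearranges the bytes
arithmetically, so it has no such direct mask, which fits the remark
that codes outside ISO 2022 tend not to separate the two notions.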
[ regarding PRC Govt's ban of non-GB18030 compliant software ]
> How severe! Can a government have such a right?
Yup, and the deadline's only a bit more than a month away ...
--
Roger So telnet://e-fever.org
spacehunt at e-fever dot org SysOp, e-Fever BBS
GnuPG 1024D/98FAA0AD F2C3 4136 8FB1 7502 0C0C 01B1 0E59 37AC 98FA A0AD