
Re: Is there anybody interested in supporting GB18030 in debian?



On Sat, 20 Jan 2001, Anthony Fok wrote:
> On Sat, Jan 20, 2001 at 03:27:03PM +0800, ha shao wrote:
> > On Sat, Jan 20, 2001 at 09:17:10AM +0800, zw@debian.org wrote:
> > > As rumor has it, GB18030 may be just on the road to Unicode... So somebody,
> > > mark-ably the author of the Unicode-Linux FAQ, isn't much interested in letting
> > > XFree86 see GB18030. And we know he's a strong voice in the XFree86 world. ;)
> > 
> > We have to give our users a usable environment. If mark-ably doesn't like
> > it, we will roll it in ourselves. GB18030 is mandatory in the mainland. If it
> > is not supported, mainland users won't use it, or will think it is incomplete.
> 
> Agreed.  Besides, even though I may not agree with the Communist government, I
> highly commend their actions in enforcing their GB18030 standard.  At least
> one good thing has come out of this, whether related or not: Unicode 3.1
> added a whopping 40,000+ CJK characters.  :-)  We Chinese and our language
> are not going to be pushed around by foreign programmers who only want
> convenience and won't worry about the incompleteness of the CJK charset in
> an international standard.  Competition is good.  ;-)

Plus don't forget China's early role in putting together the original
~20,000 unified ideographs.  See Appendix A, "Han Unification History"[1],
in the online version[2] of Unicode 3.0 (and look at the errata[3] too),
particularly the parts about GB13000.1 and the URO (Unified Repertoire and
Ordering).  Who knows what we'd have today otherwise?[4]

[1] http://www.unicode.org/unicode/uni2book/appA.pdf
[2] http://www.unicode.org/unicode/uni2book/u2.html
[3] http://www.unicode.org/unicode/uni2errata/UnicodeErrata.html
[4] Probably some ISO 2022 thing... yuck.

Something not mentioned much, though, is that because of a fictional
extension to GB12345 (the fanti, i.e. traditional-character, version of
GB2312) submitted as a source for the original ~20,000 (U+4E00..U+9FAF),
Unicode has the 50+ Cantonese characters[6] that it does in Unicode 2.1.
They are also in GBK, published with GB13000.1.  A further side effect is
that Big5+ includes them too, because of Unicode's influence. :)

I extracted the data from the unihan.txt file, and made a list[5].  You'll
see they are in rows that are not officially in GB12345.  (Ignore the
Korean ones at the bottom of the list; some are not really characters,
which I think was a mistake.)

[5] http://deall.ohio-state.edu/grads/chan.200/cjkv/gb12345add.html
  (wait a while please, I think the web server is down for the moment)
[6] This doesn't mean all Cantonese characters are in there: CJK Extension
  A, introduced with Unicode 3.0, adds some, and CJK Extension B in the
  forthcoming Unicode 3.1 will add more.  For example, lip 'elevator, lift'
  is in Extension B as U+28319.
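For anyone who wants to reproduce that kind of list, here is a rough Python
sketch (hypothetical, not necessarily how the list[5] was made).  It assumes
the tab-separated Unihan.txt layout ("U+XXXX<TAB>field<TAB>value"), assumes
the GB12345 mappings sit in a field named kGB1 as four-digit row/cell
values, and treats the set of official GB12345 rows as a parameter; the
default of GB2312-style rows 1-9 and 16-87 is an assumption to check
against the standard itself.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# The original URO range mentioned above.
URO_START, URO_END = 0x4E00, 0x9FAF

# ASSUMPTION: official GB12345 rows follow the GB2312 layout
# (rows 1-9 for symbols, 16-87 for hanzi). Verify before relying on it.
OFFICIAL_ROWS: Set[int] = set(range(1, 10)) | set(range(16, 88))


def parse_unihan(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (code point, field, value) from tab-separated Unihan lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3 or not parts[0].startswith("U+"):
            continue  # skip anything not in "U+XXXX<TAB>field<TAB>value" form
        yield int(parts[0][2:], 16), parts[1], parts[2]


def gb12345_extras(
    lines: Iterable[str],
    field: str = "kGB1",  # ASSUMED name of the GB12345 mapping field
    official_rows: Set[int] = OFFICIAL_ROWS,
) -> List[Tuple[int, str]]:
    """Return URO characters whose GB12345 row is outside official_rows."""
    extras = []
    for cp, name, value in parse_unihan(lines):
        if name != field or not (URO_START <= cp <= URO_END):
            continue  # wrong field, or outside U+4E00..U+9FAF
        row_text = value[:2]  # first two digits are the row number
        if not row_text.isdigit():
            continue  # malformed row/cell value
        if int(row_text) not in official_rows:
            extras.append((cp, value))
    return sorted(extras)
```

Calling gb12345_extras(open("Unihan.txt", encoding="utf-8")) would give
(code point, row/cell) pairs for the characters sitting in unofficial rows;
keeping the row set as a parameter makes it easy to rerun with a corrected
list once someone checks the standard.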

But going back to the 40,000+ new hanzi to be added in CJK Extension B for
the forthcoming Unicode 3.1, the role played by the mainland _Hanyu Da
Zidian_ dictionary should be noted.  It is the third of the dictionaries
used for ordering, after the Japanese[7] _Dai Kanwa Jiten_ ("Da Hanhe
Cidian" in Chinese; "Morohashi" in the Western world), which is #2, and the
_Kangxi Zidian_, which is #1.  See the "East Asian Scripts" chapter[8].

The "qianyan" (foreword) of the dictionary notes that in 1975 Zhou Enlai
and Deng Xiaoping authorized a group to put together the _Hanyu Da
Zidian_.  Now, many of the 40,000+ hanzi going into the forthcoming
Unicode 3.1[9] are coming straight out of it![10]

[7] Although the dictionary is from Japan, Morohashi Tetsuji was a scholar
  of Chinese who couldn't find any good dictionaries, so he spent the rest
  of his life writing and revising one.  So it is really more a dictionary
  of (classical) Chinese than of Japanese.
[8] http://www.unicode.org/unicode/uni2book/ch10.pdf
[9] http://www.unicode.org/unicode/reports/tr27/
[10] http://www.cse.cuhk.edu.hk/~irg/irg/N758_CJK_B_PreDIS-R1_CoverNote.pdf

I should also mention that the companies "Beijing Zhong Yi (Zheng Code)
Electronics Company" and "Peking University Founder Group Corporation" are
credited in the Unicode 3.0 book as font suppliers, along with a list of
other names.  (So someone out there must have a font with CJK Extension A
in it...)

Thomas Chan
tc31@cornell.edu

-- 
| This message was re-posted from debian-chinese-big5@lists.debian.org
| and converted from big5 to gb2312 by an automatic gateway.


