Re: CJK workers, throw off you chains! (fwd)

Hello Thomas,
On Thu, Feb 22, 2001 at 11:14:41AM -0500, Thomas Chan wrote:
> Well, there are problem areas, such as none of the Plane 2 characters
> having any radical or stroke information, or volume/page references to
> large dictionaries.

Wow, that is a daunting job, especially given that the deadline is imminent.
On the other hand, I believe the IRG at least should have all the radical
and/or stroke information, because that is what they use to order the
characters. There's a pretty good chance that they have other information as
well. It would be interesting to know why it is not present in Unihan 3.1.

> I don't know who else is interested in CCCII mappings, but there isn't
> much time--3.1 will be released at the end of March.  CCCII mappings are
> also problematic, because source separation has been abandoned.  (You can

Do you know why it was abandoned? Is it believed that CCCII has been covered
by the combination of other charsets?

I personally am very interested to see how the 70195 Han characters in
Unicode 3.1 compare with the 75684 [1] in CCCII. Given that CCCII contains
a lot of variants, there's a good possibility that Unicode already has more
hanzi than CCCII. It would be interesting to see which CCCII codes are not
covered yet.

Although not exactly a fan of CCCII, I admire its well-thought-out design.
It would be useful to have a mapping between CCCII and Unicode. A CCCII
to CNS mapping would help somewhat in this regard. Does anyone know of such
a mapping?
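
To make the idea concrete, here is a rough sketch of how a CCCII to CNS
table could be chained with a CNS to Unicode table to list the CCCII codes
that still lack a Unicode mapping. The function name is my own invention,
and the table entries are made-up placeholders, not real mapping data.

```python
def compose_mappings(cccii_to_cns, cns_to_unicode):
    """Chain two mapping tables.

    Returns (mapped, uncovered): a dict of CCCII code -> Unicode code point,
    and a sorted list of CCCII codes that cannot be reached via CNS.
    """
    mapped = {}
    uncovered = []
    for cccii, cns in sorted(cccii_to_cns.items()):
        ucs = cns_to_unicode.get(cns)
        if ucs is None:
            uncovered.append(cccii)  # no Unicode mapping for this CNS code
        else:
            mapped[cccii] = ucs
    return mapped, uncovered


# Placeholder entries only; none of these are real CCCII or CNS assignments.
cccii_to_cns = {"CCCII-A": "CNS-X", "CCCII-B": "CNS-Y", "CCCII-C": "CNS-Z"}
cns_to_unicode = {"CNS-X": 0x4E00, "CNS-Y": 0x4E01}

mapped, uncovered = compose_mappings(cccii_to_cns, cns_to_unicode)
print(uncovered)
```

Of course this only finds CCCII codes reachable through CNS; anything in
CCCII that CNS lacks would need another route.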

> > #                       TF      CNS 11643-1992, plane 15
> This is a good example of something that needs to be proofed--does plane
> 15 really exist in the 1992 edition of CNS 11643, or only the 1986
> edition?

This is indeed a problematic area. CNS 11643-1992 says that plane 15 is for
private use. Yet not all the characters in CNS 11643-1986 plane 15 made it
into planes 1-7 of CNS 11643-1992. This leads to my wild guesses:

1. Plane 15 of CNS 11643-1992 was added without announcement, or
2. Plane 15 is a fictitious extension added purely for the purpose of 
   submitting these characters to ISO/IRG.

Whether CNS 11643-1992 plane 15 is the same as CNS 11643-1986 plane 15 is
another question. I checked the first five characters in CNS 11643-1986
plane 15 (according to Appendix G of Ken Lunde's "CJKV Information
Processing") against the character table in the FCD version of ISO 10646-2
(May 2000; does anyone know of a newer version?), using the mapping from
the Unihan 3.1 beta. The result: the first two match, the next three do
not. This could mean:

1. The mappings in Unihan 3.1 beta are incorrect, or
2. The char table of ISO 10646-2 FCD is incorrect, or
3. CNS 11643-1992 plane 15 is different from CNS 11643-1986 plane 15.
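
For anyone who wants to repeat this spot check on more than five
characters, here is a minimal sketch. It assumes the Unihan data has lines
of the form "U+XXXX<TAB>kIRG_TSource<TAB>F-2121", where the digit before the
hyphen is the plane in hex (a leading "T" is tolerated in case the file
spells it that way). The helper names and the sample values are made up for
illustration; the chart side has to be typed in by hand from the ISO
10646-2 tables.

```python
import re

# Assumed line shape: U+XXXX <TAB> kIRG_TSource <TAB> [T]P-RRCC (P = hex plane)
_TSOURCE = re.compile(
    r"^U\+([0-9A-F]{4,6})\tkIRG_TSource\tT?([0-9A-F])-([0-9A-F]{4})$"
)


def plane_entries(lines, plane="F"):
    """Return {CNS row/cell code: Unicode code point} for one CNS plane."""
    result = {}
    for line in lines:
        m = _TSOURCE.match(line.strip())
        if m and m.group(2) == plane:
            result[m.group(3)] = int(m.group(1), 16)
    return result


def spot_check(from_unihan, from_chart, codes):
    """For each CNS code, report whether both sources give the same code point."""
    return [
        (c, c in from_unihan and from_unihan[c] == from_chart.get(c))
        for c in codes
    ]


# Made-up sample data, NOT real Unihan or ISO 10646-2 content.
sample = [
    "# comment line",
    "U+20001\tkIRG_TSource\tF-2121",
    "U+20002\tkIRG_TSource\t4-2224",
    "U+20003\tkIRG_TSource\tTF-2122",
]
unihan_plane15 = plane_entries(sample)
chart_plane15 = {"2121": 0x20001, "2122": 0x20009}  # hand-entered from the chart
report = spot_check(unihan_plane15, chart_plane15, ["2121", "2122", "2123"])
```

A mismatch reported this way still does not say which of the three
explanations above is the right one; it only shows where the two sources
disagree.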

If anyone knows somebody on the CNS committee, please ask them.

[1] There is some ambiguity about how many characters are encoded in CCCII.
    According to Ken Lunde's "CJKV Information Processing", the formal release
    version has 53940 hanzi, while the draft version contains 75684, which is
    the number I quoted. The book was published in 1999.


