
GB18030 summary and issues (fwd)



Hi all,

This may be of interest to some people.  It's an English-language summary
of GB18030 by Dirk Meyer of Adobe.

Looks like it basically makes GBK catch up with Unicode 3.0 by adding a
4-byte extension.


Thomas Chan
tc31@cornell.edu


---------- Forwarded message ----------
Date: Fri, 13 Oct 2000 09:57:00 -0800 (GMT-0800)
From: Markus Scherer <markus.scherer@jtcsv.com>
To: Unicode List <unicode@unicode.org>
Subject: GB18030 summary and issues

Dear Uni-encoders and -decoders,

Dirk Meyer from Adobe has put together an extensive summary of the Chinese GB 18030 encoding standard that was published on 2000-mar-17. Ken Lunde and I assisted Dirk with reviews and comments.

The summary is on the web site of Ken's famous CJKV book "with the fish":
ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf

To summarize the summary, we now have an English text describing the new encoding in detail. There are a few apparent errors, typos, and inconsistencies in the Chinese standard text that need to be resolved.

For implementers, there is enough information in the summary to describe the encoding structure and to prepare an implementation.

What is still missing - aside from the resolution of the issues mentioned here - is a precise mapping table between Unicode and at least the one-byte and two-byte portions of GB 18030.
In theory, it should be almost the same as GBK, but to be sure, we need precise, complete, and machine-readable mappings.
Given the one-byte and two-byte portions and the description in the standard and in the summary, the four-byte portion can be derived with a little bit of Perl or similar.
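
As a rough illustration of that derivation, here is a minimal sketch in Python rather than Perl. It assumes the four-byte structure is lead and third bytes in 0x81..0xFE and second and fourth bytes in 0x30..0x39, and that four-byte codes are handed out in ascending order to the BMP code points (from U+0080 up, surrogates skipped) that the one-byte and two-byte portions do not already cover. The function names and the `covered` input are hypothetical; the real set of covered code points has to come from the precise mapping tables that are still missing.

```python
# Sketch only: byte ranges and assignment order are assumptions stated above.

def four_byte_to_linear(seq: bytes) -> int:
    """Turn a four-byte sequence into its position in byte order."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte sequence")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def linear_to_four_byte(index: int) -> bytes:
    """Inverse of four_byte_to_linear."""
    if not 0 <= index < 126 * 10 * 126 * 10:
        raise ValueError("index outside the four-byte code space")
    index, d4 = divmod(index, 10)
    index, d3 = divmod(index, 126)
    d1, d2 = divmod(index, 10)
    return bytes((0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4))


def derive_bmp_four_byte(covered):
    """Assign four-byte codes to BMP code points not in `covered`.

    `covered` is a hypothetical set of code points >= U+0080 that the
    two-byte portion already maps; it is not real GB 18030 data.
    """
    table = {}
    index = 0
    for cp in range(0x80, 0x10000):
        if cp in covered or 0xD800 <= cp <= 0xDFFF:
            continue  # already mapped elsewhere, or a surrogate
        table[cp] = linear_to_four_byte(index)
        index += 1
    return table
```

With a toy `covered = {0x81}`, U+0080 gets 0x81308130 and U+0082 gets 0x81308131, which shows how each covered code point simply drops out of the sequence without consuming a four-byte code.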

Anyone who needs to implement or know about GB 18030 should probably read this text.

Anyone who can contribute precise mapping tables or help resolve the open issues, please do so.


Best regards,

markus




