Re: Is there anybody interested in supporting GB18030 in debian?



Thomas Chan <thomas@atlas.datexx.com> writes:
> On 23 Jan 2001, zhaoway wrote:
> 
> > So, again, my question is, what does GB18030 provide to us, which
> > cannot be solved with UTF-8, or Unicode surrogates? (The current
> > version of Unicode is not perfect, I agree. But there're no fixed
> 
> Actually, I don't see much difference between surrogates in Unicode
> (especially UTF-16 encoding) and the four-byte-long codepoints in GB18030.

Unicode defines a sequence of logical characters. The actual numbers
the Unicode standard uses to order that sequence are, so to speak, not
critical in most cases. And Unicode is defined in a compatible and
evolvable way.

UTF-8 is a way to encode the sequence numbers of Unicode code points
as byte sequences that well-behaved I18N applications can recognize
and tell apart universally. It, too, is defined in a compatible and
evolvable way.
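
To make the encoding step concrete, here is a minimal Python sketch of
the 1-to-6-byte scheme from the original UTF-8 definition (RFC 2279),
which covers code points up to 31 bits. The function name utf8_encode
is made up for illustration, and the sketch does not reject surrogate
code points or other values a strict encoder would refuse.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point with the original 1-to-6-byte UTF-8 scheme.

    Illustrative only: no check for surrogates or non-characters.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0x80:
        # ASCII is passed through unchanged as a single byte.
        return bytes([cp])

    # (sequence length, exclusive upper limit, lead-byte marker)
    for nbytes, limit, lead in (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    ):
        if cp < limit:
            break

    out = bytearray(nbytes)
    # Fill continuation bytes from the end: 10xxxxxx, six bits each.
    for i in range(nbytes - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    out[0] = lead | cp
    return bytes(out)


if __name__ == "__main__":
    for cp in (0x41, 0x4E2D, 0x20000):
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex()}")
```

For code points that Python itself can represent, the result should
match chr(cp).encode("utf-8"); the 5- and 6-byte forms exist only in
the older, wider definition.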

UTF-8 could also be used to re-encode GB2312 or Big5. And since UTF-8
is very clever, it would not be difficult to extend it beyond 31 bits.
Indeed, nothing would prevent Unicode from growing to include more
logical characters than 31 bits can hold. (Yes, I have to agree this
is overly optimistic and simplistic.) All you need is a _convention_.
So _join_ the effort and make the convention with the other parties on
the _earth_, instead of trying to bully your way in. That is called
_selfish_, I guess.

So in the near future, we hope people all around the world will be
using Unicode and UTF-8 (maybe with an implementation number). Then we
could exchange information without being bothered by encoding
stupidity. Either you recognize the small graphic (glyph) for a
character, or you don't. You won't need to worry whether a Chinese
character is encoded in a different encoding from the one you are
using to read it.

Back to the topic: GB2312 is acceptable, because it predated Unicode
and UTF-8. GB13000 is acceptable, because it's compatible with
Unicode, and you can indeed use the UTF-8 encoding for it. GB18030 is
weird: will it be a definition of logical characters, or will it be
an actual encoding seen by applications?
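
As a rough illustration of the "actual encoding seen by applications"
reading, the Python sketch below prints the bytes the same code point
gets under several encodings. It assumes the gb2312, gb18030, utf-8
and utf-16-be codecs bundled with Python 3; the helper name
show_encodings is made up for this example, and the bundled gb18030
codec may not match every revision of the standard.

```python
def show_encodings(ch: str) -> dict:
    """Return the hex bytes of one character under several encodings.

    A value of None means the encoding cannot represent the character.
    """
    result = {}
    for name in ("gb2312", "gb18030", "utf-8", "utf-16-be"):
        try:
            result[name] = ch.encode(name).hex()
        except UnicodeEncodeError:
            result[name] = None
    return result


if __name__ == "__main__":
    # U+4E2D is in GB2312; U+20000 lies outside the Basic Multilingual
    # Plane, so UTF-16 needs a surrogate pair for it.
    for ch in ("\u4e2d", chr(0x20000)):
        print(f"U+{ord(ch):04X}", show_encodings(ch))
```

For U+4E2D, GB18030 should give the same two bytes as GB2312. For
U+20000, GB2312 has no code at all, while GB18030, UTF-8 and UTF-16
each spend four bytes on it, which is the similarity between
surrogates and four-byte GB18030 codes raised in the quoted message.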

In the first case, its only value is to _push_ Unicode. To _compete_
with Unicode is not good, because it will break interoperability.
(Whee, I would hate to see the competition between KDE and GNOME end
in two different, non-interoperable component models!)

In the second case, it's just a plain disaster. It is like designing
a bigger GB2312 after all these years.

So, back to the original topic: what is the purpose of pushing
GB18030 all around our inter-application communication? Chaos.

-- 
zhaoway

-- 
| This message was re-posted from debian-chinese-gb@lists.debian.org
| and converted from gb2312 to big5 by an automatic gateway.


