
Re: questions on webwml/english/templete/debian/cdimage.wml



At Sun, 13 Jan 2002 14:49:15 +0100 (CET),
peter karlsson wrote:
> 
> Tomohiro KUBOTA:
> 
> > Because the transliteration algorithm is not very good.
> 
> I know.
> 
> > And many people in the world have to use a small subset of software
> > only because that software supports their native languages.
> 
> We're talking about the web pages here; the only software that needs
> Unicode support here is the browser, and most of them do have it (to
> varying degrees).
> 
> > Oh, very good.  Please note that East Asian users will need not only
> > display support but also input support, i.e., XIM support.
> 
> Yes, I'm very aware of that as well (although my direct experience with
> IMs is limited). I have worked on the Unicode adaptation of our browser
> for over a year.
> 
> > (note there is a rival; ISO-2022 is a multilingual encoding scheme
> > with much longer history).
> 
> Yeah, and it's a mess, to be honest. These "state-driven" (for lack
> of a better word) encodings, where you cannot easily sync (as you can
> with UTF-8), are not something I like (the same goes for HZ, which is
> just a "simplified" form of ISO-2022).
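
To make the sync point quoted above concrete, here is a minimal Python
sketch.  The helper name resync_utf8 and the sample string are my own
choices for illustration; it assumes Python's standard "utf-8" and
"iso2022_jp" codecs.  A reader dropped into the middle of a UTF-8
stream can skip continuation bytes to find the next character
boundary, while a reader that misses the ISO-2022-JP escape sequence
silently reads the rest as ASCII.

```python
def resync_utf8(data: bytes, start: int) -> int:
    """Return the first offset >= start that is not a UTF-8 continuation byte."""
    # Continuation bytes always have the bit pattern 10xxxxxx.
    while start < len(data) and (data[start] & 0xC0) == 0x80:
        start += 1
    return start


text = "\u65e5\u672c\u8a9e"  # three Japanese characters, chosen for illustration

# UTF-8: start inside the first character and recover at the next boundary.
utf8 = text.encode("utf-8")
offset = resync_utf8(utf8, 1)
recovered = utf8[offset:].decode("utf-8")

# ISO-2022-JP: drop the 3-byte escape sequence that selects JIS X 0208.
iso = text.encode("iso2022_jp")
misread = iso[3:].decode("iso2022_jp")

print(repr(recovered))  # the last two characters, intact
print(repr(misread))    # ASCII characters, not the original text
```

Only the first, partially read character is lost in the UTF-8 case;
in the ISO-2022-JP case no error is raised at all, which is what makes
the lost state hard to notice.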

Note that browsers cannot be free of "state" even if they use Unicode.
For example, rendering Unicode's unified CJK Han ideographs (which are
logically the same character from a certain point of view, although a
large part of them have significantly different glyphs) needs the
"state" of "language".
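
A small Python sketch of what unification means in practice follows.
U+76F4 is only an illustrative choice (it is often cited as a
character whose preferred glyph differs between Japanese and Chinese
fonts), and the codec names are Python's standard ones.  Whichever
legacy encoding the text came from, it ends up at one code point, so
the language has to be carried outside the character data.

```python
han = "\u76f4"  # illustrative unified Han ideograph

# Encode with legacy encodings from three different locales.
legacy_bytes = {
    "euc_jp": han.encode("euc_jp"),   # Japanese
    "gb2312": han.encode("gb2312"),   # Simplified Chinese
    "big5": han.encode("big5"),       # Traditional Chinese
}

# Decode each back to Unicode and collect the resulting code points.
code_points = {
    codec: ord(raw.decode(codec)) for codec, raw in legacy_bytes.items()
}

for codec, cp in code_points.items():
    print(f"{codec}: bytes {legacy_bytes[codec].hex()} -> U+{cp:04X}")

# All three sources collapse to a single code point.
unified = len(set(code_points.values())) == 1
print("same code point:", unified)
```

In HTML this missing "state" is usually supplied by a language
attribute or by the user's font preferences, not by the text itself.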

Thus, though it is true that ISO-2022 is very complex, please note
that Unicode is not so simple either.  If Unicode were simpler than
human natural languages, it would mean that Unicode has defects.


> > I am also wrestling with the problem that Unicode doesn't have a
> > reliable mapping table from/to Japanese legacy encodings.
> 
> That's because of some poor design of the legacy encodings, not
> Unicode, with multiple mappings of some characters.

Not at all.  Before Unicode appeared, these encodings were identical
except for a small number of private additional characters.  For
example, Shift_JIS and CP932 are identical if we don't consider
conversion to/from Unicode.  Most Japanese people don't even know the
name "CP932"; they think they are using Shift_JIS, and what they think
is correct.  However, when Unicode arrived, it stated "what you are
using with Windows is CP932, not Shift_JIS."  Unicode is the origin
of this confusion by introducing many legacy encodings into Japan.

(I am referring to the chapter "Conversion tables differ between
venders" in my document
http://www.debian.or.jp/~kubota/unicode-symbols.html .)

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


