
Re: questions on webwml/english/templete/debian/cdimage.wml



Tomohiro KUBOTA:

> Because the algorithm transliterations is not very good.

I know.

> And, many people in the world have to use a small subset of softwares
> only because such softwares support their native languages.

We're talking about the web pages here. The only software that needs
Unicode support in this case is the browser, and most browsers do have
it (to varying degrees).

> Oh, very good.  Please note that east Asian will need not only display
> support but also input support, i.e., XIM support.

Yes, I'm very aware of that as well (although my direct experience with
IMs is limited). I have worked on the Unicode adaptation of our browser
for over a year.

> (note there is a rival; ISO-2022 is a multilingual encoding scheme
> with much longer history).

Yeah, and it's a mess, to be honest. These "state-driven" (for lack of
a better word) encodings, where you cannot easily resynchronize (as you
can with UTF-8), are not something I like (the same goes for HZ, which
is just a "simplified" form of ISO-2022).
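
To illustrate the sync point, here is a minimal Python sketch. The
helper name next_char_boundary and the sample string are made up for
this example; the codecs are the ones shipped with Python. It jumps
into the middle of a UTF-8 character and recovers at the next
character boundary, then drops the leading escape sequence from the
ISO-2022-JP form of the same text to show that the bytes still decode,
only as the wrong characters.

```python
def next_char_boundary(data: bytes, offset: int) -> int:
    """Return the first index >= offset that is not a UTF-8 continuation byte."""
    # Continuation bytes always look like 10xxxxxx, so they can be skipped
    # without knowing anything about the bytes that came before.
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


text = "日本語 text"

# UTF-8: start reading one byte into the first character and resynchronize.
utf8 = text.encode("utf-8")
start = next_char_boundary(utf8, 1)
tail_utf8 = utf8[start:].decode("utf-8")

# ISO-2022-JP: lose the 3-byte escape sequence that switches to JIS X 0208.
# The decoder stays in its initial ASCII state and misreads the kanji bytes.
iso = text.encode("iso2022_jp")
tail_iso = iso[3:].decode("iso2022_jp")

print(repr(tail_utf8))
print(repr(tail_iso))
```

With UTF-8 only the damaged character is lost. With ISO-2022-JP
everything up to the next escape sequence is silently misinterpreted.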

> However, for _one_ language (most of Debian web pages are written in
> one language, with a small portion of links to other languages),
> usage of legacy encodings is better, because of plenty of supporting
> softwares, fonts, and so on, so far.

We only need to support one kind of software, and that is web browsers.
When it comes to fonts, the underlying encoding of the document should
*really* have no say in which fonts are used to display the contents
(even though I am aware that Netscape 4 does such evil things). Having
Unicode as the underlying encoding makes things easier in a lot of
cases (no need to transcode) and makes it possible to interchange
content between the languages, for example when writing names of
people, companies or places. Just see what we need to do with the
things that are included from .data files on this website: we need to
use entities whenever there is a non-ASCII character.

> I am also wrestling with a problem that Unicode doesn't have a
> relyable mapping table from/to Japanese legacy encodings.

That's because of poor design in the legacy encodings, which have
multiple mappings for some characters, not because of Unicode.

> See http://www.debian.or.jp/~kubota/unicode-symbols.html for detail.

Yes, I have read similar reasoning before.

However, many of the problems you are describing are caused by the
legacy encodings, not by Unicode. Unicode tries to solve the problems
by defining one unambiguous encoding, whereas today there are several
ambiguous legacy encodings. Take the "backslash vs. yen" problem of
Shi(f)t-JIS vs EUC-JP/ISO-2022-JP. Boy, is that a headache to
implement!
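
A rough Python sketch of where the ambiguity comes from: the same byte
0x5C is a backslash if the 7-bit range is read as ASCII, and a yen sign
if it is read as JIS X 0201 Roman. The function decode_7bit and the
sample path are hypothetical, written only for this illustration, and
the function handles bytes below 0x80 only.

```python
# The two positions where JIS X 0201 Roman differs from ASCII.
JIS_ROMAN_OVERRIDES = {
    0x5C: "\u00A5",  # YEN SIGN instead of REVERSE SOLIDUS
    0x7E: "\u203E",  # OVERLINE instead of TILDE
}


def decode_7bit(data: bytes, jis_roman: bool) -> str:
    """Decode 7-bit bytes as ASCII, or as JIS X 0201 Roman if requested."""
    out = []
    for b in data:
        if jis_roman and b in JIS_ROMAN_OVERRIDES:
            out.append(JIS_ROMAN_OVERRIDES[b])
        else:
            out.append(chr(b))
    return "".join(out)


path = b"C:\\TEMP"  # one byte sequence, two possible readings
as_ascii = decode_7bit(path, jis_roman=False)
as_jis_roman = decode_7bit(path, jis_roman=True)

print(as_ascii)
print(as_jis_roman)
```

A converter has to pick one reading per source encoding, and whichever
it picks, some existing data will come out wrong.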

Also, the width issue is really a non-issue when it comes to graphical
systems, with their proportional fonts. And it was a hack to start with
(two encoding bytes = double display width), which doesn't even hold
true for EUC-JP, for instance, with half-width characters of two bytes
(half-width kana, the SS2 set) and double-width characters of three
bytes (the SS3 set).
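
The mismatch between byte count and display width in EUC-JP can be
checked with a short Python sketch. It assumes Python's bundled euc_jp
codec and uses the East Asian Width property from unicodedata as a
stand-in for terminal column width. The three sample characters are my
own choice.

```python
import unicodedata

# One sample per EUC-JP code set that carries Japanese characters.
samples = {
    "full-width hiragana": "\u3042",  # JIS X 0208, plain two-byte form
    "half-width katakana": "\uFF71",  # JIS X 0201 kana, SS2 prefix
    "JIS X 0212 kanji": "\u4E02",     # supplementary kanji, SS3 prefix
}

rows = []
for label, ch in samples.items():
    encoded = ch.encode("euc_jp")
    # "W" means wide (two columns), "H" means half-width (one column).
    rows.append((label, len(encoded), unicodedata.east_asian_width(ch)))

for label, n_bytes, width in rows:
    print(f"{label}: {n_bytes} byte(s), East Asian Width {width}")
```

Two bytes can mean either one column or two, and a two-column
character can take two or three bytes, so the byte count says nothing
reliable about the width.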

-- 
\\//
peter - http://www.softwolves.pp.se/

  I do not read or respond to mail with HTML attachments.
  Statement concerning unsolicited e-mail according to Swedish law:
  http://www.softwolves.pp.se/peter/reklampost.html


