Needed works to use Unicode

To: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>
Cc: debian-devel@lists.debian.org
Subject: Needed works to use Unicode
From: Tomohiro KUBOTA <tkubota@riken.go.jp>
Date: Tue, 19 Feb 2002 22:22:23 +0900
Message-id: <200202191317.g1JDHba11855@side.riken.go.jp>
In-reply-to: <20020219101403.GA18139@melkor.dnp.fmph.uniba.sk>
References: <87r8nkspu0.fsf@becket.becket.net> <20020216213033.B5250@lightbearer.com> <20020217045039.GB1169@dodds.net> <20020217131324.D6750@justice.loyola.edu> <20020217190158.GD1393@dodds.net> <20020218124045.GA1552@wonderland.linux.it> <20020218154531.GB2200@zombie.inka.de> <20020218165504.B17989@khazad-dum> <20020218202847.GA458@celeron.dekkers> <20020218180536.C17989@khazad-dum> <20020219101403.GA18139@melkor.dnp.fmph.uniba.sk>

Hi,

At Tue, 19 Feb 2002 11:14:03 +0100,
Radovan Garabik wrote:

> this has already been discussed to _death_ here and everywhere.
> As long as the Japanese users do not mix chinese and korean, UTF-8
> is adequate for them (heck, windows 2000/XP works internally
> in unicode, and just translates output to other legacy Japanese
> encodings for communication with outer world).
> In this sense, using UTF-8 is no worse than using current japanese
> encodings.

Right, because the Unicode BMP is a superset of the character sets
covered by EUC-JP and Shift_JIS.  However, a few things need to be
fixed before Unicode can be used widely.

1. We need a single authoritative mapping table between Unicode and
   EUC-JP.
2. Japanese people need Japanese-style fonts for CJK ideographs, not
   Chinese- or Korean-style ones.  This is easy to achieve for
   "localized" software such as Windows (because the Japanese version
   of Windows can ship with a Japanese font), but there is no obviously
   easy way to achieve it for global software such as Debian.
3. EastAsianWidth (Unicode Standard Annex #11) is not good enough to
   keep compatibility with existing CJK software.
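
To make point 3 concrete, here is a minimal sketch in Python that reads
the EastAsianWidth property through the standard unicodedata module.
The helper name display_width and the sample string are only
illustrations, not taken from any existing tool.  Characters in class
"A" (ambiguous) are the troublesome ones: software built around CJK
legacy encodings has traditionally drawn them two columns wide, while
other software draws them one column wide, so the same string can get
two different widths.  Combining characters are ignored here for
brevity.

```python
import unicodedata


def display_width(text, ambiguous_wide):
    """Count terminal columns for text.

    Wide (W) and Fullwidth (F) characters take two columns.
    Ambiguous (A) characters take two columns only when
    ambiguous_wide is true.  Everything else counts as one column
    (a simplification: combining marks are not handled).
    """
    width = 0
    for ch in text:
        category = unicodedata.east_asian_width(ch)
        if category in ("W", "F"):
            width += 2
        elif category == "A":
            width += 2 if ambiguous_wide else 1
        else:
            width += 1
    return width


# Greek alpha, plus-minus sign, hiragana A, Latin A.
sample = "\u03b1\u00b1\u3042A"
for ch in sample:
    print(f"U+{ord(ch):04X} {unicodedata.east_asian_width(ch)}")

print(display_width(sample, ambiguous_wide=False))
print(display_width(sample, ambiguous_wide=True))
```

The two totals differ by the number of ambiguous characters in the
string, which is exactly the mismatch a terminal and an application
can run into when they disagree about the convention.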

I am now asking the Unicode Consortium to fix these problems.
Problem 1 may be difficult to solve because it is caused by
disagreement between major vendors (which are members of the Unicode
Consortium).  I expect a solution for problem 2 to become available in
the future as "variant tags", which will be introduced in Unicode 3.2,
though these variant tags cannot be applied to CJK ideographs yet.
Problem 3 may be difficult because it is closely related to
problem 1.  Read the explanation in

http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/

for details.

However, I think Unicode will be useful for Japanese users if these
problems are fixed.  The trouble is that I am close to losing hope,
because problem 1 is a strongly political problem.

> However, just mentioning the word "unicode" to a Japanese user
> carries a lot of negative emotional presumptions and will be met
> with strong criticism.

Sure.  Some Japanese people reject Unicode on emotional grounds.  I
don't think we need to cater to these people.

> > Ask our resident Japanese i18n especialists what they would think of a
> > proposal of UTF8 being the default charset for every locale.  The answers
> > should at the very least give you an idea of the amount of effort it would
> > take to get such a useless proposal through.
> 
> useless?
> UTF-8 is currently the only way I can efficiently communicate in.
> And it is the only encoding in which there are czech/slovak characters
> and EURO sign (iso-8859-17 anyone?).

UTF-8 being the default charset for every locale would be nice;
however, it is not feasible now.  Perhaps in the future.

Besides the problems related to Unicode itself, there are problems
we have to solve before we can use UTF-8 widely.

a. So far, relatively little software supports UTF-8.  Though we can
   sometimes manage to use 8-bit-encoding-only software with CJK
   legacy encodings, it is generally more difficult to use such
   software with UTF-8.  (For example, bash doesn't support multibyte
   encodings at all.)

b. Even among software advertised as supporting Unicode, much of it
   supports only a small subset of Unicode which doesn't include
   Japanese.  (For example, the "Unicode support" of the Linux console
   can use only a few hundred characters.)

c. Even software that supports Unicode (including CJK) sometimes lacks
   input support for CJK languages.  (Since CJK people use thousands
   of characters, the usual keyboard maps for European languages
   cannot be used.)  (For example, Yudit lacks XIM support.)

We have to fix these problems before we can use UTF-8 widely.
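
As a rough illustration of point (a), the sketch below (Python, with
made-up helper names) shows what happens when software counts and cuts
bytes instead of characters.  In EUC-JP a JIS X 0208 character is two
bytes and is usually drawn two columns wide, so byte counts happen to
match column counts; in UTF-8 the same character is three bytes, so
that coincidence is lost.  This is one plausible reason, not
necessarily the only one, why 8-bit-only software copes worse with
UTF-8.

```python
def byte_length(text, encoding):
    """Number of bytes text occupies in the given encoding."""
    return len(text.encode(encoding))


def naive_truncate(text, encoding, max_bytes):
    """Cut the encoded bytes at max_bytes, as byte-oriented software
    would, then decode what is left.  Broken sequences show up as
    U+FFFD replacement characters."""
    raw = text.encode(encoding)[:max_bytes]
    return raw.decode(encoding, errors="replace")


text = "\u65e5\u672c\u8a9e"  # "Japanese language", three kanji

print(byte_length(text, "euc_jp"))  # two bytes per character
print(byte_length(text, "utf-8"))   # three bytes per character

# Four bytes hold two whole characters in EUC-JP, but only one whole
# character plus a fragment in UTF-8.
print(naive_truncate(text, "euc_jp", 4))
print(naive_truncate(text, "utf-8", 4))
```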

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
