Bug#99324: Default charset should be UTF-8

To: Raul Miller <moth@debian.org>
Cc: dancer@netfort.gr.jp, 99324@bugs.debian.org, garabik@melkor.dnp.fmph.uniba.sk
Subject: Bug#99324: Default charset should be UTF-8
From: Junichi Uekawa <dancer@netfort.gr.jp>
Date: Wed, 13 Jun 2001 10:49:07 +0900
Message-id: <20010613104907.76bcd519.dancer@netfort.gr.jp>
Reply-to: Junichi Uekawa <dancer@netfort.gr.jp>, 99324@bugs.debian.org
In-reply-to: <992363449.8741c7af@debian.org>
References: <20010531165843.A518@flower.cesarb> <20010531233007.D12862@kitenet.net> <20010601131743.B26998@melkor.dnp.fmph.uniba.sk> <20010601133112.B7815@cibalia.gkvk.hr> <20010601135837.D26998@melkor.dnp.fmph.uniba.sk> <20010601140928.A8167@cibalia.gkvk.hr> <20010601141220.A28618@melkor.dnp.fmph.uniba.sk> <20010604204421.149b9fca.dancer@netfort.gr.jp> <20010606112126.B14517@melkor.dnp.fmph.uniba.sk> <20010606204228.0f678b20.dancer@netfort.gr.jp> <992363449.8741c7af@debian.org>

Raul Miller <moth@debian.org> wrote:

> On Wed, Jun 06, 2001 at 08:42:28PM +0900, Junichi Uekawa wrote:
> > UCS4 is not a satisfactory encoding for our needs, unfortunately.
> > JIS is not complete either, but UCS4 is less so.
> 
> Could you provide some examples of characters encoded in JIS but not
> in UCS4?  [a url would be fine, if it's hard to represent this in email.]

The China-Japan-Korea (CJK) Unified Ideographs
are what is causing the most pain.

If you search Google for the keywords "unified ideographs CJK",
you will find lots of pages on the subject.

The main problem is that the character code alone is not enough to
determine which glyph is meant, and in practical terms the font has to be
changed depending on the currently chosen locale.

I.e. you need the "current language" information to decipher UCS4,
like the "lang" attribute (xml:lang) in XML.
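
To make the point concrete, here is a small Python sketch. It is only an
illustration under assumptions: U+76F4 is picked as an often-cited example
of a unified ideograph, and the helper tag_text and the font names in
FONT_FOR_LANG are hypothetical placeholders, not real fonts or any
existing API. It shows that the UCS4 bytes are the same whichever language
was intended, so the font choice has to come from a separate language tag.

```python
import unicodedata

# Hypothetical language-to-font table; the font names are placeholders.
FONT_FOR_LANG = {"ja": "example-japanese-font", "zh": "example-chinese-font"}

def tag_text(text, lang):
    """Attach the out-of-band language tag that the code points do not carry."""
    return {
        "lang": lang,
        "font": FONT_FOR_LANG[lang],
        "ucs4": text.encode("utf-32-be"),  # UCS4 / UTF-32, big-endian, no BOM
    }

# U+76F4, chosen here only as an illustration of a unified ideograph.
ch = "\u76f4"
ja = tag_text(ch, "ja")
zh = tag_text(ch, "zh")

print(unicodedata.name(ch))      # a single character name for both uses
print(ja["ucs4"] == zh["ucs4"])  # the encoded bytes cannot tell the two apart
print(ja["font"], zh["font"])    # only the language tag selects a font
```

Without the "lang" field, a renderer that sees only the bytes has nothing
to base the font choice on except the current locale.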

Microsoft (one of the main culprits behind the CJK unification process, I have
heard) seems to have noticed the problem, and
http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP
describes it and lists a number of "unified ideographs" which
cannot be handled correctly.

I am not sure if this has been resolved.

regards,
	junichi

-- 
dancer@debian.org  http://www.netfort.gr.jp/~dancer
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GE d+ s:- a-- C+ UL++++ P- L+++ E W++ N o-- K- w++ 
O- M- V-- PS+ PE-- Y+ PGP+ t-- 5 X-- R* tv- b+ DI- D++ 
G e h* r% !y+ 
------END GEEK CODE BLOCK------
