
Bug#99324: Default charset should be UTF-8



Raul Miller <moth@debian.org> wrote:

> On Wed, Jun 06, 2001 at 08:42:28PM +0900, Junichi Uekawa wrote:
> > UCS4 is not a satisfactory encoding for our needs, unfortunately.
> > JIS is not complete either, but UCS4 is less so.
> 
> Could you provide some examples of characters encoded in JIS but not
> in UCS4?  [a url would be fine, if it's hard to represent this in email.]

The China-Japan-Korea (CJK) Unified Ideographs
are what is causing the most pain.

You could search Google for the keywords "unified ideographs CJK"
and plenty of pages will turn up.

The main problem is that the character code alone is not enough to
determine which glyph is meant, and in practical terms the font has to be
changed depending on the currently chosen locale.


I.e. you need "current language" information to decipher UCS4,
like the "xml:lang" attribute in XML.
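
To make this concrete, here is a rough Python sketch of what a renderer ends
up doing: the code point stays the same and the language tag decides the
font. The font names and the FONT_BY_LANG table are placeholders I made up
for illustration, not real fonts or any real API. U+76F4 is used only as an
often-cited example of a unified ideograph whose usual glyph differs between
Japanese and Chinese typography.

```python
import unicodedata

# Hypothetical language-to-font table; the font names are placeholders.
FONT_BY_LANG = {
    "ja": "ExampleMincho",
    "zh-CN": "ExampleSong",
    "zh-TW": "ExampleMing",
    "ko": "ExampleBatang",
}


def is_unified_ideograph(ch):
    """True if the character's Unicode name marks it as a CJK unified ideograph."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def pick_font(ch, lang, default="ExampleSans"):
    """Choose a font from the language tag, since the code point alone
    does not say which regional glyph is wanted."""
    if is_unified_ideograph(ch):
        return FONT_BY_LANG.get(lang, default)
    return default


ch = "\u76f4"  # one code point shared by Japanese, Chinese and Korean text
for lang in ("ja", "zh-CN", "ko"):
    print(lang, hex(ord(ch)), pick_font(ch, lang))
```

The point is that pick_font() cannot be written without the lang argument,
which is exactly the information a bare UCS4 string does not carry.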


Microsoft (one of the main culprits behind the CJK unification process, I have
heard) seems to have noticed the problem, and
http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP
describes the problem and lists a number of "unified ideographs" which
cannot be handled correctly.




I am not sure if this has been resolved.

regards,
	junichi

-- 
dancer@debian.org  http://www.netfort.gr.jp/~dancer
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GE d+ s:- a-- C+ UL++++ P- L+++ E W++ N o-- K- w++ 
O- M- V-- PS+ PE-- Y+ PGP+ t-- 5 X-- R* tv- b+ DI- D++ 
G e h* r% !y+ 
------END GEEK CODE BLOCK------





