Re: Asian Problems with Unicode

To: sen_ml@eccosys.com
Cc: debian-devel@lists.debian.org
Subject: Re: Asian Problems with Unicode
From: Robert Coie <rac@mata.intrigue.com>
Date: Fri, 10 Sep 1999 17:09:12 -0700
Message-id: <199909110009.RAA05787@mata.intrigue.com>
Reply-to: rac@mata.intrigue.com
References: <19990910002219E.1000@eccosys.com>

Aside from the concerns that have been brought up so far, another
potential reason for the lack of adoption of Unicode is the
inefficiency of UTF-8 as a storage format (at least for Japanese
text).  One of the design goals of UTF-8 was backward compatibility
with 7-bit ASCII.  Another was context-free parsing (i.e. whether a
byte is a single-byte character, a lead byte, or a continuation byte
can be determined without reference to the bytes surrounding it).
While both of these goals have merit, an unfortunate side effect is
that characters that take up 2 bytes in Japanese encodings such as
Shift_JIS and EUC-JP take up 3 bytes in UTF-8.
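
To make the size difference concrete, here is a minimal Python sketch
that encodes one string three ways and counts the bytes.  The sample
string and the helper name encoded_sizes are my own choices for
illustration, and the codec names are the ones Python's standard
library happens to use.

```python
# Compare how many bytes the same Japanese text occupies in each encoding.
# The sample string is an arbitrary choice for illustration.
sample = "日本語のテキスト"  # 8 characters, all kanji or kana


def encoded_sizes(text, encodings=("shift_jis", "euc_jp", "utf-8")):
    """Return a dict mapping each encoding name to the encoded byte length."""
    return {enc: len(text.encode(enc)) for enc in encodings}


sizes = encoded_sizes(sample)
for enc, size in sizes.items():
    print(f"{enc:10} {size:3d} bytes ({size / len(sample):.1f} per character)")

# The ASCII-compatibility goal: pure ASCII text encodes to the same bytes.
ascii_compatible = "Debian".encode("utf-8") == "Debian".encode("ascii")
```

For this sample the two legacy encodings need 16 bytes each and UTF-8
needs 24, i.e. a 50% increase for text made up entirely of kanji and
kana.  Real documents that mix in ASCII markup or digits would see a
smaller increase.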

This can be worked around by storing text as UCS-2 instead, but then
users of mostly-ASCII text complain, as characters that previously
took 1 byte to store now take 2.
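
The trade-off can be sketched the same way.  Python has no codec
named UCS-2, so this sketch assumes "utf-16-be" as a stand-in; the two
produce identical bytes for characters in the Basic Multilingual
Plane, which covers both samples.  The function name utf8_vs_ucs2 is
hypothetical.

```python
def utf8_vs_ucs2(text):
    """Return (UTF-8 byte length, UCS-2 byte length) for BMP-only text.

    Assumption: "utf-16-be" stands in for UCS-2. They agree for every
    character in the Basic Multilingual Plane, and no byte order mark
    is written when the byte order is named explicitly.
    """
    utf8_len = len(text.encode("utf-8"))
    ucs2_len = len(text.encode("utf-16-be"))
    return utf8_len, ucs2_len


for label, text in (("ASCII", "Debian"), ("Japanese", "日本語")):
    utf8_len, ucs2_len = utf8_vs_ucs2(text)
    print(f"{label:8} UTF-8: {utf8_len:2d} bytes   UCS-2: {ucs2_len:2d} bytes")
```

Here "Debian" grows from 6 bytes to 12, while the three-character
Japanese sample shrinks from 9 bytes to 6, so neither format is the
smaller one for every kind of text.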

-- 
Robert Coie
Implementor, Apropos Ltd.
