Re: Asian Problems with Unicode

Robert Coie:
> Aside from the concerns which have been brought up so far, another
> potential reason for lack of adoption of Unicode is the inefficiency
> of UTF-8 as a storage format (at least for Japanese text).  One of the
> design goals of UTF-8 was upwards compatibility with 7-bit ASCII.
> Another was context-free parsing (i.e. a byte's meaning can be
> determined without reference to the bytes surrounding it).  While both
> of these goals have merit, an unfortunate side-effect is that
> characters that take up 2 bytes in various Japanese character sets
> take up 3 bytes in UTF-8.

> This can be worked around by saving in UCS-2 instead, but then ASCII
> users complain, as characters that previously took 1 byte to store now 
> take 2.
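
The byte counts described above can be checked with a short sketch in
Python.  This is only an illustration and not part of the original thread:
the sample strings and the helper name `encoded_sizes` are assumptions made
for this example, and UTF-16 without a byte-order mark stands in for UCS-2,
which is equivalent for these characters.

```python
# Compare how many bytes the same text needs in several codesets.
# The codec names are the ones shipped with Python's standard library.

def encoded_sizes(text, codecs=("euc_jp", "shift_jis", "utf-8", "utf-16-le")):
    """Return a dict mapping each codec name to the encoded length in bytes."""
    return {codec: len(text.encode(codec)) for codec in codecs}


japanese = "日本語のテキスト"   # 8 characters, all in JIS X 0208 (assumed sample)
ascii_only = "plain ASCII text"   # 16 characters (assumed sample)

for label, text in (("Japanese", japanese), ("ASCII", ascii_only)):
    for codec, size in encoded_sizes(text).items():
        print(f"{label:8} {codec:10} {size:3} bytes")
```

For the eight-character Japanese sample this reports 16 bytes in EUC-JP,
Shift-JIS and UTF-16, but 24 bytes in UTF-8.  For the 16-character ASCII
sample, UTF-8 needs 16 bytes and UTF-16 needs 32.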

I think this inefficiency is a reasonable and acceptable price
for using a universal stateless codeset.  If you can't
accept such an inefficiency, you can use ISO 2022, which is
another universal codeset, though a stateful one.
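
To illustrate the stateless/stateful distinction, here is a minimal sketch
in Python.  It is not from the original thread; the helper name
`utf8_byte_role` and the sample bytes are assumptions chosen for this
example.  In ISO-2022-JP the same two bytes mean different things depending
on the escape sequence that came before them, while in UTF-8 the role of a
byte can be read from its own high bits.

```python
# Stateful: the meaning of the bytes b"F|" depends on the preceding escape.
ESC_TO_JISX0208 = b"\x1b$B"   # ISO-2022-JP escape: switch to JIS X 0208
ESC_TO_ASCII = b"\x1b(B"      # ISO-2022-JP escape: switch back to ASCII

payload = b"F|"               # bytes 0x46 0x7C
as_ascii = payload.decode("iso2022_jp")
as_kanji = (ESC_TO_JISX0208 + payload + ESC_TO_ASCII).decode("iso2022_jp")


# Stateless: classify a UTF-8 byte from its value alone.
# (This sketch does not reject byte values that are invalid in UTF-8.)
def utf8_byte_role(b: int) -> str:
    if b < 0x80:
        return "ascii"
    if b < 0xC0:
        return "continuation"
    if b < 0xE0:
        return "lead-2"
    if b < 0xF0:
        return "lead-3"
    return "lead-4"


roles = [utf8_byte_role(b) for b in "日".encode("utf-8")]

print(repr(as_ascii), repr(as_kanji), roles)
```

Here `as_ascii` is the two ASCII characters "F|", while `as_kanji` is the
single character "日", even though the payload bytes are identical.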

David Starner:
> First place, are these standards mutually exclusive? Is it a problem in
> practice to work with both?

First, we don't have conversion software yet.  Such software will
probably come soon, though I don't know for certain.

Next, I can use multiple codesets.  I am already using multiple codesets
because Windows/Macintosh (Shift-JIS) and Unix (EUC-JP) use different
codesets and the network needs yet another, 7-bit codeset (ISO-2022-JP).
These codesets are incompatible, but they can be converted to one another
with a simple equation, so I can work with all of them.
However, if important software such as the kernel, libraries,
gettext, terminal emulators, and so on decided to use Unicode
as the only codeset, I would have to give up all the software that
depends on them.
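
As an illustration of the "simple equation" mentioned above, the following
Python sketch converts EUC-JP bytes to Shift-JIS by arithmetic alone, with
no mapping table.  It is not code from the original thread: the function
names are hypothetical, and the sketch assumes the input contains only
ASCII and two-byte JIS X 0208 characters (no half-width katakana and no
JIS X 0212).

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Convert one JIS X 0208 byte pair (each 0x21..0x7E) to Shift-JIS."""
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 > 0x9F:          # skip the range used for single-byte katakana
        s1 += 0x40
    if j1 & 1:             # odd row: trail byte in the lower half
        s2 = j2 + 0x1F
        if s2 >= 0x7F:     # 0x7F is never used as a trail byte
            s2 += 1
    else:                  # even row: trail byte in the upper half
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def euc_to_sjis(data: bytes) -> bytes:
    """Convert EUC-JP (ASCII + JIS X 0208 only) to Shift-JIS."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                       # ASCII passes through unchanged
            out.append(b)
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # EUC-JP is the JIS X 0208 pair with 0x80 added to each byte
            out += jis_to_sjis(b - 0x80, data[i + 1] - 0x80)
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02X} at offset {i}")
    return bytes(out)


sample = "日本語 ABC"   # assumed sample text
converted = euc_to_sjis(sample.encode("euc_jp"))
print(converted == sample.encode("shift_jis"))
```

The comparison against Python's own `shift_jis` codec is only a sanity
check.  A conversion to or from Unicode, by contrast, needs a full mapping
table, because the code points do not follow the JIS row/column layout.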

Yes, it is welcome when software that already supports various codesets
adds Unicode to its list.

Tomohiro KUBOTA <kubota@debian.or.jp>