
Re: Asian Problems with Unicode



Aside from the concerns which have been brought up so far, another
potential reason for the lack of adoption of Unicode is the
inefficiency of UTF-8 as a storage format (at least for Japanese
text).  One of the design goals of UTF-8 was backward compatibility
with 7-bit ASCII.  Another was context-free parsing (i.e. whether a
byte is an ASCII character, the first byte of a multi-byte sequence,
or a continuation byte can be determined without reference to the
bytes surrounding it).  While both of these goals have merit, an
unfortunate side-effect is that characters that take up 2 bytes in
Japanese encodings such as Shift_JIS and EUC-JP take up 3 bytes in
UTF-8.
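
To make both points concrete, here is a small Python sketch.  The
sample string, the choice of Shift_JIS and EUC-JP as the legacy
encodings, and the helper name byte_role are my own picks for
illustration, not anything mandated by a standard.  It classifies
UTF-8 bytes from their high bits alone and then compares encoded
sizes.

```python
def byte_role(b: int) -> str:
    """Classify a single UTF-8 byte by looking only at its high bits.

    Hypothetical helper for illustration. It does not validate the byte:
    values such as 0xC0 or 0xFF are reported as "lead" even though they
    never occur in well-formed UTF-8.
    """
    if b < 0x80:
        return "ascii"         # 0xxxxxxx
    if b < 0xC0:
        return "continuation"  # 10xxxxxx
    return "lead"              # 11xxxxxx


# Sample text chosen for illustration: 8 characters of kanji and kana.
text = "日本語のテキスト"

# Encoded size in two common Japanese encodings and in UTF-8.
sizes = {name: len(text.encode(name)) for name in ("shift_jis", "euc_jp", "utf-8")}

for name, size in sizes.items():
    print(f"{name}: {size} bytes for {len(text)} characters")

# Role of each byte in the UTF-8 encoding of the first character.
print([byte_role(b) for b in text[0].encode("utf-8")])
```

For this particular string the two legacy encodings come out at 2
bytes per character and UTF-8 at 3, so the UTF-8 file is half again
as large.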

This can be worked around by saving in UCS-2 instead, but then ASCII
users complain, as characters that previously took 1 byte to store now 
take 2.
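
The trade-off can be sketched the same way.  Python has no codec
named UCS-2, so this assumes "utf-16-be" as a stand-in, which yields
the same bytes for characters in the Basic Multilingual Plane; the
sample strings and the helper name encoded_sizes are again just
illustrative.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes for UTF-8 and UTF-16 (big-endian, no BOM).

    Hypothetical helper. UTF-16-BE stands in for UCS-2 here, which is
    only equivalent for characters inside the Basic Multilingual Plane.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-be": len(text.encode("utf-16-be")),
    }


ascii_sizes = encoded_sizes("Hello, world")         # 12 ASCII characters
japanese_sizes = encoded_sizes("日本語のテキスト")  # 8 kanji/kana characters

print("ASCII:   ", ascii_sizes)
print("Japanese:", japanese_sizes)
```

With these samples the ASCII string doubles in size under the 16-bit
encoding while the Japanese string shrinks by a third, which is
exactly the tension described above.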

-- 
Robert Coie
Implementor, Apropos Ltd.

