Re: Asian Problems with Unicode
On Fri, Sep 10, 1999 at 05:09:12PM -0700, Robert Coie wrote:
> Aside from the concerns which have been brought up so far, another
> potential reason for lack of adoption of Unicode is the inefficiency
> of UTF-8 as a storage format (at least for Japanese text). One of the
> design goals of UTF-8 was upwards compatibility with 7-bit ASCII.
> Another was context-free parsing (i.e. a byte's meaning can be
> determined without reference to the bytes surrounding it). While both
> of these goals have merit, an unfortunate side-effect is that
> characters that take up 2 bytes in various Japanese character sets
> take up 3 bytes in UTF-8.
> This can be worked around by saving in UCS-2 instead, but then ASCII
> users complain, as characters that previously took 1 byte to store now
> take 2.
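To put rough numbers on the 2-byte versus 3-byte point, here is a minimal Python sketch. The sample string and the helper name encoded_sizes are my own illustrative choices, not anything from the original message, and utf-16-le stands in for UCS-2 on the assumption that the text stays within the Basic Multilingual Plane.

```python
# Illustrative sample only (an assumption); any Japanese text would do.
sample = "日本語のテキスト"


def encoded_sizes(text):
    """Return the byte length of `text` in several encodings."""
    return {
        "shift_jis": len(text.encode("shift_jis")),
        "euc_jp": len(text.encode("euc_jp")),
        "utf-8": len(text.encode("utf-8")),
        # For BMP-only text, UTF-16 without a BOM is byte-for-byte UCS-2.
        "utf-16-le": len(text.encode("utf-16-le")),
    }


sizes = encoded_sizes(sample)
ascii_sizes = encoded_sizes("plain ASCII")

for name, size in sizes.items():
    print(f"{name:10s} {size} bytes")
```

For this 8-character sample the legacy encodings and UTF-16 each take 16 bytes and UTF-8 takes 24, while the 11-character ASCII string goes from 11 bytes in UTF-8 to 22 in UTF-16.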
In the first place, are these standards mutually exclusive? Is it a problem in
practice to work with both?
Second, this isn't a big deal. I don't believe most people have huge
amounts of uncompressed text lying around, at least not enough for a
doubling of the space to make a real difference. As for compressed
text, almost any compressor should get the text down to about the
same size whichever encoding it started in. (Feel free to prove me
wrong here with real numbers.)
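One way to get those real numbers is a sketch along these lines, assuming Python with its standard zlib module. The helper name compressed_sizes and the repeated sample string are hypothetical placeholders; a short repeated string compresses unrealistically well, so a real corpus should be substituted before drawing any conclusion.

```python
import zlib


def compressed_sizes(text, encodings=("shift_jis", "utf-8", "utf-16-le")):
    """Map each encoding to (raw_bytes, zlib_compressed_bytes) for `text`."""
    result = {}
    for enc in encodings:
        raw = text.encode(enc)
        # Level 9 is zlib's strongest setting; other compressors may differ.
        result[enc] = (len(raw), len(zlib.compress(raw, 9)))
    return result


# Placeholder input (an assumption): replace with a real body of text.
corpus = "日本語のテキスト" * 200

report = compressed_sizes(corpus)
for enc, (raw, packed) in report.items():
    print(f"{enc:10s} raw={raw:6d} compressed={packed:6d}")
```

If the compressed columns come out close to each other on a realistic corpus, that supports the claim; if UTF-8 stays noticeably larger, it does not.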
David Starner - email@example.com