Re: Asian Problems with Unicode
Robert Coie:
> Aside from the concerns which have been brought up so far, another
> potential reason for lack of adoption of Unicode is the inefficiency
> of UTF-8 as a storage format (at least for Japanese text). One of the
> design goals of UTF-8 was upwards compatibility with 7-bit ASCII.
> Another was context-free parsing (i.e. a byte's meaning can be
> determined without reference to the bytes surrounding it). While both
> of these goals have merit, an unfortunate side-effect is that
> characters that take up 2 bytes in various Japanese character sets
> take up 3 bytes in UTF-8.
> This can be worked around by saving in UCS-2 instead, but then ASCII
> users complain, as characters that previously took 1 byte to store now
> take 2.
I think this inefficiency is a reasonable and acceptable price
for using a universal stateless codeset. If you can't
accept such an inefficiency, you can use ISO 2022,
another universal codeset, which is stateful.
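To put numbers on the size difference described in the quoted text, here
is a small Python sketch. The helper name encoded_sizes and the sample
strings are only illustrative, and "utf-16-le" stands in for UCS-2 on the
assumption that the text stays within the Basic Multilingual Plane.

```python
def encoded_sizes(text, codecs=("utf-8", "utf-16-le", "euc_jp", "shift_jis")):
    """Return the encoded length in bytes of `text` for each codec.

    Hypothetical helper for illustration. "utf-16-le" is used as a
    stand-in for UCS-2 (identical for characters in the BMP).
    """
    return {codec: len(text.encode(codec)) for codec in codecs}


# Three kanji: 3 bytes each in UTF-8, 2 bytes each in the others.
print(encoded_sizes("日本語"))

# Five ASCII letters: 1 byte each everywhere except the 2-byte-per-unit form.
print(encoded_sizes("Hello"))
```

For the three kanji this gives 9 bytes in UTF-8 against 6 bytes in the
other three encodings; for "Hello" it gives 10 bytes in the 16-bit form
against 5 bytes elsewhere.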
David Starner:
> First place, are these standards mutually exclusive? Is it a problem in
> practice to work with both?
First, we don't have conversion software yet. Such software
will probably come soon, though I don't know for certain.
Next, I can use multiple codesets. I am already using multiple codesets
because Windows/Macintosh (SHIFT-JIS) and Unix (EUC-Japan) use different
codesets and the network needs yet another, 7-bit codeset (ISO-2022-JP).
These codesets are incompatible, but they can be converted into one
another with a simple equation, so I can use them together.
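A minimal sketch of such an arithmetic conversion, from EUC-Japan to
SHIFT-JIS, might look like the following. The function names are
hypothetical, and the sketch assumes the input contains only ASCII and
two-byte JIS X 0208 characters (no half-width katakana, no JIS X 0212).

```python
def jis_to_sjis(j1, j2):
    """Map a 7-bit JIS X 0208 byte pair (0x21..0x7E each) to Shift-JIS."""
    # Two JIS rows share one Shift-JIS lead byte.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 & 1:
        # Odd row: lower half of the trail-byte range, skipping 0x7F.
        s2 = j2 + (0x1F if j2 < 0x60 else 0x20)
    else:
        # Even row: upper half of the trail-byte range.
        s2 = j2 + 0x7E
    return s1, s2


def euc_to_sjis(data):
    """Convert EUC-JP bytes (ASCII + JIS X 0208 only) to Shift-JIS bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII passes through unchanged.
            out.append(b)
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data):
            # EUC-JP is JIS X 0208 with the high bit set on both bytes.
            out += bytes(jis_to_sjis(b & 0x7F, data[i + 1] & 0x7F))
            i += 2
        else:
            raise ValueError("unsupported byte 0x%02X at offset %d" % (b, i))
    return bytes(out)


euc = "abc 日本語".encode("euc_jp")
print(euc_to_sjis(euc) == "abc 日本語".encode("shift_jis"))
```

No lookup table is involved: the mapping is pure arithmetic on the two
bytes, which is why these codesets can coexist without much trouble.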
However, if important software such as the kernel, libraries,
gettext, terminal emulators, and so on decided to use Unicode
as the only codeset, I would have to give up all software which
depends on them.
Yes, it is welcome when software which already supports various codesets
adds Unicode to its list.
---
Tomohiro KUBOTA <kubota@debian.or.jp>