
Bug#99324: Default charset should be UTF-8

At Wed, 6 Jun 2001 17:02:16 +0200,
Radovan Garabik wrote:

> > > > utf8 in the current state does not cover everything we had in other encodings.
> > > 
> > > utf8 is just a _multibyte_ encoding, not _character_ encoding,
> > > it can represent whatever character encoding is used in UCS-4
> > 
> > UCS4 is not a satisfactory encoding for our needs, unfortunately.
> > JIS is not complete either, but UCS4 is less.
> but: JIS is japanese only, UCS-4 is global
> UCS-4 can (and will) be easily expanded, there are no technical 
> problems in adding characters to this encoding
> can JIS be easily extended to support missing characters?
> I do not think so...

First of all, JIS stands for Japanese Industrial Standards; it is not only
about character sets or encodings.  JIS covers many industrial standards,
such as screw sizes.

Anyway, in this context I assume the JIS you mean is JIS X0208.  That is
just a character set, not an encoding.  We usually use JIS X0208 together
with ASCII in an ISO 2022 encoding.  When ASCII is designated to G0 and
JIS X0208 to G1, with G0 invoked into GL and G1 into GR, we call it EUC-JP
(precisely, supplementary character sets are used for G2/G3).  In Japanese
Linux environments we usually use EUC-JP, because it is the simplest
encoding for Japanese for now.
When ASCII is initially designated to G0 and G0 is invoked into GL, and we
switch from ASCII to JIS X0208 with ESC $ B and back with ESC ( B, we call
it the JIS 7-bit encoding, or commonly ISO-2022-JP.  We use this encoding
for Japanese Internet messages because it uses only 7 bits, so it can pass
safely over routes that are not 8-bit clean.  This is only a simple
subset of ISO 2022, so it can easily be extended to use other character
sets.  X Compound Text is an example of a fuller use of ISO 2022.
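
To make the GL/GR difference concrete, here is a rough sketch, assuming
Python's built-in "euc_jp" and "iso2022_jp" codecs (my choice for
illustration, not something used elsewhere in this thread).  The sample
string and the variable names are arbitrary.  It encodes the same
JIS X0208 characters both ways and checks that the EUC-JP bytes are the
ISO-2022-JP payload with the high bit set:

```python
# Illustration only: the same JIS X0208 characters in EUC-JP and ISO-2022-JP.
# The sample text and all names below are assumptions made for this sketch.
text = "\u65e5\u672c\u8a9e"  # three kanji, all present in JIS X0208

euc = text.encode("euc_jp")      # JIS X0208 in G1, invoked into GR (8-bit)
iso = text.encode("iso2022_jp")  # JIS X0208 designated to G0 by escape (7-bit)

ESC_TO_JISX0208 = b"\x1b$B"  # ESC $ B : switch G0 to JIS X0208
ESC_TO_ASCII = b"\x1b(B"     # ESC ( B : switch G0 back to ASCII

# ISO-2022-JP never uses the 8th bit, so it survives 7-bit mail routes.
seven_bit_only = all(b < 0x80 for b in iso)

# Strip the two escape sequences to get the raw two-byte JIS X0208 codes.
payload = iso[len(ESC_TO_JISX0208):-len(ESC_TO_ASCII)]

# EUC-JP carries the same codes in GR, i.e. with the high bit set.
same_codes = bytes(b | 0x80 for b in payload) == euc

print("EUC-JP     :", euc.hex())
print("ISO-2022-JP:", iso)
print("7-bit only :", seven_bit_only)
print("same codes :", same_codes)
```

The character set is the same in both cases; only the ISO 2022 designation
and invocation differ.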

> UCS-4 can, given some effort.

Given some effort, ISO-2022 can too.

Fumitoshi UKAI
