Bug#99324: Default charset should be UTF-8

To: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>, 99324@bugs.debian.org
Subject: Bug#99324: Default charset should be UTF-8
From: Fumitoshi UKAI <ukai@debian.or.jp>
Date: Fri, 08 Jun 2001 21:39:24 +0900
Message-id: <87elsvqh6r.wl@mistral.ukai.org>
Reply-to: Fumitoshi UKAI <ukai@debian.or.jp>, 99324@bugs.debian.org
In-reply-to: <20010606170216.A24316@melkor.dnp.fmph.uniba.sk>
References: <20010531165843.A518@flower.cesarb> <20010531233007.D12862@kitenet.net> <20010601131743.B26998@melkor.dnp.fmph.uniba.sk> <20010601133112.B7815@cibalia.gkvk.hr> <20010601135837.D26998@melkor.dnp.fmph.uniba.sk> <20010601140928.A8167@cibalia.gkvk.hr> <20010601141220.A28618@melkor.dnp.fmph.uniba.sk> <20010604204421.149b9fca.dancer@netfort.gr.jp> <20010606112126.B14517@melkor.dnp.fmph.uniba.sk> <20010606204228.0f678b20.dancer@netfort.gr.jp> <20010606170216.A24316@melkor.dnp.fmph.uniba.sk>

At Wed, 6 Jun 2001 17:02:16 +0200,
Radovan Garabik wrote:

> > > > utf8 in the current state does not cover everything we had in other encodings.
> > > 
> > > utf8 is just a _multibyte_ encoding, not _character_ encoding,
> > > it can represent whatever character encoding is used in UCS-4
> > 
> > UCS4 is not a satisfactory encoding for our needs, unfortunately.
> > JIS is not complete either, but UCS4 is less.
> 
> but: JIS is Japanese only, UCS-4 is global
> UCS-4 can (and will) be easily expanded, there are no technical 
> problems in adding characters to this encoding
> 
> can JIS be easily extended to support missing characters?
> I do not think so...

First of all, JIS stands for Japanese Industrial Standards; it is not only
about character sets and encodings.  JIS covers many industrial standards,
such as screw sizes.

Anyway, in this context I assume the JIS you mean is JIS X0208.  This is just
a character set, not an encoding.  We usually use JIS X0208 together with
ASCII in an ISO 2022 encoding.  When ASCII->G0 and JIS X0208->G1, with G0->GL
and G1->GR, we call it EUC-JP (precisely, supplementary character sets are
used for G2/G3).  In Japanese Linux environments we usually use EUC-JP,
because it is the simplest encoding for Japanese for now.
When initially ASCII->G0 and G0->GL, and we switch from ASCII to JIS X0208
with ESC $ B and switch back with ESC ( B, we call it the JIS 7-bit encoding,
or more commonly ISO-2022-JP.  We use this encoding for Japanese Internet
messages because it uses only 7 bits, so it can be passed safely over
non-8-bit-clean routes.  This is only a simple version of ISO 2022, so
it can easily be extended to use other character sets.
X Compound Text is an example of a fuller use of ISO 2022.
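
As a rough illustration of the two encodings described above (a sketch
only, assuming Python 3 and its bundled euc_jp and iso2022_jp codecs; the
sample text and the variable names are chosen just for this example), the
same JIS X0208 characters can be encoded both ways and compared:

```python
# Sketch: one JIS X0208 string under EUC-JP and under ISO-2022-JP.
ESC = b"\x1b"
TO_JISX0208 = ESC + b"$B"   # ESC $ B: designate JIS X0208 to G0
TO_ASCII = ESC + b"(B"      # ESC ( B: designate ASCII back to G0

text = "\u65e5\u672c"       # the two kanji of "Nihon" (Japan)

# EUC-JP: JIS X0208 sits in G1 and is invoked into GR,
# so every byte of a kanji has the high bit set.
euc_jp = text.encode("euc_jp")

# ISO-2022-JP: JIS X0208 is switched into G0/GL by escape sequences,
# so every byte stays below 0x80.
iso_2022_jp = text.encode("iso2022_jp")

# Drop the leading and trailing escape sequences to get the raw 7-bit bytes.
seven_bit = iso_2022_jp[len(TO_JISX0208):-len(TO_ASCII)]

print(euc_jp.hex())
print(iso_2022_jp)
# The GR form is the GL form with the high bit set on each byte.
print(bytes(b | 0x80 for b in seven_bit) == euc_jp)
```

The last line prints True: both encodings carry the same JIS X0208 code
points, and only the way they are invoked differs.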

> UCS-4 can, given some effort.

Given some effort, ISO-2022 can too.
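
One existing case of such an extension is ISO-2022-JP-2 (RFC 1554), which
adds escape sequences for further character sets.  A minimal sketch,
assuming Python 3 and its bundled iso2022_jp_2 codec (the sample string and
the names below are only for illustration):

```python
# Sketch: ISO-2022-JP-2 designates additional character sets with
# extra escape sequences on top of plain ISO-2022-JP.
ESC = b"\x1b"
TO_KSC5601 = ESC + b"$(C"   # ESC $ ( C: designate KS C 5601 (Korean) to G0

# Two kanji, a space, then two hangul syllables.
mixed = "\u65e5\u672c \ud55c\uad6d"
encoded = mixed.encode("iso2022_jp_2")

print(encoded)
print(TO_KSC5601 in encoded)                    # Korean set designated in-stream
print(all(b < 0x80 for b in encoded))           # output is still 7-bit
print(encoded.decode("iso2022_jp_2") == mixed)  # round trip is lossless
```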

Regards,
Fumitoshi UKAI

