Bug#99324: Default charset should be UTF-8
At Wed, 6 Jun 2001 17:02:16 +0200,
Radovan Garabik wrote:
> > > > utf8 in the current state does not cover everything we had in other encodings.
> > >
> > > utf8 is just a _multibyte_ encoding, not _character_ encoding,
> > > it can represent whatever character encoding is used in UCS-4
> > UCS4 is not a satisfactory encoding for our needs, unfortunately.
> > JIS is not complete either, but UCS4 is less so.
> but: JIS is Japanese only, UCS-4 is global
> UCS-4 can (and will) be easily expanded, there are no technical
> problems in adding characters to this encoding
> can JIS be easily extended to support missing characters?
> I do not think so...
First of all, JIS stands for Japanese Industrial Standards; it is not
only about character sets and encodings. JIS covers many industrial
standards, such as screw sizes.
Anyway, in this context, I assume the JIS you mean is JIS X0208. This is
just a character set, not an encoding. We usually use JIS X0208 together
with ASCII in an ISO 2022 encoding. When ASCII->G0 and JIS X0208->G1, and
G0->GL and G1->GR, we call it EUC-JP (precisely, supplementary character
sets are used for G2/G3). In Japanese Linux environments we usually use
EUC-JP, because it is the simplest encoding for Japanese for now.
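As a rough illustration of the G1->GR mapping, here is a minimal Python
sketch. It assumes Python's standard "euc_jp" codec; the sample character
is my own choice, not something from this thread. Clearing the high bit of
each EUC-JP byte gives back the 7-bit JIS X0208 code.

```python
# Sketch only: shows that EUC-JP carries JIS X0208 codes in GR,
# i.e. the 7-bit code with the high bit set on each byte.
text = "あ"  # HIRAGANA LETTER A, JIS X0208 code 0x24 0x22 (sample chosen for illustration)

euc = text.encode("euc_jp")               # JIS X0208 invoked into GR
gl_form = bytes(b & 0x7F for b in euc)    # clear the high bit to get the GL (7-bit) form
ascii_part = "A".encode("euc_jp")         # ASCII stays in GL, unchanged

print(euc.hex())      # a4a2
print(gl_form.hex())  # 2422
print(ascii_part)     # b'A'
```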
When initially ASCII->G0 and G0->GL, and we switch from ASCII to JIS X0208
with ESC $ B and switch back with ESC ( B, we call it the JIS 7-bit
encoding, or commonly ISO-2022-JP. We use this encoding for Japanese
Internet messages because it uses only 7 bits, so it can be passed safely
over non-8-bit-clean routes. It is only a simple subset of ISO 2022, so
it can easily be extended to use other character sets.
X Compound Text is an example of more extensive use of ISO 2022.
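The escape-sequence switching can be seen in a minimal Python sketch. It
assumes Python's standard "iso2022_jp" codec; the sample string is my own
choice for illustration. The output starts in ASCII, switches to JIS X0208
with ESC $ B, returns to ASCII with ESC ( B, and never uses the 8th bit.

```python
# Sketch only: ISO-2022-JP switches G0 between ASCII and JIS X0208
# with escape sequences, so every byte stays within 7 bits.
ESC = b"\x1b"
text = "Aあ"  # one ASCII letter, one JIS X0208 character (sample chosen for illustration)

jis = text.encode("iso2022_jp")

to_jisx0208 = ESC + b"$B"   # designate JIS X0208 to G0
to_ascii = ESC + b"(B"      # designate ASCII back to G0
all_7bit = all(b < 0x80 for b in jis)

print(jis)       # b'A\x1b$B$"\x1b(B'
print(all_7bit)  # True
```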
> UCS-4 can, given some effort.
Given some effort, ISO-2022 can too.