Bug#99933: Bug#99324: Default charset should be UTF-8

To: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>
Cc: 99933@bugs.debian.org
Subject: Bug#99933: Bug#99324: Default charset should be UTF-8
From: Raul Miller <moth@debian.org>
Date: Mon, 11 Jun 2001 13:48:24 -0400
Message-id: <992280713.ab64baa4@debian.org>
Reply-to: Raul Miller <moth@debian.org>, 99933@bugs.debian.org
In-reply-to: <20010611192021.A5433@melkor.dnp.fmph.uniba.sk>; from garabik@melkor.dnp.fmph.uniba.sk on Mon, Jun 11, 2001 at 07:20:21PM +0200
References: <20010611104113.A15114@melkor.dnp.fmph.uniba.sk> <20010611090721.A12776@usatoday.com> <20010611164718.A25953@melkor.dnp.fmph.uniba.sk> <992273294.21bafc15@debian.org> <20010611192021.A5433@melkor.dnp.fmph.uniba.sk>

> > I disagree.  The Han Unification issue is more like the difference
> > between the latin and the italic character sets.  Yes, many characters

On Mon, Jun 11, 2001 at 07:20:21PM +0200, Radovan Garabik wrote:
> No, because latin (upright) and italics are used interchangeably,
> whereas fraktur carries implicit connotation of language used -
> just like different glyphs for unified CJK charset.

I'm sorry.  Not italics, but Old Italic.  U+10300-U+1032F.

This includes letters like U+10308 OLD ITALIC LETTER THE (a circle
with an X in it) as well as letters like U+10301 OLD ITALIC LETTER BE
(essentially the same as a capital roman B).

Here, we could assume a common history, and define a map which relates
many of the characters, much as has been done with Han Unification.
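
To make the two code points concrete, here is a minimal Python sketch
that looks up their names in the interpreter's Unicode tables. The
LOOKALIKE table is a hypothetical, hand-written map used only for
illustration; Unicode itself defines no such relation between Old
Italic and Latin letters.

```python
import unicodedata

# The two Old Italic code points cited in the message.
OLD_ITALIC_THE = "\U00010308"
OLD_ITALIC_BE = "\U00010301"


def describe(ch):
    """Return 'U+XXXXX NAME' for a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


# Hypothetical lookalike map (an assumption for this sketch, not part
# of the Unicode standard): BE resembles a capital Latin B, while THE
# has no obvious Latin counterpart.
LOOKALIKE = {OLD_ITALIC_BE: "B"}

for ch in (OLD_ITALIC_THE, OLD_ITALIC_BE):
    print(describe(ch), "->", LOOKALIKE.get(ch, "(no Latin counterpart)"))
```

The point of the sketch is only that some letters in the block map
cleanly onto Latin shapes and others do not, which is the situation
the message compares to Han Unification.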

> > are similar, however there are also some characters which are unique
> > to each representation.
> >
> > Also, Unicode does include Fraktur characters.
>
> but in mathematical symbols - that is a completely different beast

Please explain why it matters to the reader whether the letter A is
classified by the Unicode Consortium as mathematical [or not]?

> > > I am really not sure if unicode went the right way, I feel the
> > > ability to display Chinese name in a Japanese document using
> > > Chinese glyphs (or vice versa) is something that should not be
> > > gotten rid of...
> >
> > And, this could be rectified -- with Unicode 3.1, they have the code
> > space to represent each major representation of the character set.
>
> if only they instead of talking how bad is unicode started working on
> improving it (duck, run :-))

I have neither the technical skill nor the political connections to
properly contribute to the Unicode Consortium.  I can, however, point
out major problem areas, and I like to think of that as valuable [at
least to Debian -- I like to think that the members of the Unicode
Consortium are already aware of these problems].

> > > perhaps it should consider them to be different scripts with
> > > different encodings, but when would it stop? Making italics,
> > > boldface etc. to be different characters?
> >
> > Unicode already does that. Take a look at the mathematical
> > alphanumeric symbols [1D400-1D744]. For example: 1D400 MATHEMATICAL
> > BOLD CAPITAL A
>
> the reason and purpose of these characters is quite different from
> "base" unicode characters

The point is that unicode already does support the things you were
suggesting as more unreasonable than indicating oriental language.
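
A small Python sketch of what that looks like in practice, assuming
the standard unicodedata module: the mathematical bold and Fraktur
capitals are separate code points, but each carries a <font>
compatibility decomposition back to the plain letter, so NFKC
normalization folds them to "A".

```python
import unicodedata

# Styled capitals from the Mathematical Alphanumeric Symbols block.
BOLD_A = "\U0001D400"
FRAKTUR_A = "\U0001D504"

for ch in (BOLD_A, FRAKTUR_A):
    name = unicodedata.name(ch)
    # The decomposition field records the plain letter the styled
    # character is a font variant of.
    decomp = unicodedata.decomposition(ch)
    # NFKC applies compatibility decompositions, dropping the styling.
    folded = unicodedata.normalize("NFKC", ch)
    print("U+%04X %s [%s] -> %r" % (ord(ch), name, decomp, folded))
```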

> > > As for X11, fonts are being rapidly developed.
> >
> > For currently relevant policy it matters what actually works.
>
> of course. That's why my proposal is very mildly worded and gives a
> lot of freedom to maintainers to decide what charset they want.

Agreed.

> > > >  "Package may (at the discretion of the maintainer) include
> > > >   documentation files in other encodings, if they are present also in
> > > >   canonical encoding, and if the encodings used are clearly marked. 
> > > >   If a particular font is required, that should be clearly marked."
> > > 
> > > You do not know what is a particular font... one of 
> > > (traditional|simplified)C,J,K, or the full font name?
> > 
> > I'm not sure I understand this question (I don't know enough about
> > oriental languages and fonts to give a full answer in any event).
> 
> well, would you indicate just "this README needs japanese unicode font"
> and the user has to figure out by himself what is that
> or "this README needs -misc-fixed-*-*-*-ja-*-*-*-*-*-*-iso10646-1"
> and the user is fubar when he does not have that font.

I think "needs japanese unicode font" might suffice.  Perhaps a package
name which includes that font would also be good.  An X font spec would,
of course, be necessary if you wanted a program to "just work".

It depends on context.
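
For reference, the X font spec quoted above follows the 14-field X
Logical Font Description layout.  The sketch below splits such a
pattern into named fields; parse_xlfd and the field labels are
hypothetical names chosen for this example, and the sketch assumes
that no field value itself contains a hyphen.

```python
# The 14 XLFD fields, in order (labels are this sketch's own choice).
XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "addstyle",
    "pixel_size", "point_size", "res_x", "res_y", "spacing",
    "avg_width", "registry", "encoding",
)


def parse_xlfd(spec):
    """Split an XLFD pattern into a dict of its 14 fields."""
    if not spec.startswith("-"):
        raise ValueError("XLFD names start with '-'")
    parts = spec[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(XLFD_FIELDS), len(parts)))
    return dict(zip(XLFD_FIELDS, parts))


fields = parse_xlfd("-misc-fixed-*-*-*-ja-*-*-*-*-*-*-iso10646-1")
# Only the fields the pattern actually pins down, wildcards omitted.
print({k: v for k, v in fields.items() if v != "*"})
```

In the quoted pattern only the foundry, family, additional-style
("ja") and registry/encoding fields are fixed; everything else is a
wildcard, which is why a README would still need to say in words what
kind of font is wanted.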

> > > More appropriate example from the history is the war between
> > > EBCDIC, ASCII and other proprietary encodings... thank god one and
> > > only one encoding won.
> >
> > EBCDIC vs. ASCII wasn't about supported languages.
>
> true, but the mess in encodings was quite comparable to what is there
> today outside of Latin-1 world. And the peace ASCII brought could be
> compared to peace that (hopefully :-)) unicode brings one day.

I'll accept your analogy.  (In the name of peace :).

Thanks,

-- 
Raul
