Bug#99933: Default charset should be UTF-8

To: Raul Miller <moth@debian.org>, 99933@bugs.debian.org
Subject: Bug#99933: Default charset should be UTF-8
From: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>
Date: Tue, 12 Jun 2001 08:38:21 +0200
Message-id: <20010612083821.A28524@melkor.dnp.fmph.uniba.sk>
Reply-to: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>, 99933@bugs.debian.org
In-reply-to: <992280713.ab64baa4@debian.org>; from moth@debian.org on Mon, Jun 11, 2001 at 01:48:24PM -0400
References: <992280713.ab64baa4@debian.org> <20010611175942.B2835@deadbeast.net> <992305965.a7b4b423@debian.org> <20010611104113.A15114@melkor.dnp.fmph.uniba.sk> <20010611090721.A12776@usatoday.com> <20010611164718.A25953@melkor.dnp.fmph.uniba.sk> <992273294.21bafc15@debian.org> <20010611192021.A5433@melkor.dnp.fmph.uniba.sk> <992280713.ab64baa4@debian.org>

On Mon, Jun 11, 2001 at 01:48:24PM -0400, Raul Miller wrote:
> > > I disagree.  The Han Unification issue is more like the difference
> > > between the latin and the italic character sets.  Yes, many characters
> 
> On Mon, Jun 11, 2001 at 07:20:21PM +0200, Radovan Garabik wrote:
> > No, because latin (upright) and italics are used interchangeably,
> > whereas fraktur carries implicit connotation of language used -
> > just like different glyphs for unified CJK charset.
> 
> I'm sorry.  Not italics, but Old Italic.  U10300-U1032F. 

OK, we were talking about two different things.

> 
> This includes letters like U10308 OLD ITALIC LETTER THE (a circle
> with an X in it) as well as letters like U10301 OLD ITALIC LETTER BE
> (essentially the same as a capital roman B).
> 
> Here, we could assume a common history, and define a map which relates
> many of the characters.. much as has been done with Han Unification.
> 
> > > are similar, however there are also some characters which are unique
> > > to each representation.
> > >
> > > Also, Unicode does include Fraktur characters.
> >
> > but in mathematical symbols - that is a completely different beast
> 
> Please explain why it matters to the reader whether the letter A is
> classified by the unicode consortium as mathematical [or not]?
> 

Because a mathematical letter is different from a "normal" letter.
They might look alike, but (depending on typography) often
do not.
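
To make the distinction concrete, here is a minimal sketch using Python's
standard unicodedata module. It only reports what the Unicode character
database records for the two letters; the helper name describe is made up
for this illustration.

```python
import unicodedata

plain = "A"
# Look the character up by its official name (U+1D504).
fraktur = unicodedata.lookup("MATHEMATICAL FRAKTUR CAPITAL A")

def describe(ch):
    """Return (code point, character name, decomposition field) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
    )

for ch in (plain, fraktur):
    print(describe(ch))
```

The mathematical letter has its own code point and name, and its
decomposition field carries a <font> tag pointing back at the plain letter,
which is how the database marks it as a font variant kept for its
mathematical meaning.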

> > > > I am really not sure if unicode went the right way, I feel the
> > > > ability to display Chinese name in a Japanese document using
> > > > Chinese glyphs (or vice versa) is something that we should not get
> > > > rid of...
> > >
> > > And, this could be rectified -- with Unicode 3.1, they have the code
> > > space to represent each major representation of the character set.
> >
> > if only they started working on improving unicode instead of talking
> > about how bad it is (duck, run :-))
> 
> I don't have the technical skill nor the political connections to properly
> contribute to the unicode consortium.  I can, however, point out major

well, this was not aimed at you :-)

> problem areas, and I like to think of that as valuable [at least to
> Debian -- I like to think that the members of the Unicode Consortium
> are already aware of these problems].
> 
> > > > perhaps it should consider them to be different scripts with
> > > > different encodings, but when would it stop? Making italics,
> > > > boldface etc. to be different characters?
> > >
> > > Unicode already does that. Take a look at the mathematical
> > > alphanumeric symbols [1D400-1D744]. For example: 1D400 MATHEMATICAL
> > > BOLD CAPITAL A
> >
> > the reason and purpose of these characters is quite different from
> > "base" unicode characters
> 
> The point is that unicode already does support the things you were
> suggesting as more unreasonable than indicating oriental language.
> 

It does not. Bold mathematical symbols are quite different
from bold text characters. MATHEMATICAL BOLD CAPITAL A
has a very different meaning from MATHEMATICAL ITALIC CAPITAL A
(e.g. one denotes a vector or matrix, the other a variable).

You can make text bold, and its meaning will remain.
If you make a mathematical expression all bold, it will
have a completely different meaning.
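
A small sketch of the same point, again with Python's unicodedata module.
The helper name fold is an assumption of this example. It shows that the
bold and italic mathematical letters are distinct characters, and that
compatibility normalization (NFKC), which treats them as mere font
variants, throws that distinction away.

```python
import unicodedata

bold_a = unicodedata.lookup("MATHEMATICAL BOLD CAPITAL A")      # U+1D400
italic_a = unicodedata.lookup("MATHEMATICAL ITALIC CAPITAL A")  # U+1D434

def fold(ch):
    """Apply compatibility normalization, which strips font variants."""
    return unicodedata.normalize("NFKC", ch)

# The two letters are different characters in the encoding...
distinct = bold_a != italic_a
# ...but both collapse to the plain Latin letter once folded.
same_after_folding = fold(bold_a) == fold(italic_a) == "A"

print(distinct, same_after_folding)
```

So a formula that relied on bold versus italic would lose information under
folding, whereas ordinary bold text loses nothing because its boldness was
never in the encoding to begin with.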

And there is no such letter as
MATHEMATICAL BOLD CYRILLIC CAPITAL LETTER A, since
Cyrillic letters are normally not used in mathematical contexts.
Yet in your favourite typesetting software you are able
to write boldface Cyrillic (since that is again a typesetting
issue, not an encoding one).
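
This can be checked against the character database that ships with Python.
The sketch below is only a lookup by name; has_character is a hypothetical
helper, and the result reflects whatever Unicode version your Python
carries.

```python
import unicodedata

def has_character(name):
    """Return True if the Unicode database knows a character by this name."""
    try:
        unicodedata.lookup(name)
    except KeyError:
        return False
    return True

for name in (
    "MATHEMATICAL BOLD CAPITAL A",
    "CYRILLIC CAPITAL LETTER A",
    "MATHEMATICAL BOLD CYRILLIC CAPITAL LETTER A",
):
    print(name, has_character(name))
```

The first two names resolve; the third does not, so bold Cyrillic has to
come from the typesetting layer.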

Well, personally, I could survive without these mathematical chars in
unicode, but neither do I have any objection to using them.

On Mon, Jun 11, 2001 at 08:39:25PM -0400, Raul Miller wrote:
> > > > > Also, Unicode does include Fraktur characters.
> > > >
> > > > but in mathematical symbols - that is a completely different beast
> 
> On Mon, Jun 11, 2001 at 01:48:24PM -0400, Raul Miller wrote:
> > > Please explain why it matters to the reader whether the letter A is
> > > classified by the unicode consortium as mathematical [or not]?
> 
> On Mon, Jun 11, 2001 at 05:59:43PM -0500, Branden Robinson wrote:
> > Because in a mathematical equation, a "script" A, for instance, is
> > semantically distinct from a latin capital A. Fundamental, basic
> > information is lost without a distinction between these characters.
> >
> > In text, italics or scripted letters for emphasis or whatever are
> > stylistic markup, not semantic distinctions. For instance, people who
> > chat with me on IRC can deduce my meaning whether or not I elect to
> > use bold and/or inverse text, and in fact that's why people get yelled
> > at when they do it.
> 
> You're telling me why the context matters.  You're not telling me why
> the unicode naming of the code points matters.
> 
> If the reader sees "Branden", why should it matter whether any underlying
> code points were designated by the consortium as mathematical?  If the

because if the code points are mathematical, I parse it as
B \times r \times a \times n \times d \times e \times n
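
A toy sketch of that reading rule, assuming a reader that treats a run of
mathematical letters as a product of single-letter variables. The functions
math_italic and read_as are invented for this illustration and are not part
of any library.

```python
import unicodedata

def math_italic(word):
    """Spell an ASCII word with MATHEMATICAL ITALIC letters.

    Raises KeyError for letters with no such character (for example 'h').
    """
    out = []
    for ch in word:
        case = "CAPITAL" if ch.isupper() else "SMALL"
        out.append(unicodedata.lookup(f"MATHEMATICAL ITALIC {case} {ch.upper()}"))
    return "".join(out)

def read_as(text):
    """Read mathematical letters as a product of variables, other text as is."""
    is_math = bool(text) and all(
        unicodedata.name(ch, "").startswith("MATHEMATICAL ") for ch in text
    )
    if not is_math:
        return text
    # Fold each symbol back to its base letter so the result is printable.
    letters = [unicodedata.normalize("NFKC", ch) for ch in text]
    return r" \times ".join(letters)

print(read_as("Branden"))
print(read_as(math_italic("Branden")))
```

The plain spelling comes back unchanged as a name, while the mathematical
spelling comes back as a product of seven variables.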


-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
