
Bug#99933: Comments on Unicode



Raul Miller:
> Which implies that this mechanism isn't useful for representing different
> languages in the same document.  That, instead, it's logically equivalent
> to a MIME declaration of the document's language.

I don't know where you got this impression, but it's wrong. Read the
document. It introduces a TAG START character, ASCII-equivalent tag
characters, and a TAG CANCEL character. <EN-US>You can label text like
this.<DE-DE>Ja, du kannst.<TAG CANCEL> ("Yes, you can.")
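The tagging mechanism can be sketched in a few lines. This is a hypothetical helper, not code from any standard library, but the code points are the real ones from the Plane 14 proposal: U+E0001 introduces a language tag, U+E0020..U+E007E clone printable ASCII for spelling the tag, and U+E007F cancels it.

```python
# Sketch of Plane 14 language tagging (hypothetical helper functions).
TAG_LANGUAGE = "\U000E0001"  # LANGUAGE TAG introducer
TAG_CANCEL = "\U000E007F"    # CANCEL TAG

def tag_char(c):
    # Each printable ASCII character 0x20..0x7E has a "tag clone"
    # at U+E0020..U+E007E.
    return chr(0xE0000 + ord(c))

def language_tag(lang):
    # e.g. language_tag("en-US") spells the RFC language code in tag clones.
    return TAG_LANGUAGE + "".join(tag_char(c) for c in lang)

text = (language_tag("en-US") + "You can label text like this."
        + language_tag("de-DE") + "Ja, du kannst." + TAG_CANCEL)
```

The tag characters are default-invisible, so software that doesn't understand them can simply ignore the whole run.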

> Please explain why it matters to the reader whether the letter A is
> classified by the Unicode Consortium as mathematical [or not]?

Because in theory, MATHEMATICAL ITALIC CAPITAL A won't be available on every
keyboard, nor in every font. Any software that translates ordinary,
non-mathematical italic characters to the MATHEMATICAL ITALIC characters
would be non-conformant to the Unicode standard. They shouldn't obey case
mappings, and HTML markup and the like probably won't and shouldn't work on
them. There's no way most people will be able to enter them without setting
up fairly unusual software. As a reader, you probably couldn't tell if my
message were in KOI8-R and I were using the Cyrillic lookalike characters
wherever possible, but that doesn't make it more correct or more likely.
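The point that these are distinct characters, outside the normal case
machinery, is easy to demonstrate (a small sketch using Python's standard
`unicodedata` module):

```python
import unicodedata

# MATHEMATICAL ITALIC CAPITAL A (U+1D434) is a distinct character from
# LATIN CAPITAL LETTER A, and it carries no case mapping of its own.
a_math = "\U0001D434"
print(unicodedata.name(a_math))  # MATHEMATICAL ITALIC CAPITAL A
print(a_math == "A")             # False - a lookalike, not the same character
print(a_math.lower() == a_math)  # True - lower() leaves it unchanged
```

(Compatibility normalization, NFKC, does fold it back to a plain "A", which
is exactly how lookalikes are meant to be handled when the distinction
doesn't matter.)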

> I disagree.  The Han Unification issue is more like the difference
> between the latin and the italic character sets.  Yes, many characters
> are similar, however there are also some characters which are unique to
> each representation.

Japanese can travel in China and use 'Japanese' ideographs to communicate
with Chinese people who have no knowledge of Japanese. That's an indicative
sign that the characters being used are fundamentally the same characters.
Yes, there are characters that are written differently, and unique
characters - the same is true of any two languages that use the Latin
script. I'm not arguing that all the unifications of individual characters
were correct, but the fundamental concept of unification is correct. (It's
interesting that it's almost always the Japanese who complain about the
unification - the Koreans and Chinese, for the most part, seem to find the
variations introduced by unification to be normal. One of the main forces
behind unification was Chinese, with GB 13000.)

> And, this could be rectified -- with Unicode 3.1, they have the code
> space to represent each major representation of the character set.

Actually, it can't be rectified. The code space has existed for almost half
a decade - the only change is that it's being used now. But part of the
fundamental nature of Unicode is the unification of CJK characters. You
cannot change the meaning of 50,000 characters in the Unicode standard and
invalidate all Japanese/Chinese/Korean (pick two) data in Unicode, any more
than you can introduce case-up and case-down control characters into ASCII
and use the space of the lowercase characters for something else.
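That code space - the supplementary planes reached through UTF-16 surrogate
pairs - was reserved long before Unicode 3.1 assigned characters to it. A
quick illustration, using U+20000, the first CJK Extension B ideograph
added in 3.1:

```python
# U+20000, assigned in Unicode 3.1, encodes through the long-reserved
# UTF-16 surrogate mechanism: one high surrogate plus one low surrogate.
ext_b = "\U00020000"
encoded = ext_b.encode("utf-16-be")
print(encoded.hex())         # d840dc00 - a surrogate pair
print(len(ext_b.encode("utf-8")))  # 4 - UTF-8 likewise had room all along
```

Nothing about the encoding forms changed; only the assignments did.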

> However, Unicode is not a mature standard, so we need to be careful in
> places where it would cause problems.

What? It's not mature? The majority of the world's desktops use, or will
soon use, Unicode, as it's fundamental to Mac OS X and Windows NT/2000/ME.
It's been around for ten years now, and has reached the point where it's
fundamentally stable. Sure, there will be a few more ideographs, a few
more mathematical characters, a few more obscure/dead/minority scripts
encoded, but Unicode 3.1 is basically what Unicode 5.9 will be. The Unicode
people are committed to not breaking backward compatibility, and with the
wealth of support many vendors have put into Unicode, they can't afford to
change anything major. It may be wrong, but it's mature.


> But that still leaves us with the "JIS has characters which aren't in
> Unicode" issue.  [If that's an actual issue.]

All the characters from JIS X 0208 and JIS X 0212 are in Unicode (they were
among the original primary sources of characters for Unicode). JIS X 0208
is the character set used in ISO-2022-JP, and I believe Shift_JIS and EUC-JP
use the same set. JIS X 0213 should be completely included in Unicode, as
the same Japanese body that produces JIS X 0213 is the ISO 10646 liaison. I
know that a number of what Unicode would consider variants of already
encoded characters were encoded in Unicode for compatibility with JIS X
0213.
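The JIS X 0208 coverage is easy to spot-check with the standard Japanese
codecs (a small sketch - it only samples a few characters, not the whole
repertoire):

```python
# Spot-check: text in the JIS X 0208 repertoire round-trips through
# Unicode via each of the three common Japanese encodings.
sample = "日本語の文字"
for codec in ("iso2022_jp", "shift_jis", "euc_jp"):
    assert sample.encode(codec).decode(codec) == sample
print("round-trip OK")
```

If a character were missing from Unicode, the decode step would have
nowhere to map it, and a round trip like this would be lossy.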

Radovan Garabik:

> well, would you indicate just "this README needs japanese unicode font"
> and the user has to figure out by himself what is that
> or "this README needs -misc-fixed-*-*-*-ja-*-*-*-*-*-*-iso10646-1"
> and the user is fubar when he does not have that font.

When would this be necessary? The appropriate fixed font should get picked
by locale (it's in xterm now; I don't know if the Debian unstable xterm has
it, or if it will be in XFree86 4.1 or 4.2). So the issue only arises when a
user is using an inappropriate choice of font (which we can't save a user
from) or is reading a Chinese README in a Japanese locale or vice versa. If
this is unreadable, the knowledgeable user would know to switch fonts. At
worst, it's no worse than what we have now, with having to change locales
and fonts to read a Chinese README in a Japanese locale.

--
David Starner - dstarner98@aasaa.ofe.org, dvdeug@debian.org
