Bug#99933: Comments on Unicode

To: David Starner <dstarner98@aasaa.ofe.org>, 99933@bugs.debian.org
Subject: Bug#99933: Comments on Unicode
From: Raul Miller <moth@debian.org>
Date: Thu, 5 Jul 2001 13:37:36 -0400
Message-id: <20010705133736.C12776@usatoday.com>
Reply-to: Raul Miller <moth@debian.org>, 99933@bugs.debian.org
In-reply-to: <010201c10517$164398c0$ae4efea9@dvdeug>; from dstarner98@aasaa.ofe.org on Thu, Jul 05, 2001 at 06:55:24AM +0100
References: <010201c10517$164398c0$ae4efea9@dvdeug>

Raul Miller:
> > Which implies that this mechanism isn't useful for representing different
> > languages in the same document.  That, instead, it's logically equivalent
> > to a MIME declaration of the document's language.

On Thu, Jul 05, 2001 at 06:55:24AM +0100, David Starner wrote:
> I don't know where you got this impression, but it's wrong. Read the
> document. It introduces a LANGUAGE TAG character, ASCII-equivalent tag
> characters, and a CANCEL TAG character. <EN-US>You can label text like
> this.<DE-DE>Ja, du kannst.<CANCEL TAG>
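For reference, the tag mechanism described above can be sketched in Python. The sketch assumes the Unicode 3.1 character names U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG, with tag characters U+E0020 to U+E007E mirroring printable ASCII at an offset of 0xE0000. The helper names tag and untag are hypothetical and are not part of any library.

```python
# Sketch of Unicode 3.1 plain-text language tagging (helper names are hypothetical).
LANGUAGE_TAG = "\U000E0001"   # starts a language tag
CANCEL_TAG = "\U000E007F"     # on its own, cancels all tag values in effect
TAG_OFFSET = 0xE0000          # tag character = ASCII code point + this offset


def tag(lang: str) -> str:
    """Encode an ASCII language identifier such as 'en-us' as tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag text must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def untag(text: str) -> str:
    """Drop every character in the tag block, leaving only the visible text."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


labelled = (tag("en-us") + "You can label text like this. "
            + tag("de-de") + "Ja, du kannst." + CANCEL_TAG)
print(untag(labelled))
```

Because tag characters are invisible by design, a renderer that does not understand them still shows the plain text, which is what untag simulates.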

Except that you're not supposed to use this mechanism with HTML, and,
unlike XML, HTML can identify the language only in the MIME header.

However, if Unicode can act as a superset of every character set we
currently use, then we can ignore this problem for the purpose of
deciding when to migrate.
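One practical way to test the "superset" condition for a given legacy character set is a round trip: decode each byte sequence to Unicode, encode it back, and check that the bytes are unchanged. Below is a minimal sketch for single-byte character sets using Python's built-in codecs. The function name and the list of codecs probed are illustrative assumptions. Multibyte sets such as Shift JIS would also need their double-byte ranges enumerated.

```python
def round_trip_failures(codec: str) -> list[int]:
    """Return the byte values 0-255 that a single-byte codec cannot
    decode to Unicode and then encode back to the identical byte."""
    failures = []
    for b in range(256):
        raw = bytes([b])
        try:
            if raw.decode(codec).encode(codec) != raw:
                failures.append(b)
        except UnicodeError:  # covers both decode and encode errors
            failures.append(b)
    return failures


# Illustrative set of codecs to audit.
for name in ("ascii", "latin-1", "koi8_r", "cp1252"):
    print(name, len(round_trip_failures(name)))
```

A failure can mean either a gap in Unicode's coverage or simply a byte the legacy table leaves undefined (Python's cp1252 table has a few such positions), so each reported byte still needs a human look.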

> > I disagree.  The Han Unification issue is more like the difference
> > between the Latin and the italic character sets.  Yes, many characters
> > are similar; however, there are also some characters which are unique
> > to each representation.
> 
> Japanese people with no knowledge of Chinese can travel in China and use
> 'Japanese' ideographs to communicate with Chinese people. That's an
> indicative sign that the characters being used are fundamentally the
> same characters. Yes, there are characters that are written differently,
> and unique characters, but the same is true of any two languages that
> use the Latin script. I'm not arguing that all the unifications of
> individual characters were correct, but the fundamental concept of
> unification is correct. (It's interesting that it's almost always the
> Japanese who complain about unification; the Koreans and Chinese, for
> the most part, seem to find the variations introduced by unification to
> be normal. One of the main forces behind unification was China, with
> GB 13000.)
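The unification argument can be illustrated with a commonly cited example, the ideograph 直, whose typical Japanese and Chinese glyphs differ slightly. The sketch below assumes Python's shift_jis and gb2312 codecs. It shows that the two legacy encodings use different bytes for the character, yet both decode to the same single Unicode code point. Choosing the Japanese or Chinese glyph is then left to fonts and language information.

```python
def code_points(data: bytes, codec: str) -> list[str]:
    """Decode legacy bytes and list the resulting Unicode code points."""
    return [f"U+{ord(c):04X}" for c in data.decode(codec)]


ideograph = "\u76f4"                   # 直, a commonly cited unified Han character
sjis = ideograph.encode("shift_jis")   # Japanese legacy bytes
gb = ideograph.encode("gb2312")        # Simplified Chinese legacy bytes

print(sjis.hex(), gb.hex())            # two different byte sequences...
print(code_points(sjis, "shift_jis"), code_points(gb, "gb2312"))  # ...one code point
```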

Do you have any idea whether the problems identified at
http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP
have been resolved?

I've not been able to find anybody knowledgeable about this issue.

> > And, this could be rectified -- with Unicode 3.1, they have the code
> > space to represent each major representation of the character set.
>
> Actually, it can't be rectified. The code space has existed for almost
> half a decade; the only change is that it's being used now. But part of
> the fundamental nature of Unicode is the unification of CJK characters.
> You cannot change the meaning of 50,000 characters in the Unicode
> standard and invalidate all Japanese/Chinese/Korean (pick two) data in
> Unicode, any more than you can introduce case-up and case-down control
> characters into ASCII and use the space of the lowercase characters for
> something else.

I don't know what you mean.

Prior to Unicode 3.1 the code space was 16 bits.  With Unicode 3.1
the code space has been expanded to 21 bits.
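For reference, here is the arithmetic behind the 21-bit figure. There are 17 planes of 65,536 code points each, giving a maximum code point of U+10FFFF, and UTF-16 reaches everything above U+FFFF through surrogate pairs. The surrogate mechanism itself was defined in Unicode 2.0; Unicode 3.1 was the first version to assign characters outside the 16-bit Basic Multilingual Plane. Below is a minimal Python sketch of the calculation; the constant and function names are illustrative.

```python
PLANES = 17
PLANE_SIZE = 0x10000
MAX_CODE_POINT = PLANES * PLANE_SIZE - 1   # 0x10FFFF


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000                  # 20-bit offset into planes 1-16
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


print(hex(MAX_CODE_POINT), MAX_CODE_POINT.bit_length())
high, low = surrogate_pair(0x20000)   # first CJK Extension B ideograph (Unicode 3.1)
print(f"{high:04X} {low:04X}")
```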

In principle, at least, with the additional code space Unicode can have a
one-to-one mapping with the characters represented in the Shift JIS
standards.
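Whether such a mapping is truly one-to-one can be spot-checked in two ways: by round-tripping Shift JIS bytes through Unicode, and by comparing vendor variants of the same character set. Below is a rough sketch that assumes Python's shift_jis and cp932 codecs; the helper names are hypothetical. The byte pair 0x81 0x60 (the wave dash position) is included only as a commonly cited case where mapping tables have been reported to disagree. Inspect its output rather than trusting either table.

```python
def round_trips(data: bytes, codec: str) -> bool:
    """True if data decodes under codec and re-encodes to identical bytes."""
    try:
        return data.decode(codec).encode(codec) == data
    except UnicodeError:
        return False


def decodings(data: bytes, codecs: tuple[str, ...] = ("shift_jis", "cp932")) -> dict[str, str]:
    """Map each codec name to the code points it assigns to data."""
    result = {}
    for codec in codecs:
        try:
            text = data.decode(codec)
            result[codec] = " ".join(f"U+{ord(c):04X}" for c in text)
        except UnicodeDecodeError:
            result[codec] = "<undecodable>"
    return result


sample = "日本語".encode("shift_jis")
print(round_trips(sample, "shift_jis"))
print(decodings(sample))
print(decodings(b"\x81\x60"))   # compare how the two tables map this pair
```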

> > However, Unicode is not a mature standard, so we need to be careful
> > in places where it would cause problems.
>
> What? It's not mature? The majority of the world's desktops use, or
> will soon use, Unicode, as it's fundamental to Mac OS X and Windows
> NT/2000/ME. It's been around for ten years now, and has reached the
> point where it's fundamentally stagnant. Sure, a few more ideographs, a
> few more mathematical characters, and a few more obscure, dead, or
> minority scripts will be encoded, but Unicode 3.1 is basically what
> Unicode 5.9 will be. The Unicode people are committed to not breaking
> backward compatibility, and given how much many of them have invested
> in supporting Unicode, they can't afford to change anything major. It
> may be wrong, but it's mature.

Once Unicode can act as a superset of every character set we currently
support, we can use it as such.  Until then, we can't.

Thanks,

-- 
Raul
