Bug#99933: Comments on Unicode

To: David Starner <dstarner98@aasaa.ofe.org>, Antti-Juhani Kaijanaho <gaia@iki.fi>
Cc: 99933@bugs.debian.org
Subject: Bug#99933: Comments on Unicode
From: Raul Miller <moth@debian.org>
Date: Fri, 6 Jul 2001 08:37:43 -0400
Message-id: <994422263.69160713@debian.org>
Reply-to: Raul Miller <moth@debian.org>, 99933@bugs.debian.org
In-reply-to: <02a801c105cc$d6343ee0$ae4efea9@dvdeug>; from dstarner98@aasaa.ofe.org on Fri, Jul 06, 2001 at 04:36:25AM +0100
References: <010201c10517$164398c0$ae4efea9@dvdeug> <20010705133736.C12776@usatoday.com> <20010706112341.Q2483@kukkaruukku.keltti.jyu.fi> <010201c10517$164398c0$ae4efea9@dvdeug> <20010705133736.C12776@usatoday.com> <02a801c105cc$d6343ee0$ae4efea9@dvdeug>

On Fri, Jul 06, 2001 at 04:36:25AM +0100, David Starner wrote:
> > Do you have any idea whether the problems identified at
> > http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP
> > have been resolved?
> 
> Are they a problem for us? Windows Code Page 932 may or may not correspond
> to anything that we care about. (At a glance, at least one of each pair that
> both correspond to the same Unicode character is not in the real JIS X
> 0208.)

If this is indeed a CP 932 problem rather than a Shift JIS problem,
and if we indeed don't support CP 932, then I'll agree that this
isn't a problem.
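
One way to check the duplicate-mapping question is to scan the code page
directly. The following is only a sketch: it assumes that Python's
bundled "cp932" codec follows Microsoft's table, which I have not
verified against the KB article, and the function name is made up for
illustration.

```python
from collections import defaultdict


def cp932_duplicates():
    """Return {character: [byte sequences]} for every Unicode character
    that more than one two-byte CP 932 sequence decodes to."""
    sources = defaultdict(list)
    for lead in range(0x81, 0xFD):
        for trail in range(0x40, 0xFD):
            raw = bytes([lead, trail])
            try:
                ch = raw.decode("cp932")
            except UnicodeDecodeError:
                continue  # not a valid sequence in this code page
            # A result of two characters means the lead byte was a
            # single-byte character (e.g. halfwidth katakana); skip it.
            if len(ch) == 1:
                sources[ch].append(raw)
    return {ch: seqs for ch, seqs in sources.items() if len(seqs) > 1}
```

Every character in the result can be encoded back to only one byte
sequence, so the other sequences listed for it cannot survive a round
trip through Unicode. Running the same scan with the "shift_jis" codec
would show whether plain Shift JIS is affected.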

> > Prior to Unicode 3.1 the code space was 16 bits.
>
> NO. Since Unicode 2.0, the code space has been 21 bits. The ONLY thing
> that Unicode 3.1 did, is put characters above U+FFFF. It did not
> change the fundamental structure of Unicode in the least.

I stand corrected.
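
For the record, the arithmetic behind the 21-bit figure can be sketched
as below. This is an illustration, not anything from the Unicode
specification's own text: the helper name is hypothetical, and it only
shows how a code point above U+FFFF is split into a UTF-16 surrogate
pair.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point in the Unicode code space


def to_utf16_units(cp):
    """Split a code point into one or two 16-bit UTF-16 code units."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if cp <= 0xFFFF:
        return [cp]  # fits in a single 16-bit unit
    offset = cp - 0x10000  # 20 bits remain after removing the BMP
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return [high, low]
```

MAX_CODE_POINT.bit_length() is 21, and to_utf16_units(0x10FFFF) yields
the last possible surrogate pair, which is why the mechanism already
covers everything Unicode 3.1 assigned above U+FFFF.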

> > Once Unicode can act as a superset for every character set we currently
> > support, we can use it as such.  Until then, we can't.
> 
> If Unicode were a superset for every character set that anyone needs to
> support, it would be worthless and completely unusable.

I didn't say for any character set that anyone needs to support.
I said for every character set we currently support.  I hope you see the
difference.  [And, as an aside, I should have said "for each character
set that we currently support" -- I understand that Unicode doesn't need
to support mixed character set usage before we migrate.]

> However, if we currently support any character set well, it is through
> a Unicode based glibc - I don't believe libc accepts the existence of
> any character set that can't be mapped to Unicode. So arguably, yes,
> Unicode is a superset for every character set we currently support
> well.

Assuming we're using glibc support (e.g. toupper()) for all those
character sets, I'll agree that you have a good point.

On 20010705T133736-0400, Raul Miller wrote:
> > in HTML the language can only be identified in the mime header.

On Fri, Jul 06, 2001 at 11:23:42AM +0300, Antti-Juhani Kaijanaho wrote:
> There is no such thing as a MIME header in HTML.
>
> Besides, HTML does include the lang attribute for most elements. I
> wonder what it's for if not for indicating the language.

I stand corrected.
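
To illustrate the point about the lang attribute, here is a small
sketch that lists the language declared on each element of an HTML
fragment. It uses Python's standard html.parser module; the class name
and the sample markup in the note below are invented for the example.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collect (tag, language) pairs from elements carrying a lang attribute."""

    def __init__(self):
        super().__init__()
        self.langs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        for name, value in attrs:
            if name == "lang":
                self.langs.append((tag, value))
```

Feeding it '<html lang="en"><p lang="fi">Hei</p></html>' records one
entry for the html element and one for the p element, so the language
is visible per element without consulting any transport header.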

Thanks,

-- 
Raul
