Bug#292330: use UTF-8 by default

To: Lars Wirzenius <liw@liw.iki.fi>
Cc: 292330@bugs.debian.org
Subject: Bug#292330: use UTF-8 by default
From: Pierre Habouzit <madcoder@debian.org>
Date: Mon, 18 Jun 2007 17:04:36 +0100
Message-id: <20070618160436.GC22550@artemis.internal.dc7.debconf.org>
Reply-to: Pierre Habouzit <madcoder@debian.org>, 292330@bugs.debian.org
In-reply-to: <1182181600.6428.10.camel@dorfl.liw.iki.fi>
References: <Pine.BSM.4.64L.0706152312280.5177@odem.66h.42h.de> <20070616094439.GA6113@lapse.madduck.net> <Pine.BSM.4.64L.0706161222280.8577@herc.mirbsd.org> <20070616132626.GA16635@lapse.madduck.net> <Pine.BSM.4.64L.0706161427380.14260@odem.66h.42h.de> <20070616164800.GA24328@lapse.madduck.net> <20070618094804.GA30062@artemis.internal.dc7.debconf.org> <20070618123752.GA22550@artemis.internal.dc7.debconf.org> <1182181600.6428.10.camel@dorfl.liw.iki.fi>

On Mon, Jun 18, 2007 at 06:46:40PM +0300, Lars Wirzenius wrote:
> On Mon, 2007-06-18 at 13:37 +0100, Pierre Habouzit wrote:
> > On Mon, Jun 18, 2007 at 10:48:04AM +0100, Pierre Habouzit wrote:
> > > multi-byte one would be really really bad (as you would end up with e.g.
> > > strings split in the middle of a code point, *brrr* you definitely don't
> > > want that).
> > 
> >   I wasn't clear it seems, but what I mean is if a program assumes it's
> > dealing with ASCII,
> 
> This buggy assumption seems to happen in every locale, not just C.UTF-8,
> and in every other case we treat it as a bug. Is there a standard that
> says every C.* locale must have the same single byte character set as
> the plain C locale? 
> 
> (Incidentally, the standard for the C language does not require the
> character set in the C locale to be ASCII; EBCDIC, for example, works as
> well. See 5.2.1, "Character sets", for the full description. You might
> be able to find a copy of the standard by searching for ISO/IEC 9899.)
> 

  I stand corrected. I read the POSIX Base Definitions, chapter 7: indeed,
it specifies how collation is done, but not how the characters are
encoded, so a C.UTF-8 locale does not seem like such a bad idea. Though
I'm quite sure we can find software that assumes that the character set
in the C locale is always ASCII.

  But okay, maybe it's worth fixing those few.
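
  To illustrate the kind of breakage discussed earlier in the thread, here
is a minimal sketch in Python, assuming valid UTF-8 input. The helper name
utf8_safe_cut is hypothetical and not taken from any existing program. It
shows how a byte-wise cut, as code that assumes one byte per character
would do, can land in the middle of a code point, and one way to back up
to a safe boundary.

```python
def utf8_safe_cut(data: bytes, max_bytes: int) -> bytes:
    """Truncate UTF-8 bytes to at most max_bytes without splitting a code point.

    Hypothetical helper for illustration; assumes data is valid UTF-8
    and max_bytes is non-negative.
    """
    if max_bytes >= len(data):
        return data
    cut = max_bytes
    # UTF-8 continuation bytes have the form 10xxxxxx. If the byte at the
    # cut position is one, the cut is inside a sequence, so back up until
    # the byte at the cut position starts a new code point.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


text = "caf\u00e9".encode("utf-8")  # 5 bytes for 4 code points: U+00E9 is C3 A9
naive = text[:4]                    # byte-wise cut keeps C3 but drops A9
safe = utf8_safe_cut(text, 4)       # backs up to the 3-byte prefix

try:
    naive.decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False                # the naive cut split the last code point

print(naive_ok, safe.decode("utf-8"))
```

  The same boundary check applies in C: test whether the byte at the cut
position matches 10xxxxxx before truncating.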
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

