
Bug#292330: use UTF-8 by default



On Mon, Jun 18, 2007 at 06:46:40PM +0300, Lars Wirzenius wrote:
> On ma, 2007-06-18 at 13:37 +0100, Pierre Habouzit wrote:
> > On Mon, Jun 18, 2007 at 10:48:04AM +0100, Pierre Habouzit wrote:
> > > multi-byte one would be really really bad (as you would end up with e.g.
> > > strings split in the middle of a code point, *brrr* you definitely don't
> > > want that).
> > 
> >   I wasn't clear it seems, but what I mean is if a program assumes it's
> > dealing with ASCII,
> 
> This buggy assumption seems to happen in every locale, not just C.UTF-8,
> and in every other case we treat it as a bug. Is there a standard that
> says every C.* locale must have the same single byte character set as
> the plain C locale? 
> 
> (Incidentally, the standard for the C language does not require the
> character set in the C locale to be ASCII; EBCDIC, for example, works as
> well. See 5.2.1, "Character sets", for the full description. You might
> be able to find a copy of the standard by searching for ISO/IEC 9899.)
> 

  I stand corrected. I read the POSIX Base Definitions, chapter 7: indeed,
it specifies how collation is done, but not how characters are encoded, so
a C.UTF-8 locale doesn't seem like such a bad idea. Though I'm quite sure
we can find software that assumes the character set in the C locale is
always ASCII.
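  As a rough illustration of not assuming ASCII, a program can ask the
locale which codeset it actually uses. This is only a minimal sketch,
assuming a POSIX system providing setlocale() and nl_langinfo(CODESET);
current_codeset() and codeset_is_utf8() are hypothetical helper names, and
the two codeset spellings checked are just common ones, not a complete list.

```c
#define _XOPEN_SOURCE 700 /* expose nl_langinfo() under strict C modes */
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: adopt the user's locale and report its codeset,
 * instead of assuming the C locale (or any locale) is ASCII. */
const char *current_codeset(void)
{
    /* "" selects the locale from LANG / LC_*; if that fails, the locale
     * already in effect ("C" at program start) is kept. */
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: nonzero if the codeset name looks like UTF-8.
 * Spellings vary between systems, so this is a heuristic only. */
int codeset_is_utf8(const char *codeset)
{
    return strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;
}
```

On glibc the plain C locale typically reports "ANSI_X3.4-1968", while a
C.UTF-8 locale would report "UTF-8"; code that branches on the result
keeps working either way.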

  But okay, maybe it's worth fixing those few.
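  For the string-splitting problem quoted above, one way to fix such code
is to cut only on code point boundaries. A minimal sketch, assuming the
input is valid UTF-8 (utf8_safe_prefix_len() is a hypothetical name): UTF-8
continuation bytes always have the bit pattern 10xxxxxx, so the cut is
moved back until it no longer lands on one.

```c
#include <stddef.h>

/* Hypothetical helper: return the largest n <= max such that the first
 * n bytes of the NUL-terminated UTF-8 string s do not end in the middle
 * of a multi-byte sequence. Assumes s is valid UTF-8. */
size_t utf8_safe_prefix_len(const char *s, size_t max)
{
    size_t len = 0;

    /* Advance up to max bytes, stopping early at the terminator. */
    while (len < max && s[len] != '\0')
        len++;

    /* Reached the end of the string: nothing can be split. */
    if (s[len] == '\0')
        return len;

    /* s[len] is the first byte we would drop; while it is a continuation
     * byte, the cut falls inside a sequence, so back up. */
    while (len > 0 && ((unsigned char)s[len] & 0xC0) == 0x80)
        len--;

    return len;
}
```

For example, limiting "h\xc3\xa9llo" ("héllo") to 2 bytes yields 1, so the
two-byte "é" is dropped whole instead of being cut in half.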
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org


