Bug#603914: Please drop non-UTF8 locales

To: Thorsten Glaser <tg@mirbsd.de>, 603914@bugs.debian.org
Subject: Bug#603914: Please drop non-UTF8 locales
From: Roger Leigh <rleigh@codelibre.net>
Date: Sat, 8 Jan 2011 12:32:54 +0000
Message-id: <20110108123254.GB25780@codelibre.net>
Reply-to: Roger Leigh <rleigh@codelibre.net>, 603914@bugs.debian.org
In-reply-to: <Pine.BSM.4.64L.1011281721290.27885@herc.mirbsd.org>
References: <Pine.BSM.4.64L.1011281721290.27885@herc.mirbsd.org>

On Sun, Nov 28, 2010 at 05:21:33PM +0000, Thorsten Glaser wrote:
> Fun to be reading this. Me like ;-)
> 
> Anyway. With my Debian hat on, the C/POSIX locales must not use
> UTF-8 as encoding, because otherwise, all kind of hell breaks
> loose (consider running 'tr u x' on a binary or other legacy
> encoded text file, and tr is just an example).

From my reading of the standards, a UTF-8 C locale would be required
to behave identically to the existing ASCII C locale. It would:

• consider all byte sequences valid
• use only the ASCII collation sequence (LC_COLLATE would be
  identical)
• probably have an identical LC_CTYPE as well; SUS specifies this
  less strictly than LC_COLLATE, but for backward compatibility it
  should remain the same.

About the only differences would be that the transliteration table
is no longer needed, and that nl_langinfo(CODESET) would return
UTF-8.  That's pretty much it.
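
As a rough illustration of the points above, here is a minimal Python
sketch that asks a locale for its codeset and sorts a few ASCII strings
under the C collation rules.  The helper names (codeset_for,
c_locale_sort) are made up for this example, and it assumes a POSIX
system where nl_langinfo() is available.  Whether a "C.UTF-8" locale is
installed at all depends on the system, so the first helper returns
None when the locale cannot be selected.

```python
import functools
import locale


def codeset_for(locale_name):
    """Return nl_langinfo(CODESET) for locale_name, or None if the
    locale cannot be selected on this system (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # always restore


def c_locale_sort(words):
    """Sort ASCII strings using strcoll() under LC_COLLATE=C
    (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, "C")
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    # The codeset name reported for "C" varies between C libraries.
    for name in ("C", "C.UTF-8"):
        print(name, "->", codeset_for(name))
    # Plain ASCII order: uppercase letters sort before lowercase ones.
    print(c_locale_sort(["b", "A", "a", "B"]))
```

If a C.UTF-8 locale behaves as described, the only visible change in
this sketch should be the codeset string; the sort order stays the
same.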

I'd like to pursue this in the long term, but I doubt I'll have the
time to commit to it for several months.  If anyone else wishes to
tackle it, feel free to go for it!

> There are plans
> for C.UTF-8 though, and I’m a bit ashamed at having slacked off
> there…

No worries, not much is going to happen at this stage of the
squeeze freeze.  Hopefully it will be easy to get added early in the
wheezy cycle, though!

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776 (the very
end) and #609306 (the same bug, but as a feature request for eglibc).

Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
