Bug#603914: Please drop non-UTF8 locales



On Sun, Nov 28, 2010 at 05:21:33PM +0000, Thorsten Glaser wrote:
> Fun to be reading this. Me like ;-)
> 
> Anyway. With my Debian hat on, the C/POSIX locales must not use
> UTF-8 as encoding, because otherwise, all kind of hell breaks
> loose (consider running 'tr u x' on a binary or other legacy
> encoded text file, and tr is just an example).

From my reading of the standards, a UTF-8 C locale would be required
to behave identically to the existing ASCII C locale. It:

• will consider all byte sequences valid
• will use only the ASCII collation sequences (LC_COLLATE would be
  identical)
• would probably have an identical LC_CTYPE as well; SUS specifies
  this less strictly than LC_COLLATE, but it should stay the same
  for backward compatibility.

About the only differences would be that the transliteration table
is no longer needed, and that nl_langinfo(CODESET) would return
UTF-8.  That's pretty much it.
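
For anyone wanting to check this on their own system, here is a
minimal sketch in Python, which wraps setlocale() and nl_langinfo().
The helper names codeset_for and collates_like_c are made up for
illustration, and the locale name "C.UTF-8" is an assumption: it may
not be installed, in which case the helpers return None.

```python
import locale


def codeset_for(name):
    """Return nl_langinfo(CODESET) under LC_CTYPE=name, or None if unavailable."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore previous setting


def collates_like_c(name, words):
    """True if LC_COLLATE=name sorts `words` in code point order.

    Returns None if the locale is unavailable.
    """
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm) == sorted(words)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    sample = ["b", "a", "B", "_", "1"]  # ASCII-only test words
    for name in ("C", "C.UTF-8"):  # "C.UTF-8" may be missing
        print(name, codeset_for(name), collates_like_c(name, sample))
```

With glibc, the plain C locale typically reports ANSI_X3.4-1968 as
its codeset; the point above is that the codeset is about the only
thing expected to differ between the two locales.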

I'd like to pursue this in the long term, but I doubt I'll have the
time to commit to it for several months.  If anyone else wishes to
tackle it, feel free to go for it!

> There are plans
> for C.UTF-8 though, and I’m a bit ashamed at having slacked off
> there…

No worries, there's not much going to happen at this stage in the
squeeze freeze.  Hopefully easy to get added early in the wheezy
cycle though!

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776 (the very end)
and #609306 (same bug but a feature request for eglibc).


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
