Bug#603914: Please drop non-UTF8 locales

To: Roger Leigh <rleigh@codelibre.net>
Cc: 603914@bugs.debian.org
Subject: Bug#603914: Please drop non-UTF8 locales
From: Thorsten Glaser <tg@mirbsd.de>
Date: Mon, 10 Jan 2011 01:44:16 +0000 (UTC)
Message-id: <Pine.BSM.4.64L.1101100139410.13103@herc.mirbsd.org>
Reply-to: Thorsten Glaser <tg@mirbsd.de>, 603914@bugs.debian.org
In-reply-to: <20110109234835.GF11671@codelibre.net>
References: <Pine.BSM.4.64L.1011281721290.27885@herc.mirbsd.org> <20110108123254.GB25780@codelibre.net> <Pine.BSM.4.64L.1101092219210.17509@herc.mirbsd.org> <20110109234835.GF11671@codelibre.net>

Roger Leigh dixit:

>I think the "all byte sequences valid" applies mainly to narrow
>character I/O.  i.e. printf/puts etc. won't alter, drop or otherwise
>mangle any non 7-bit-ASCII codes.  i.e. I think the intent was to
>ensure 8-bit cleanliness in a 7-bit locale.  This naturally extends
>to UTF-8.  I'm not sure that wide character support is implied here,
>given that it implicitly requires correct byte sequences to function
>where the narrow character I/O does not (all 8-bit codes are correct).

I was thinking in terms of programmes that operate on wide characters
internally. For example, tr was the first one I switched to wide
characters: in MirBSD these are 16 bits wide, so the table-driven
design continued to work, and this is also where I noticed the problem.
Those are the programmes you want to be aware of: they _are_
internationalised, and thus use either wchar_t with multibyte
conversion and narrow I/O, or wchar_t with wide I/O, and they will
benefit from the C.UTF-8 locale. Others (that just run on byte strings
as if they were characters) don’t see a difference between it and the
classical C locale anyway.
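
To illustrate the distinction between byte-oriented and wide-character
programmes, here is a rough sketch in Python rather than C. It is only
an analogy: narrow_tr and wide_tr are hypothetical helpers, not the
MirBSD tr code, and Python's UTF-8 codec stands in for the locale's
multibyte conversion. The byte-oriented variant accepts any input,
while the character-oriented one needs valid UTF-8 before its table
lookup can work at all.

```python
def narrow_tr(data: bytes, src: bytes, dst: bytes) -> bytes:
    """Byte-oriented translation: every byte sequence is acceptable input."""
    # 256-entry table indexed by byte value; unmapped bytes pass through.
    return data.translate(bytes.maketrans(src, dst))


def wide_tr(data: bytes, src: str, dst: str) -> bytes:
    """Character-oriented translation: input must be valid UTF-8."""
    # Decoding plays the role of multibyte-to-wide conversion and raises
    # UnicodeDecodeError on byte sequences that are not valid UTF-8.
    text = data.decode("utf-8")
    # The table is now indexed by code point instead of by byte value.
    return text.translate(str.maketrans(src, dst)).encode("utf-8")


if __name__ == "__main__":
    valid = "café".encode("utf-8")
    stray = valid + b" \xff"  # 0xFF can never occur in valid UTF-8

    print(narrow_tr(stray, b"a", b"A"))  # the stray byte is left alone
    print(wide_tr(valid, "é", "e"))      # a multibyte character is mapped
    try:
        wide_tr(stray, "é", "e")
    except UnicodeDecodeError as exc:
        print("wide path rejects invalid input:", exc.reason)
```

The byte-oriented helper behaves the same whether the data is text or
not, which is why such programmes see no difference between C and
C.UTF-8; only the second helper gains anything from knowing the
encoding.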

What I mean is, we try to use C.UTF-8 in places where we want to
operate on text in UTF-8 but otherwise keep the standardised,
predictable, uniform behaviour of C; in places where we operate on
binary data, C is probably more useful.
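
As a small sketch of that rule of thumb (again in Python, and purely
illustrative): locale_env is a hypothetical helper that builds the
environment for a child process, picking C.UTF-8 for UTF-8 text and
plain C for binary data. It assumes the system actually provides a
locale named C.UTF-8, which is what this bug is about.

```python
import os


def locale_env(text_is_utf8: bool, base=None) -> dict:
    """Return a copy of the environment with an explicit C-style locale."""
    env = dict(os.environ if base is None else base)
    # Drop inherited locale settings so that only LC_ALL decides.
    for name in list(env):
        if name in ("LANG", "LANGUAGE") or name.startswith("LC_"):
            del env[name]
    # Assumption: "C.UTF-8" exists on the target system.
    env["LC_ALL"] = "C.UTF-8" if text_is_utf8 else "C"
    return env
```

A caller could then pass, for example, env=locale_env(True) to
subprocess.run when a tool processes UTF-8 text, and
env=locale_env(False) when it handles binary data.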

Hum. Do I make any sense?

Goodnight,
//mirabilos
-- 
“It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch.”
	-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2
