Bug#603914: Please drop non-UTF8 locales
Roger Leigh dixit:
>I think the "all byte sequences valid" applies mainly to narrow
>character I/O. i.e. printf/puts etc. won't alter, drop or otherwise
>mangle any non 7-bit-ASCII codes. i.e. I think the intent was to
>ensure 8-bit cleanliness in a 7-bit locale. This naturally extends
>to UTF-8. I'm not sure that wide character support is implied here,
>given that it implicitly requires correct byte sequences to function
>where the narrow character I/O does not (all 8-bit codes are correct).
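The distinction Roger draws can be made concrete. The sketch below
assumes a C.UTF-8 locale is installed and uses a hypothetical helper,
count_invalid_mb, that walks a buffer with mbrtowc. Narrow functions
such as fwrite or puts would copy a stray 0xFF byte unchanged, but the
multibyte-to-wide conversion reports it as an invalid sequence.

```c
#include <locale.h>   /* callers select the locale with setlocale */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the bytes in buf[0..len) that are not part
 * of a valid multibyte character in the current LC_CTYPE locale.
 * Narrow output would pass these bytes through; mbrtowc rejects them. */
size_t count_invalid_mb(const char *buf, size_t len)
{
    mbstate_t st;
    size_t bad = 0;

    memset(&st, 0, sizeof st);
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf, len, &st);

        if (n == (size_t)-2) {        /* truncated sequence at the end */
            bad += len;
            break;
        }
        if (n == (size_t)-1) {        /* EILSEQ: skip one byte, reset state */
            bad++;
            buf++;
            len--;
            memset(&st, 0, sizeof st);
            continue;
        }
        if (n == 0)                   /* decoded an embedded NUL byte */
            n = 1;
        buf += n;
        len -= n;
    }
    return bad;
}
```

For example, once setlocale(LC_CTYPE, "C.UTF-8") succeeds, the three
bytes "a\xff" "b" yield a count of 1, since 0xFF never occurs in
well-formed UTF-8.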
I was thinking in terms of programmes that operate on wide characters
internally (for example, tr was the first one I switched to wide
characters: since in MirBSD they are 16 bits wide, the table-driven
design continued to work; this is also where I noticed the problem).
Those are the programmes you want to be aware of: they _are_
internationalised, so they use wchar_t with multibyte strings and narrow
I/O, or wchar_t with wide I/O, and they will benefit from the C.UTF-8
locale. Others, which just operate on byte strings as if they were
characters, don’t see any difference between it and the classical C
locale anyway.
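The pattern described here (wide characters internally, narrow I/O at
the edges) might look like the following minimal sketch. It assumes a
single-character substitution rather than tr's full translation table,
and wtr_replace is a hypothetical name, not code from MirBSD's tr.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* callers select the locale with setlocale */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical tr-like helper: copy the NUL-terminated multibyte string
 * `in` into `out`, replacing every wide character `from` with `to`.
 * Decoding and re-encoding follow the current LC_CTYPE locale.
 * Returns 0 on success; -1 on an invalid or truncated input sequence,
 * an unencodable `to`, or a too-small `out` (contents then unspecified). */
int wtr_replace(const char *in, char *out, size_t outsz,
                wchar_t from, wchar_t to)
{
    mbstate_t in_st, out_st;
    size_t inlen = strlen(in);
    size_t used = 0;
    char enc[MB_LEN_MAX];

    if (outsz == 0)
        return -1;
    memset(&in_st, 0, sizeof in_st);
    memset(&out_st, 0, sizeof out_st);

    while (inlen > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, in, inlen, &in_st);   /* narrow -> wide */

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;
        in += n;
        inlen -= n;

        if (wc == from)               /* the translation works on     */
            wc = to;                  /* characters, not on raw bytes */

        size_t m = wcrtomb(enc, wc, &out_st);         /* wide -> narrow */
        if (m == (size_t)-1)
            return -1;
        if (used + m + 1 > outsz)     /* keep room for the final NUL */
            return -1;
        memcpy(out + used, enc, m);
        used += m;
    }
    out[used] = '\0';
    return 0;
}
```

Under C.UTF-8, replacing L'\u00e9' with L'e' in "caf\xc3\xa9" should
produce "cafe"; in the plain C locale the same call would fail on the
first non-ASCII byte, which is exactly the difference such programmes
see.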
What I mean is, we should try to use C.UTF-8 in places where we want to
process UTF-8 text but otherwise keep the standardised, predictable,
uniform behaviour of C; in places where we operate on binary data, C is
probably more useful.
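That policy could be sketched as below, under the assumption that only
LC_CTYPE needs to differ; select_ctype is a hypothetical helper, not
part of any existing programme.

```c
#include <locale.h>

/* Hypothetical policy helper: text-handling tools try C.UTF-8, tools
 * that handle binary data stay in plain C.  Only LC_CTYPE is changed,
 * so every other category keeps the C locale a programme starts with.
 * Returns 1 if C.UTF-8 is now the LC_CTYPE locale, 0 if it is "C". */
int select_ctype(int handles_text)
{
    if (handles_text && setlocale(LC_CTYPE, "C.UTF-8") != NULL)
        return 1;
    /* binary data, or C.UTF-8 is not installed: fall back to plain C */
    setlocale(LC_CTYPE, "C");
    return 0;
}
```

Most real programmes instead call setlocale(LC_ALL, "") and let the
user's environment decide; hard-coding the choice here only serves to
illustrate the text-versus-binary split.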
Hum. Do I make any sense?
Goodnight,
//mirabilos
--
“It is inappropriate to require that a time represented as
seconds since the Epoch precisely represent the number of
seconds between the referenced time and the Epoch.”
-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2