
Bug#603914: Please drop non-UTF8 locales



Roger Leigh dixit:

>I think the "all byte sequences valid" applies mainly to narrow
>character I/O.  i.e. printf/puts etc. won't alter, drop or otherwise
>mangle any non 7-bit-ASCII codes.  i.e. I think the intent was to
>ensure 8-bit cleanliness in a 7-bit locale.  This naturally extends
>to UTF-8.  I'm not sure that wide character support is implied here,
>given that it implicitly requires correct byte sequences to function
>where the narrow character I/O does not (all 8-bit codes are correct).

I was thinking in terms of programmes that operate on wide characters
internally. For example, tr was the first one I switched to wide
characters, since in MirBSD they are only 16 bits wide and the
table-driven design continued to work; this is also where I noticed the
problem. Those are the programmes you want to be aware of: they _are_
internationalised, so they use wchar_t with multibyte strings and narrow
I/O, or wchar_t with wide I/O, and they will benefit from the C.UTF-8
locale. Others, which just treat byte strings as if they were characters,
see no difference between it and the classical C locale anyway.
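To make the distinction concrete, here is a minimal C sketch (not code from tr or MirBSD) of the first kind of programme: it counts characters by converting each multibyte sequence to a wchar_t with mbrtowc(3) under a chosen LC_CTYPE locale. The helper name count_chars_in and its -1/-2 return convention are hypothetical, and it assumes C.UTF-8 may or may not be installed, reporting -2 if setlocale(3) refuses it. A byte-oriented programme would simply use strlen(3) and get the same answer in every locale.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in s while LC_CTYPE is set to
 * loc, decoding each multibyte sequence into a wchar_t with mbrtowc().
 * Returns the count, -1 if s is not valid in that locale, or -2 if the
 * locale is unavailable (or memory runs out). The caller's LC_CTYPE is
 * restored before returning. */
long count_chars_in(const char *s, const char *loc)
{
    /* setlocale() may overwrite its result buffer, so copy it first. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    size_t curlen = strlen(cur);
    char *saved = malloc(curlen + 1);
    long n = 0;

    if (saved == NULL)
        return -2;
    memcpy(saved, cur, curlen + 1);

    if (setlocale(LC_CTYPE, loc) == NULL) {
        free(saved);
        return -2;  /* locale not installed on this system */
    }

    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t left = strlen(s);
    while (left > 0) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s, left, &st);
        if (len == (size_t)-1 || len == (size_t)-2) {
            n = -1;  /* invalid or truncated byte sequence */
            break;
        }
        if (len == 0)  /* decoded a NUL; cannot occur within strlen(s) bytes */
            break;
        s += len;
        left -= len;
        n++;
    }

    setlocale(LC_CTYPE, saved);
    free(saved);
    return n;
}
```

Under C.UTF-8, count_chars_in("h\xc3\xa9", "C.UTF-8") should count "hé" as two characters. In the classical C locale the result for bytes above 0x7F depends on the C library: some implementations reject them as invalid, others map each byte to one character.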

What I mean is, we try to use C.UTF-8 in places where we want to
process UTF-8 text but otherwise keep the standardised, predictable,
uniform behaviour of C; in places where we operate on binary data, C
is probably more useful.
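The policy described above could be expressed roughly as follows. This is only a sketch: select_ctype is a hypothetical name, and falling back to plain C when C.UTF-8 is not installed is an assumption, not something the thread prescribes.

```c
#include <locale.h>

/* Hypothetical policy helper. For text, ask for C.UTF-8 so that the
 * multibyte and wide-character functions understand UTF-8; fall back to
 * plain C if C.UTF-8 is missing. For binary data, always use C.
 * Only LC_CTYPE is changed, so collation, messages and number formatting
 * stay as they were (C, unless the program changed them).
 * Returns the name of the LC_CTYPE locale now in effect. */
const char *select_ctype(int processing_text)
{
    if (processing_text) {
        const char *r = setlocale(LC_CTYPE, "C.UTF-8");
        if (r != NULL)
            return r;
    }
    return setlocale(LC_CTYPE, "C");
}
```

Setting LC_CTYPE rather than LC_ALL is what keeps the rest of the C locale's behaviour intact while still letting wchar_t-based code decode UTF-8.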

Hum. Do I make any sense?

Goodnight,
//mirabilos
-- 
“It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch.”
	-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2


