Thorsten Glaser wrote:
> Albert Cahalan dixit:
>> Unless plain "C" goes UTF-8
>
> Not going to happen, it’s not binary-safe. (I fought that in MirBSD with the OPTU-8/16 encoding scheme.)
Why not? Note that the usual functions work on bytes, not on characters, and the old/classical options of POSIX utilities work on bytes by default; POSIX introduced new options for characters. E.g. the -c option of 'wc' really means bytes, not characters (characters are given by -m). Not very logical, but compatible with the expected old behaviour. POSIX was discussing whether it is "legal" to have a UTF-8 POSIX/C locale. IIRC the doubts were about the wording of the standard, not about real problems. OTOH they acknowledged that real bugs could appear.
OTOH, I use a UTF-8 locale by default, because I don't expect Debian to corrupt my data, and I think system utilities will do the right thing with the locale. I'm starting to think that moving C to UTF-8 would really be the simpler and faster way to *hide* most of the encoding bugs.

ciao
cate