Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

To: Thorsten Glaser <tg@mirbsd.de>, 522776@bugs.debian.org
Subject: Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
From: "Giacomo A. Catenazzi" <cate@debian.org>
Date: Tue, 01 Dec 2009 17:42:39 +0100
Message-id: <4B15477F.6060808@debian.org>
Reply-to: "Giacomo A. Catenazzi" <cate@debian.org>, 522776@bugs.debian.org
In-reply-to: <Pine.BSM.4.64L.0911271036160.18130@herc.mirbsd.org>
References: <787b0d920911261806t3ba31ae3vda6ff828dc0f3874@mail.gmail.com> <Pine.BSM.4.64L.0911271036160.18130@herc.mirbsd.org>

Thorsten Glaser wrote:

Albert Cahalan dixit:

Unless plain "C" goes UTF-8


Not going to happen, it’s not binary-safe. (I fought that in
MirBSD with the OPTU-8/16 encoding scheme.)

Why not? Note that the usual functions work on bytes, not on characters,
and in POSIX utilities the old, classical options work on bytes by
default. POSIX introduced new options for characters. E.g. the -c in
'wc' really means bytes, not characters (which are counted by -m). Not
so logical, but compatible with the expected old behaviour.
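
To make the bytes/characters distinction concrete, here is a minimal
sketch in Python (not the wc implementation itself). The sample string
is my own choice; the two counts correspond to what 'wc -c' and, in a
UTF-8 locale, 'wc -m' would report for the same data.

```python
# Byte count vs. character count for the same UTF-8 text.
# The sample string is an arbitrary illustration: "naive cafe" with
# two accented letters, written with escapes to avoid ambiguity.
text = "na\u00efve caf\u00e9"   # 10 characters, two of them non-ASCII
data = text.encode("utf-8")     # each accented letter takes 2 bytes

byte_count = len(data)                  # analogous to `wc -c`
char_count = len(data.decode("utf-8"))  # analogous to `wc -m` in a UTF-8 locale

print(byte_count, char_count)
```

For pure ASCII input the two numbers are identical, which is why the
old byte-oriented options kept working unchanged.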

POSIX was discussing whether it is "legal" to have a UTF-8 POSIX/C
locale. IIRC the doubts were about the language in the standard, not
about real problems. OTOH they acknowledged that real bugs could appear.
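
The binary-safety concern can be sketched as follows. This is only an
illustration in Python, under the assumption that a tool decodes its
input as UTF-8; it says nothing about how any particular libc or
utility behaves. The byte string is an arbitrary example containing
0xFF and 0xFE, which never occur in valid UTF-8.

```python
# Arbitrary binary data: 0xFF and 0xFE are never valid in UTF-8.
raw = b"abc\xff\xfedef"

# Byte-oriented handling: a round trip leaves the data untouched.
bytewise = bytes(bytearray(raw))

# Character-oriented handling with strict UTF-8 decoding rejects it.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding "works", but the round trip silently alters the data:
# each invalid byte becomes U+FFFD, which re-encodes as three bytes.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

print(bytewise == raw, strict_ok, lossy == raw)
```

The byte-wise path is the one that keeps arbitrary data intact; the
other two either fail or change the data.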

OTOH I use the UTF-8 locale by default, because I don't expect that
Debian will corrupt my data. And I think system utilities will do
the right things with the locale.


I start to think that moving C to UTF-8 will really be the simplest and
fastest way to *hide* most of the encoding bugs.

ciao
	cate
