Bug#522776: locale dependend compilation

To: "Giacomo A. Catenazzi" <cate@debian.org>, 522776@bugs.debian.org
Cc: Roger Leigh <rleigh@codelibre.net>, Steve Langasek <vorlon@debian.org>, Thorsten Glaser <tg@mirbsd.de>
Subject: Bug#522776: locale dependend compilation
From: "Giacomo A. Catenazzi" <cate@debian.org>
Date: Wed, 08 Apr 2009 14:47:30 +0200
Message-id: <49DC9CE2.50902@debian.org>
Reply-to: "Giacomo A. Catenazzi" <cate@debian.org>, 522776@bugs.debian.org
In-reply-to: <49DC7464.60806@debian.org>
References: <20090406120655.27815.2545.reportbug@lenny.mirbsd.org> <49DA0B6A.7060107@debian.org> <Pine.BSM.4.64L.0904061727410.28766@herc.mirbsd.org> <20090406180917.GA23092@dario.dodds.net> <20090406215226.GB18298@codelibre.net> <49DB1084.9060707@debian.org> <20090407213324.GD12845@codelibre.net> <49DC7464.60806@debian.org>

Ok, maybe I found the problem.

Giacomo A. Catenazzi wrote:
 > No ;-)

OK, it took me some modifications of your program and a look at
POSIX to discover the reason.

You forgot to check the error codes. In this case we get
"Invalid or incomplete multibyte or wide character" (EILSEQ) in the
non-UTF-8 locale.
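
To make the missing check concrete, here is a minimal sketch of a
conversion that reports the error instead of silently dropping it.
The helper name wide_to_locale is hypothetical (it is not from the
program under discussion), and it uses wcstombs() only as one
convenient way to hit the same failure:

```c
#include <errno.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a wide string to the multibyte encoding of the current locale.
 * Returns 0 on success, or an errno value (typically EILSEQ) when some
 * wide character has no representation in that locale.
 * Hypothetical helper, for illustration only; output that does not fit
 * in buf is truncated without notice.
 */
int wide_to_locale(const wchar_t *ws, char *buf, size_t bufsize)
{
    size_t n;

    if (bufsize == 0)
        return ERANGE;

    errno = 0;
    n = wcstombs(buf, ws, bufsize - 1);   /* keep room for the terminator */
    if (n == (size_t)-1)
        return errno ? errno : EILSEQ;    /* conversion failed */

    buf[n] = '\0';
    return 0;
}
```

A caller would normally run setlocale(LC_ALL, "") first and, on a
non-zero result, print strerror() of the returned value; with glibc
that is where the message quoted above comes from.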

So, looking at POSIX:

"Wide-character codes for other characters are locale- and implementation-defined."

You (and I) compiled the code in a UTF-8 locale, so the binary contains
a different wchar representation, which is invalid in a non-UTF-8 locale.

Note that it is locale dependent, so the same charset with a different
language could give different results (I don't know whether there are
such cases in glibc).


So it means that NO portable program can use constant (i.e. fixed values
in the sources) wchars and wstrings, because a compiled program has
no way to distinguish a wstring built at compile time from a wstring coming
from outside, and the two may be in different locales/charsets.
[By default GCC uses UTF-32 or UTF-16 for wchar, whichever matches the
width of wchar_t]
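
A small sketch of what "fixed at compile time" means in practice. It
assumes a compiler such as GCC, whose wide execution charset is UTF-32
or UTF-16; the names sample, sample_code and sample_units are made up
for the example. The literal is written with a universal character
name so that the source file itself stays pure ASCII:

```c
#include <stddef.h>
#include <wchar.h>

/* The compiler fixes the value of this literal when it builds the
 * object file; no setlocale() call at run time can change it. */
static const wchar_t sample[] = L"\u00e4";   /* U+00E4, a with diaeresis */

/* Return the numeric value stored for the first character of the literal. */
unsigned long sample_code(void)
{
    return (unsigned long)sample[0];
}

/* Number of wchar_t units in the literal, excluding the terminator. */
size_t sample_units(void)
{
    return wcslen(sample);
}
```

Under the assumption above, the stored value is the Unicode code
point, whatever locale the program later runs in. Whether that value
can then be converted for output is a separate, run-time question.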

BTW we have a similar problem with "normal" strings.

This is very unfortunate, and it is *worse* than the initial problem.
Changing the locale will not solve this, but it will probably reduce the
visibility of the error. [No locale specified means UTF-8 for GCC.]

So maybe we need to specify the locale to be passed to debian/rules,
or the parameter to gcc, in order to have a fixed (default) source
encoding.
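
For the "parameter to gcc" route, GCC has -finput-charset (encoding of
the source file) and -fexec-charset (encoding of narrow string
literals in the binary). The sketch below, with the made-up names
sample_narrow and narrow_bytes, shows how to inspect what actually
ended up in the binary for a "normal" string:

```c
#include <stddef.h>
#include <string.h>

/* U+00E9 written as a universal character name; the compiler encodes
 * it in the execution character set chosen at build time. */
static const char sample_narrow[] = "caf\u00e9";

/* Copy the bytes of the literal (without the terminator) into out and
 * return how many there are; returns 0 if out is too small. */
size_t narrow_bytes(unsigned char *out, size_t outsize)
{
    size_t n = strlen(sample_narrow);

    if (n > outsize)
        return 0;
    memcpy(out, sample_narrow, n);
    return n;
}
```

With GCC's default execution charset (UTF-8) the literal is 5 bytes
long and ends in 0xC3 0xA9; built with -fexec-charset=ISO-8859-1 it
would be 4 bytes ending in 0xE9. Same source, different binary.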

But this does not solve the problem: a string encoded in UTF-8 or
UTF-32 (for wchar) is not decoded correctly on non-UTF-8 terminals.

But in this case we have the iconv() function (because NOW we know the
initial encoding) to convert the constant string to the right locale.
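
A minimal sketch of that conversion, assuming the literals were
compiled as UTF-8. The helper name convert_string is hypothetical,
and the target is assumed to be a char-based encoding, so that a
single NUL byte terminates the result:

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert a NUL-terminated string from a known source encoding to a
 * target encoding. Returns 0 on success, -1 on failure (errno is set
 * by iconv_open() or iconv()). Hypothetical helper, for illustration.
 */
int convert_string(const char *from, const char *to,
                   const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;     /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;
    size_t rc;
    int saved;

    if (outsize == 0) {
        errno = E2BIG;
        return -1;
    }
    outleft = outsize - 1;      /* keep room for the terminator */

    cd = iconv_open(to, from);  /* note the order: target first */
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)       /* flush any pending shift sequence */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    saved = errno;
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved;
        return -1;
    }
    *outp = '\0';
    return 0;
}
```

In a real program the target would be the charset of the user's
locale, i.e. nl_langinfo(CODESET) from <langinfo.h> after
setlocale(LC_ALL, ""), and the source would be whatever encoding was
fixed at build time.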


So: programs that use constant wchars or strings with characters outside
ASCII must be compiled with the right encoding (possibly with the right
locale), specified in debian/rules (or every developer will see a
different output). Such programs should convert the string to the right
locale before printing it to the terminal.


Alternatively, the strings must be kept outside the source code and read
from a file. iconv() applies in this case too.


PS: requiring "en_US.UTF-8" as the default for debian/rules also seems
nice, so that logs can be read by all developers.

Possibly "C" in UTF-8 could also be good. Such a "C" locale should have
only the UTF-8 charset and no additional meaning for
characters outside 7-bit ASCII.  But this should be carefully tested:
I really think that there are existing wrong assumptions and
cases we forgot.


So ok: I think I've understood the problem (but part of the bug
is in the program / Makefile).

ciao
	cate
