
Bug#522776: locale dependend compilation



Ok, maybe I found the problem.

Giacomo A. Catenazzi wrote:
 > No ;-)  Ok, it took me some modifications of your program and
a look at POSIX to discover the reason.

You forgot to check the error codes. In this case we get
"Invalid or incomplete multibyte or wide character" (EILSEQ) in the
non-UTF-8 locale.
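
To make the missing check concrete, here is a minimal sketch of a
conversion helper that reports the error instead of silently printing
nothing. The name wide_to_locale() is made up for illustration and is not
from the test program; it relies on the POSIX behaviour that wcstombs()
with a NULL destination returns the required length.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a wide string to the multibyte encoding of the current locale.
 * Returns 0 on success, EILSEQ (or whatever errno was set) when a wide
 * character has no representation in the locale, ERANGE when the output
 * buffer is too small. Hypothetical helper, for illustration only. */
int wide_to_locale(const wchar_t *ws, char *out, size_t outsize)
{
    assert(ws != NULL && out != NULL);

    /* First pass: ask only for the required length (POSIX extension). */
    errno = 0;
    size_t need = wcstombs(NULL, ws, 0);
    if (need == (size_t)-1)
        return errno ? errno : EILSEQ;
    if (need >= outsize)
        return ERANGE;              /* no room for the terminating NUL */

    /* Second pass: the conversion is now known to fit and to succeed. */
    wcstombs(out, ws, outsize);
    return 0;
}
```

A caller would run setlocale(LC_ALL, "") once at startup and then print
strerror() of a nonzero return value, which is where the message quoted
above comes from.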

So, looking at POSIX:
"Wide-character codes for other characters are locale and implementation-defined."
You (and I) compiled the code under a UTF-8 locale, so the binary contains
a different wchar representation, which is invalid in a non-UTF-8 locale.

Note that it is locale dependent, so the same charset with a different
language could give different results (I don't know if there are such
cases in glibc).

So it means that NO portable program can use constant (i.e. fixed
values in the sources) wchars and wstrings, because a compiled program has
no way to distinguish a wstring built at compile time from a wstring coming
from outside, so the two may be in different locales/charsets.
[By default GCC uses UTF-16 or UTF-32 for wchar, whichever matches the
width of wchar_t]

BTW we have a similar problem with "normal" strings.

This is very unfortunate, and it is *worse* than the initial problem.
Changing the locale will not solve this, but will probably reduce the
visibility of the error. [No locale specified means UTF-8 for GCC.]

So maybe we need to specify the locale to be passed to debian/rules,
or the parameter to gcc, in order to have a fixed (default) source
encoding.
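
As a sketch of what a fixed source encoding buys us: GCC's -finput-charset
option pins the encoding it assumes for the source file, and a universal
character name (\u00e9) avoids raw non-ASCII bytes in the source
altogether. The function names below are invented for the example, and
using \u escapes is my suggestion, not something the test program does.

```c
#include <assert.h>
#include <wchar.h>

/* The value of this literal is fixed by the compiler at build time.
 * Written as a universal character name, it does not depend on the
 * encoding the compiler assumes for the source file. */
wchar_t e_acute(void)
{
    return L'\u00e9';
}

/* Nonzero if the implementation promises that wchar_t values are
 * ISO 10646 (Unicode) code points in every locale. glibc defines this
 * macro; other C libraries may not. */
int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Sanity check: where wchar_t is Unicode, the literal is U+00E9. */
void check_literal(void)
{
    if (wchar_is_unicode())
        assert(e_acute() == 0xE9);
}
```

This only fixes the value stored in the binary; it does not help with
printing it in a locale that cannot represent the character.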

But this does not solve the problem. A string encoded as UTF-8, or
UTF-32 (for wchar), is not decoded correctly on non-UTF-8 terminals.

But in this case we have the iconv() function (because NOW we know the
initial encoding) to convert the constant string to the right locale.
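
A rough sketch of that conversion, assuming the constant was compiled as
UTF-8. convert_string() and utf8_to_locale() are names I made up;
iconv_open(), iconv(), iconv_close() and nl_langinfo(CODESET) are the
standard POSIX calls.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated string `in` from charset `from` to charset
 * `to`. Returns 0 on success or an errno value on failure; `out` is
 * always NUL-terminated. Hypothetical helper, for illustration only. */
int convert_string(const char *to, const char *from,
                   const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL && outsize > 0);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) {
        out[0] = '\0';
        return errno;
    }

    char *inp = (char *)in;        /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;  /* keep one byte for the NUL */
    int err = 0;

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        err = errno;               /* EILSEQ, EINVAL or E2BIG */
    else if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        err = errno;               /* flushing the shift state failed */

    *outp = '\0';
    iconv_close(cd);
    return err;
}

/* Convert a constant known to be UTF-8 to the charset of the current
 * locale. Assumes the caller already ran setlocale(LC_ALL, ""). */
int utf8_to_locale(const char *utf8, char *out, size_t outsize)
{
    return convert_string(nl_langinfo(CODESET), "UTF-8", utf8, out, outsize);
}
```

What happens to characters that the target charset cannot represent
depends on the iconv implementation, so the return value should be checked
before printing.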


So: programs that use constant wchars or strings with chars outside ASCII
must be compiled with the right encoding (possibly with the right locale),
specified in debian/rules (or every developer will see a different output).
Such programs should convert the string to the right locale before
printing it to the terminal.


Alternatively, the string must be put outside the source code and read
from a file. iconv() applies in this case too.


PS: requiring "en_US.UTF-8" as the default in debian/rules also seems
nice, so that logs can be read by all developers.

Possibly "C" with UTF-8 could also be good. Such a "C" locale should
only set the charset to UTF-8 and give no additional meaning to
characters outside 7-bit ASCII.  But this should be carefully tested:
I really think that there are existing wrong assumptions and
cases we forgot.


So ok: I think I've understood the problem (but part of the bug
is in the program / Makefile).

ciao
	cate


