
Re: Locale-related questions



Nicolas,

OK.  I don't know exactly what I'm talking about here as you can see. :)

I'm writing an XML parser/writer/simple DOM, which will input and output
primarily in UTF-32.

This program outputs some information to stdout in the testing process,
and this is also UTF-32 (internally a character is an integer >= 32 bits).

I tried now to use putwchar:

    putwchar((wchar_t) buffer[index]);

And the output is the same as if I had used

     printf("%c", (char) buffer[index]);

That is, non-ASCII characters are garbled.

All the locale settings except LC_ALL are en_US.UTF-8, LC_ALL is empty.

What I'm looking for is a cross-platform way to output some data, to aid
in the testing process.  Reading and writing from files will probably be
binary and handled internally in the program.

-Morten

On Sun, Nov 29, 2015 at 4:04 PM, Nicolas George <george@nsup.org> wrote:
On nonidi, 9 Frimaire, year CCXXIV, Morten W. Petersen wrote:
> I was looking for a locale that would enable me to putwchar a 32-bit
> Unicode character to stdout and have things handled correctly,
> automatically.  Without any re-encoding to UTF-8 and so on.

I do not know what your program is about, but what you are asking seems to
me like a very bad idea.

You should use putwchar() if you WANT automatic recoding to the current
locale, transparently and for any locale, as long as the character can be
represented in that locale.
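
A minimal sketch of that locale-driven path, under assumptions the thread does
not confirm: the helper names use_environment_locale() and put_code_points()
are hypothetical, the cast to wchar_t is only meaningful where wchar_t holds
Unicode code points (as on GNU), and the program is assumed not to have written
bytes to the same stream with printf() or putchar() first. A C program starts
in the "C" locale, so the environment's locale takes effect only after
setlocale(LC_ALL, "") has been called. fputwc(c, stdout) is equivalent to
putwchar(c); a stream parameter is used here only to make the helper easier to
test.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: adopt the locale named by the environment
 * (LC_ALL, LC_CTYPE, LANG). Returns 1 on success, 0 if the locale
 * could not be set. */
int use_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: write code points through the wide-character
 * API so the C library recodes them to the current locale.
 * Assumption: wchar_t values are Unicode code points (true on GNU,
 * not guaranteed elsewhere).
 * Returns how many characters were written before the first failure. */
size_t put_code_points(FILE *stream, const uint32_t *buffer, size_t count)
{
    size_t written = 0;

    assert(stream != NULL);
    for (size_t i = 0; i < count; i++) {
        /* fputwc returns WEOF if the character cannot be encoded
         * in the current locale or the write fails. */
        if (fputwc((wchar_t) buffer[i], stream) == WEOF)
            break;
        written++;
    }
    return written;
}
```

A caller would invoke use_environment_locale() once at startup and then pass
stdout to put_code_points(). A stream's orientation is fixed by the first
operation on it, so byte and wide output should not be mixed on one stream.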

On the other hand, if as you say you do not want automatic recoding, then
you should use octet-based output functions: serialize your Unicode code
point to little or big endian as you prefer and use putchar() or fwrite() to
send it.
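
The octet-based route can be sketched as follows. The function names
encode_utf32be() and write_utf32be() are hypothetical, and big endian is an
arbitrary choice for the illustration. Note that a terminal running in a UTF-8
locale will not render these bytes as text, so this suits files and pipes
rather than on-screen test output.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: serialize one code point as four big-endian
 * octets, most significant byte first. */
void encode_utf32be(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char) ((cp >> 24) & 0xFF);
    out[1] = (unsigned char) ((cp >> 16) & 0xFF);
    out[2] = (unsigned char) ((cp >> 8) & 0xFF);
    out[3] = (unsigned char) (cp & 0xFF);
}

/* Hypothetical helper: write a buffer of code points as UTF-32BE
 * using only byte-oriented output, so no locale recoding happens.
 * Returns how many code points were written completely. */
size_t write_utf32be(FILE *stream, const uint32_t *buffer, size_t count)
{
    size_t written = 0;

    assert(stream != NULL);
    for (size_t i = 0; i < count; i++) {
        unsigned char bytes[4];

        encode_utf32be(buffer[i], bytes);
        if (fwrite(bytes, 1, sizeof bytes, stream) != sizeof bytes)
            break;
        written++;
    }
    return written;
}
```

Because the byte order is fixed in the code, the output is the same on every
platform regardless of the host's endianness or locale.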

Note that wchar_t values are Unicode code points under GNU, but that is
neither guaranteed nor portable. Personally, I recommend not using wchar_t
at all.
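
One way to avoid wchar_t altogether, if the test output is meant to be read on
a terminal whose locale is UTF-8 (as in the en_US.UTF-8 settings mentioned
earlier in the thread), is to encode each code point by hand and emit plain
bytes. This is only a sketch: encode_utf8() is a hypothetical name, and it
assumes the consumer of the output expects UTF-8, which the thread does not
require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns the byte count, or 0 if
 * cp is a surrogate (U+D800..U+DFFF) or lies above U+10FFFF. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The resulting bytes can be sent with fwrite() or putchar(), which keeps the
output independent of both the locale and the platform's wchar_t width.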

Regards,

--
  Nicolas George


