
Re: Locale-related questions



Hi,

Morten W. Petersen wrote:
> This program outputs some information to stdout in the testing process,
> and this is also UTF-32

As long as they are in stdout, the UTF-32 characters are byte
sequences of no special meaning.
The meaning as characters is attributed to them by the display
program (e.g. an xterm).


> I tried now to use putwchar:
>    putwchar((wchar_t) buffer[index]);
> And the output is the same as if had used
>    printf("%c", (char) buffer[index]);

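Whether the two produce the same output depends on more than the
byte order of 4-byte words. putwchar() does not copy the raw bytes
of the wchar_t to stdout; it converts the character to the multibyte
encoding of the current locale, which is the "C" locale unless
setlocale() has been called.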

(Besides the waste of bytes, the byte order is another
 disadvantage of UTF-32 in comparison to UTF-8.)


> That is, non-ASCII characters are garbled.

Probably because your terminal (or other text display program)
expects UTF-8.

I searched for a terminal type which supports UTF-32, but found none.
So the only proposal I can make is to use iconv(1) in order
to convert the UTF-32 output to UTF-8 before it gets displayed.

  producer_process | iconv -f UTF-32LE -t UTF-8 | display_process

or

  producer_process | iconv -f UTF-32BE -t UTF-8 | display_process

depending on which byte order of UTF-32 your program puts out.
If you guessed the wrong order, you will probably get iconv errors
like
  iconv: illegal input sequence at position 0
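
If piping through iconv is not an option, the same conversion can be
done inside the producer program. The following is only a minimal
sketch, not what iconv does internally: the function name
utf32_to_utf8 is made up for this example, and it assumes that the
code point is already available as a native 32-bit integer, so that
byte order is no longer an issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes to out and returns the number of bytes written.
 * Returns 0 for values that are not valid in UTF-32
 * (surrogates U+D800..U+DFFF and anything above U+10FFFF). */
size_t utf32_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

The returned bytes can then be written with fwrite() to stdout, which
a UTF-8 terminal will display correctly.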


Of course you should not convert to UTF-8 if the consumer process
expects UTF-32. In this case you only have to make sure that
producer and consumer use the same byte order.
If "cross-platform" is meant seriously, then you will have to control
the byte order at output and input yourself, i.e. convert your
integers to bytes (and vice versa) by shifting in 8-bit steps.

E.g. integer to little endian (UTF-32LE):
  byte[0] = (integer >> 0) & 0xff;
  byte[1] = (integer >> 8) & 0xff;
  byte[2] = (integer >> 16) & 0xff;
  byte[3] = (integer >> 24) & 0xff;
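
The shifts above, together with the reverse direction for the
consumer side, could be wrapped as follows. This is only a sketch;
the function names put_utf32le and get_utf32le are invented for this
example. Because the code only shifts and masks, it gives the same
result on little-endian and big-endian machines.

```c
#include <stdint.h>

/* Hypothetical helper: store a code point as 4 bytes, UTF-32LE order. */
void put_utf32le(uint32_t integer, unsigned char byte[4])
{
    byte[0] = (unsigned char) ((integer >> 0) & 0xff);
    byte[1] = (unsigned char) ((integer >> 8) & 0xff);
    byte[2] = (unsigned char) ((integer >> 16) & 0xff);
    byte[3] = (unsigned char) ((integer >> 24) & 0xff);
}

/* Hypothetical helper: rebuild the code point from 4 UTF-32LE bytes. */
uint32_t get_utf32le(const unsigned char byte[4])
{
    return  (uint32_t) byte[0]
         | ((uint32_t) byte[1] << 8)
         | ((uint32_t) byte[2] << 16)
         | ((uint32_t) byte[3] << 24);
}
```

For UTF-32BE the indices are simply reversed: byte[0] takes the
highest 8 bits and byte[3] the lowest.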


Have a nice day :)

Thomas

