
Re: Locale-related questions



Thomas already gave an answer with relevant points.

On nonidi, 9 Frimaire, year CCXXIV, Morten W. Petersen wrote:
> I'm writing an XML parser/writer/simple DOM, which will input and output
> primarily in UTF-32.

Is there a good or unavoidable reason to use UTF-32? This is really a bad
choice of format for external representation. Nowadays, I would say that
UTF-8 should always be the preferred choice (for external representation;
for internal representation, using integers for code points may be better
depending on the use case).
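
To illustrate the split between integers inside the program and UTF-8 on the
outside, here is a minimal sketch of a UTF-8 encoder for a single code point.
The name encode_utf8 and its return convention are my own assumptions for
this example, not something from the original question.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out, which must have room
 * for at least 4 bytes. Returns the number of bytes written, or 0 if cp
 * is a surrogate or lies above U+10FFFF. (Hypothetical helper.) */
size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

The caller keeps code points as plain integers and only calls this when
bytes have to leave the program, for example with fwrite() on the result.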

> What I'm looking for is a cross-platform way to output some data, to aid
> in the testing process.  Reading and writing from files will probably be
> binary and handled internally in the program.

If you want cross-platform, stay away from wchar_t. It allows you to do
SIMPLE things in a cross-platform way, such as printing an error message, but
no more. If you need control over the encoding, then you cannot do it
portably with wchar_t. For starters, the i4s at Microsoft decided that 64k
characters should be enough for everyone, so if your cross-platform includes
microsoftisms, you cannot use wchar_t to represent a Unicode code point.
The i4s at Sun had other interesting ideas on how to make the encoding of
wchar_t itself depend on the locale.
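
Both problems can at least be detected. A rough sketch, with function names
that are my own invention: WCHAR_MAX tells you whether wchar_t is wide enough
for every code point, and the standard C99 macro __STDC_ISO_10646__, when
defined, promises that wchar_t values are Unicode code points regardless of
the locale.

```c
#include <wchar.h>   /* WCHAR_MAX */

/* 1 if wchar_t can hold every code point up to U+10FFFF, else 0.
 * (Hypothetical helper name.) */
int wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* 1 if the implementation promises that wchar_t values are ISO 10646
 * code points in every locale, else 0. (Hypothetical helper name.) */
int wchar_is_always_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

If either function returns 0 on one of your target platforms, wchar_t is not
a usable container for code points there, which is the point made above.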

If you really need to write UTF-32, writing the corresponding function takes
about half a minute:

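#include <stdio.h>

/* Write code point c to f as UTF-32BE: four bytes, most significant first.
 * Assumes unsigned is at least 32 bits wide. */
void put_utf32be(FILE *f, unsigned c)
{
    /* putc() takes the character first and the stream second. */
    putc((c >> 24) & 0xFF, f);
    putc((c >> 16) & 0xFF, f);
    putc((c >>  8) & 0xFF, f);
    putc((c >>  0) & 0xFF, f);
}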

Note: I hereby place this code under the terms of the GNU GPL. And correctly
handling errors is left as an exercise to the reader.
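
For those who do not want to do the exercise, one possible sketch follows.
The name put_utf32be_checked and the 0 / -1 return convention are
assumptions made for this example; it also rejects values that are not valid
code points, which goes beyond what the short version does.

```c
#include <stdio.h>

/* Write code point c to f as UTF-32BE.
 * Returns 0 on success, -1 if c is a surrogate, lies above U+10FFFF,
 * or the write fails. (Hypothetical variant with error reporting.) */
int put_utf32be_checked(FILE *f, unsigned long c)
{
    unsigned char buf[4];

    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return -1;

    /* Most significant byte first. */
    buf[0] = (unsigned char)((c >> 24) & 0xFF);
    buf[1] = (unsigned char)((c >> 16) & 0xFF);
    buf[2] = (unsigned char)((c >>  8) & 0xFF);
    buf[3] = (unsigned char)( c        & 0xFF);

    /* fwrite() returns the number of items written; anything short of 4
     * means an error occurred on the stream. */
    return fwrite(buf, 1, sizeof buf, f) == sizeof buf ? 0 : -1;
}
```

Remember to open the file in binary mode ("wb"), otherwise some platforms
will translate bytes that happen to look like line endings.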

Regards,

-- 
  Nicolas George


