Re: Chinese big5 encoding and PO files
On Wed, Jan 29, 2003 at 11:14:56AM +0100, Peter Karlsson wrote:
> Denis Barbier:
>
> > Err, ascii(7) tells me that 0x5C *is* a backslash.
>
> Yes, but these documents aren't ASCII, so 0x5C may or may not be a
> backslash there, depending on where it is located in the file.
Ok.
> > Could you please have a look at chinese/po/others.zh.po and tell me
> > what to do with Subscribe/Unsubscribe translations?
>
> Nothing should need to be done, since the 0x5C byte is the trail byte
> of the character; a proper MBCS-aware string scanner will recognize
> that it is not a backslash character (unlike, for instance, in the
> "please respect the ad policy" string a bit further down, which *does*
> contain a backslash in the translation). Getting the string scanner to
> work properly requires configuring the locales properly.
The problem with current WML is that streams are bytes and not characters;
this is why 0x5C bytes have to be escaped.
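To illustrate the byte versus character distinction, here is a minimal Python sketch (not WML code). The sample string is my own choice: "許功蓋" is a well-known trio of Big5 characters whose trail byte is 0x5C. A byte-oriented search finds three apparent backslashes, while a character-oriented search finds none.

```python
# Sample text chosen for illustration: each character's Big5 trail byte is 0x5C.
text = "許功蓋"
raw = text.encode("big5")

# Byte-oriented view: every trail byte looks like a backslash.
byte_hits = raw.count(b"\\")

# Character-oriented view: decode first, then search for real backslashes.
char_hits = raw.decode("big5").count("\\")

print(raw.hex(" "), byte_hits, char_hits)
```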
I am preparing a character-oriented version, but there are major backward
compatibility problems. It means that any single file must contain only
one encoding, so some files under webwml have to be fixed.
> Big5 is a bit problematic since it allows non-highbit characters as
> trail bytes, similar to the problems with ISO-2022-JP. A stateful
> string scanner is required to handle it properly. LibC should work fine
> as long as the proper locale is available, and I am pretty sure that
> the gettext utilities will handle this properly.
Yes, gettext is safe.
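A scanner of the kind Peter describes could look roughly like the Python sketch below. It is only an illustration, not what libc or gettext actually do: the function name is hypothetical, and the lead-byte range 0x81-0xFE is an assumption covering common Big5 variants. The scanner remembers when it has just seen a lead byte and skips the following trail byte, so only standalone 0x5C bytes are reported.

```python
def true_backslashes(data: bytes) -> list[int]:
    """Return offsets of 0x5C bytes that are real backslashes in Big5 data."""
    offsets = []
    i = 0
    while i < len(data):
        b = data[i]
        # Assumed Big5 lead-byte range; the next byte is a trail byte.
        if 0x81 <= b <= 0xFE and i + 1 < len(data):
            i += 2  # skip lead byte and trail byte together
            continue
        if b == 0x5C:
            offsets.append(i)
        i += 1
    return offsets


# Bytes A5 5C (one character), then a literal backslash and "n".
sample = "功\\n".encode("big5")
print(true_backslashes(sample))
```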
Instead of escaping some problematic characters, a better solution could
be to perform encoding conversions (as with Japanese files) to a safe
encoding. Is there anyone interested in testing this scheme?
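As a rough sketch of the conversion idea in Python: the target encoding here, UTF-8, is my assumption, and the helper names are hypothetical. In UTF-8 every byte of a multi-byte character is 0x80 or higher, so any 0x5C byte that remains is a genuine backslash, and byte-oriented tools can process the text before it is converted back.

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Convert Big5 bytes to UTF-8 (assumed 'safe' encoding)."""
    return data.decode("big5").encode("utf-8")


def utf8_to_big5(data: bytes) -> bytes:
    """Convert UTF-8 bytes back to Big5 after processing."""
    return data.decode("utf-8").encode("big5")


# Three characters with 0x5C trail bytes, then one literal backslash and "n".
original = "許功蓋\\n".encode("big5")
safe = big5_to_utf8(original)

# Only the literal backslash survives as a 0x5C byte in the UTF-8 form.
print(original.count(b"\\"), safe.count(b"\\"))
```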
Denis