Re: Chinese big5 encoding and PO files

To: Debian www <debian-www@lists.debian.org>
Subject: Re: Chinese big5 encoding and PO files
From: barbier@linuxfr.org (Denis Barbier)
Date: Wed, 29 Jan 2003 12:52:52 +0100
Message-id: <20030129115252.GA30527@zobe.linuxfr.org>
Mail-followup-to: Debian www <debian-www@lists.debian.org>
In-reply-to: <Pine.LNX.4.43.0301291110400.496-100000@ds9.cixit.se>
References: <20030129093240.GA21484@zobe.linuxfr.org> <Pine.LNX.4.43.0301291110400.496-100000@ds9.cixit.se>

On Wed, Jan 29, 2003 at 11:14:56AM +0100, Peter Karlsson wrote:
> Denis Barbier:
> 
> > Err, ascii(7) tells me that 0x5C *is* a backslash.
> 
> Yes, but these documents aren't ASCII, so 0x5C may or may not be a
> backslash there, depending on where it is located in the file.

Ok.
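
For illustration, the sketch below uses Python's built-in `big5` codec to show the situation Peter describes: some common characters (the often-cited trio 許, 功 and 蓋) encode to a lead byte followed by a 0x5C trail byte. The same byte value that means "backslash" in ASCII text therefore also occurs inside ordinary Chinese characters. The helper name `big5_bytes` is hypothetical.

```python
def big5_bytes(text: str) -> bytes:
    """Return the Big5 encoding of text using Python's built-in codec."""
    return text.encode("big5")

# Three characters whose Big5 trail byte is 0x5C, plus one real backslash.
for ch in "許功蓋\\":
    raw = big5_bytes(ch)
    # A byte-oriented tool sees the same 0x5C value in all four cases.
    print(ch, raw.hex(" "), "contains 0x5C:", 0x5C in raw)
```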

> > Could you please have a look at chinese/po/others.zh.po and tell me
> > what to do with Subscribe/Unsubscribe translations?
> 
> Nothing should need to be done: the 0x5C byte is the trail byte of
> the character, and a proper MBCS-aware string scanner will recognize
> that it is not a backslash character (unlike, for instance, the
> "please respect the ad policy" string a bit further down, which *does*
> contain a backslash in the translation). Getting the string scanner to
> work requires configuring the locales properly.
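
A minimal sketch of such an MBCS-aware scan, assuming Big5 lead bytes lie in 0x81-0xFE (a superset of the 0xA1-0xF9 range of standard Big5, covering common extensions). The scanner consumes a lead byte together with the byte after it, so a 0x5C in trail position is never reported as a backslash. `real_backslash_offsets` is a hypothetical helper, not part of WML or gettext.

```python
def real_backslash_offsets(data: bytes) -> list[int]:
    """Return offsets of 0x5C bytes that are genuine backslashes in Big5 data."""
    offsets = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0xFE and i + 1 < len(data):
            # Lead byte: skip it together with its trail byte, which may be 0x5C.
            i += 2
            continue
        if b == 0x5C:
            offsets.append(i)  # single-byte position, so a real backslash
        i += 1
    return offsets

sample = "功\\n".encode("big5")  # 功 (0x5C trail byte), then "\" and "n"
print(sample.count(b"\\"), real_backslash_offsets(sample))
```

A naive byte count finds two 0x5C bytes in `sample`; the scanner reports only the real backslash at offset 2.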

The problem with current WML is that streams are bytes, not characters;
this is why 0x5C bytes have to be escaped.
I am preparing a character-oriented version, but there are major backward
compatibility problems. It means that each file must contain only
one encoding, so some files under webwml have to be fixed.
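
The difference between a byte stream and a character stream can be sketched as follows, assuming each file is decoded as a whole with its single encoding (which is why files mixing encodings get in the way). `count_backslashes` is a hypothetical helper for illustration.

```python
def count_backslashes(data: bytes, encoding: str) -> tuple[int, int]:
    """Return (byte-level 0x5C count, character-level backslash count)."""
    byte_count = data.count(b"\x5c")                # what a byte stream sees
    char_count = data.decode(encoding).count("\\")  # after decoding the file
    return byte_count, char_count

print(count_backslashes("許功蓋\\".encode("big5"), "big5"))
```

For three characters with 0x5C trail bytes plus one real backslash, the byte-level count is 4 while the character-level count is 1, so only the byte-oriented tool needs the escaping.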

> Big5 is a bit problematic since it allows non-high-bit bytes as
> trail bytes, similar to the problems with ISO-2022-JP. A stateful
> string scanner is required to handle it properly. LibC should work fine
> as long as the proper locale is available, and I am pretty sure that
> the gettext utilities will handle this properly.

Yes, gettext is safe.
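
Python's codec machinery gives one example of a scanner that keeps state across input boundaries: an incremental decoder holds back a pending lead byte until its trail byte arrives. The sketch below (the `decode_chunks` helper is hypothetical) splits a Big5 stream between the lead and trail byte of 功 and still decodes it correctly. For Big5 the state is just that pending byte; ISO-2022-JP additionally has to track shift states set by escape sequences.

```python
import codecs

def decode_chunks(chunks, encoding="big5"):
    """Decode bytes delivered in arbitrary chunks with a stateful decoder."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if a lead byte is still waiting for its trail byte.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

data = "功\\".encode("big5")
# Split after the first byte, i.e. between the lead and trail byte of 功.
print(decode_chunks([data[:1], data[1:]]))
```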

Instead of escaping problematic characters, a better solution could
be to convert files to a safe encoding (as is done with the Japanese
files). Is anyone interested in testing this scheme?
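
A minimal sketch of the conversion approach, assuming UTF-8 as the safe target (the message does not name one). In UTF-8 every byte of a multibyte sequence is 0x80 or above, so any 0x5C byte is a real backslash and no escaping is needed; for characters present in both encodings, converting back restores the original Big5 bytes. `to_safe_encoding` is a hypothetical helper.

```python
def to_safe_encoding(data: bytes, source: str = "big5", target: str = "utf-8") -> bytes:
    """Re-encode data from source to target via Unicode text."""
    return data.decode(source).encode(target)

big5_data = "許功蓋\\".encode("big5")
utf8_data = to_safe_encoding(big5_data)
# Only the genuine backslash remains as a 0x5C byte in the UTF-8 output.
print(big5_data.count(b"\\"), utf8_data.count(b"\\"))
# Converting back yields the original Big5 bytes.
print(to_safe_encoding(utf8_data, "utf-8", "big5") == big5_data)
```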

Denis
