
Re: [Draft] Writing i18n apps with glibc 2.2



Hello rigel,

On Mon, Oct 16, 2000 at 01:31:18AM -0700, rigel wrote:
> Hi Roger,
> On Mon, Oct 16, 2000 at 04:10:07PM +1100, Roger So wrote:
> > > 
> > > No, please don't. You should continue to use isprint to test
> > > whether a byte is printable.
> > 
> > I thought so too, but isprint(0xA7) didn't work, however
> > iswprint(0xA7) worked ...?  Now I'm confused ...
> 
> This is indeed correct. Byte 0xA7 is not a legal character in zh
> locales, so isprint(0xA7) should return 0. Widechar 0xA7 =
> U000000A7 = 0xA1EC (gb2312) = 0xA1B1 (big5), however, is a printable
> character, so iswprint(0xA7) returns 1.
> 
> Also, glibc retains more information for widechars (used by iswprint)
> than for multibyte chars (used by isprint). Internally the binary
> locale file keeps two separate sets of information: multibyte and
> widechar. All the chars present in the locale def file are put in the
> widechar part, while only those that also exist in the charmap file,
> i.e. legal chars, are recorded in the multibyte part. For example,
> U00A6 exists in the zh_HK def file, so although it's not a legal
> character in the Big5HKSCS charmap, iswprint(0xA6) will return 1. The
> same call in the zh_CN and zh_TW locales will return 0, because U00A6
> does not exist in the zh_CN and zh_TW def files.

Thank you for the clarification -- I stand corrected.

So, given a stream of bytes which might contain multibyte characters,
how would I test whether a character in it is, say, printable?  Do I
need to test every length from 1 to MB_CUR_MAX bytes instead of
individual bytes?  (seems wildly inefficient ...)
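
For what it's worth, here is a rough sketch of the alternative I could
imagine: rather than probing every possible byte length, let mbrtowc()
decode one character at a time and hand the result to iswprint().  The
helper name count_printable is made up for illustration, and it assumes
the caller has already done setlocale(LC_CTYPE, "") (otherwise the "C"
locale applies).  Skipping a single byte after an invalid sequence is
just one possible recovery policy, not necessarily the right one.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the printable characters in buf[0..len)
 * according to the current LC_CTYPE locale. */
size_t count_printable(const char *buf, size_t len)
{
    mbstate_t st;
    size_t count = 0;
    size_t pos = 0;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */

    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &st);

        if (n == (size_t)-2)            /* incomplete character at end of buffer */
            break;
        if (n == (size_t)-1) {          /* invalid sequence: reset state, skip a byte */
            memset(&st, 0, sizeof st);
            pos++;
            continue;
        }
        if (n == 0)                     /* decoded a NUL; it occupies one byte here */
            n = 1;
        if (iswprint((wint_t)wc))       /* classify the whole character, not a byte */
            count++;
        pos += n;
    }
    return count;
}
```

Each character is decoded exactly once, so the cost is linear in the
length of the buffer rather than MB_CUR_MAX tries per byte.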

Also, in glibc, are widechars always in Unicode? (UCS-4?)

> > > Just a small thing. A new LC_CTYPE class "hanzi" was added in
> > > glibc 2.2 locale (both zh_CN and zh_TW have it, zh_HK doesn't
> > > though).
> > 
> > Hmm ... that's a bug ...
> 
> Well, not really a bug. I added this hanzi class in zh_CN. zh_TW's
> CTYPE simply copies zh_CN, while zh_HK copies "i18n".

Then zh_HK should copy "zh_CN" instead ...?

BTW several definitions in zh_HK seem to be wrong; when I get the time
I shall have a closer look.  Also it seems that an en_HK locale would be
nice for people like me :)

-- 
  Roger So                                            telnet://e-fever.org
  spacehunt at e-fever dot org                          SysOp, e-Fever BBS
  GnuPG  1024D/98FAA0AD  F2C3 4136 8FB1 7502 0C0C 01B1 0E59 37AC 98FA A0AD


