[Draft] Writing i18n apps with glibc 2.2
[ This is a draft document; there might be other issues which I have
missed. ]
Target audience: programmers.
Writing i18n programs and glibc 2.2
===================================
While glibc 2.1 provided vastly better locale support than glibc 2.0,
glibc 2.2, currently in beta, has improved the handling of multibyte
locales; it is now more POSIX compliant. [is it?] However, as a result,
the semantics of many library calls have changed. The following is a
summary of issues that you should take note of if you want your programs
to be fully internationalized in a glibc 2.2 environment.
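1. Don't use environment variables to determine locale settings. In
   particular, do NOT use the value returned by setlocale(3) to
   determine the current locale's encoding: a locale name such as
   "zh_HK" need not mention the encoding at all. Instead, call
   setlocale(LC_CTYPE, "") once at startup and then ask
   nl_langinfo(CODESET).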
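2. isprint(3) vs. iswprint(3): isprint(3) takes a single byte (an
   unsigned char value or EOF); iswprint(3) takes a wide character
   (wint_t). In a multibyte locale a byte is not necessarily a
   character. Take 0xA7DA ('我' in Big5) for example: neither 0xA7 nor
   0xDA is a character on its own, so isprint(3) on either byte says
   nothing about whether '我' is printable. To test a character,
   decode it with mbrtowc(3) or mbstowcs(3) and pass the resulting
   wide character to iswprint(3). Note that iswprint(0xA7) tests the
   wide character with value 0xA7, not the Big5 lead byte, and that
   isprint('我') passes an out-of-range multi-character constant, so
   neither call is a reliable test for '我'.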
3. If your program parses the output of other programs, set the locale
   to "C" (for example, LC_ALL=C in the child's environment) before
   running them; otherwise be prepared to parse dates and the like in
   English, French, Dutch, Japanese, Chinese, Korean, Vietnamese,
   Thai, Russian, ... well, you get the idea. :p This is not specific
   to glibc 2.2, but it's a very common problem.
[Anything else?]
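For point 3, here is a minimal sketch of running another program under
the "C" locale and reading its first line of output. The helper name
first_line_in_c_locale is hypothetical, and setting LC_ALL around
popen(3) is only one way to do it (assumption: a POSIX system with
popen(3) and setenv(3)); treat it as an illustration, not a reference
implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run cmd through popen(3) with LC_ALL=C and copy
 * the first line of its output (without the newline) into buf.
 * Returns 0 on success, -1 on failure. */
static int first_line_in_c_locale(const char *cmd, char *buf, size_t size)
{
    const char *old = getenv("LC_ALL");
    char *saved = NULL;
    FILE *p;
    int ok;

    assert(buf != NULL && size > 0);

    /* Remember the caller's LC_ALL so it can be restored afterwards. */
    if (old != NULL) {
        saved = strdup(old);
        if (saved == NULL)
            return -1;
    }
    if (setenv("LC_ALL", "C", 1) != 0) {
        free(saved);
        return -1;
    }

    /* The child inherits the environment at the time popen() is called. */
    p = popen(cmd, "r");

    /* Restore the parent's own setting right away. */
    if (saved != NULL) {
        setenv("LC_ALL", saved, 1);
        free(saved);
    } else {
        unsetenv("LC_ALL");
    }

    if (p == NULL)
        return -1;
    ok = fgets(buf, (int)size, p) != NULL;
    pclose(p);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A caller could, for instance, pass "date" as cmd and then parse the
result knowing that day and month names come out in the "C" locale's
format, whatever the user's own locale is.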
The following small program prints the current locale, its encoding,
and the results of the calls discussed in point 2.
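#include <stdio.h>
#include <locale.h>
#include <langinfo.h>
#include <ctype.h>
#include <wctype.h>

int main(void)
{
    /* Adopt the locale from the environment, then query its encoding. */
    const char *loc = setlocale(LC_CTYPE, "");

    printf("Locale: %s\n", loc ? loc : "(setlocale failed)");
    printf("Encoding: %s\n", nl_langinfo(CODESET));

    /* '我' is a multi-character constant here; its value is
       implementation-defined and depends on the source encoding. */
    printf("iswprint(0x%X)=%c isprint(0x%X)=%c\n",
           0xA7, iswprint(0xA7) ? 'y' : 'n',
           '我', isprint('我') ? 'y' : 'n');
    return 0;
}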
With Debian woody's libc6 2.1.94-3 and LC_CTYPE set to "zh_HK", the
above program produces the following output:
Locale: zh_HK
Encoding: BIG5HKSCS
iswprint(0xA7)=y isprint(0xFFFFA7DA)=y
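To test whole characters rather than bytes or raw constants, one
option is to decode the multibyte string first and classify each
resulting wide character. The sketch below does this with mbrtowc(3)
and iswprint(3). The helper name count_printable is hypothetical, and
the code assumes the caller has already called setlocale(LC_CTYPE, "")
so that decoding follows the user's encoding.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the printable characters in the multibyte
 * string s, decoded according to the current LC_CTYPE locale.
 * Returns -1 if s contains an invalid or incomplete sequence. */
static long count_printable(const char *s)
{
    mbstate_t st;
    size_t left;
    long count = 0;

    assert(s != NULL);
    left = strlen(s);
    memset(&st, 0, sizeof st);          /* start in the initial shift state */

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                  /* invalid or truncated character */
        if (n == 0)
            break;                      /* decoded a terminating null */
        if (iswprint((wint_t)wc))
            count++;                    /* classify the character, not its bytes */
        s += n;
        left -= n;
    }
    return count;
}
```

In a Big5 locale, the two bytes 0xA7 0xDA should be decoded into a
single wide character before iswprint(3) ever sees them, which is the
distinction point 2 is about.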
Happy coding!
[ Feedback is welcome! ]
--
Roger So telnet://e-fever.org
spacehunt at e-fever dot org SysOp, e-Fever BBS
GnuPG 1024D/98FAA0AD F2C3 4136 8FB1 7502 0C0C 01B1 0E59 37AC 98FA A0AD