
[Draft] Writing i18n apps with glibc 2.2



[ This is a draft document; there might be other issues which I have
  missed. ]

Target audience: programmers.

Writing i18n programs and glibc 2.2
===================================

While glibc 2.1 provided vastly better locale support than glibc 2.0,
glibc 2.2, currently in beta, has improved the handling of multibyte
locales; it is now more POSIX compliant. [is it?]  However, as a result,
the semantics of many library calls have changed.  The following is a
summary of issues that you should take note of if you want your programs
to be fully internationalized in a glibc 2.2 environment.

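 1. Don't read environment variables such as LANG or LC_ALL to determine
    locale settings, and do NOT parse the value returned by setlocale(3)
    to determine the current locale's encoding!  A locale name such as
    "zh_HK" need not mention the encoding at all.  Instead, call
    setlocale(LC_CTYPE, "") and then use nl_langinfo(CODESET) for this
    purpose.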
 
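 2. isprint(3) vs. iswprint(3): To test whether a byte is printable, use
    isprint(3); its argument must be an unsigned char value or EOF.  To
    test whether a character is printable, convert it to a wide
    character first and use iswprint(3).  Take 0xA7DA ('我' in Big5) for
    example: isprint(0xA7) and isprint(0xDA) each look at one byte of
    the character, which tells you nothing about the character as a
    whole.  Convert the two bytes to a wchar_t with mbtowc(3) or
    mbrtowc(3) and pass the result to iswprint(3).  Do not pass a
    multi-character constant such as '我' to isprint(3); its value is
    outside the range that isprint(3) accepts.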
 
 3. If your program depends on the output of other programs, please set
    the locale to "C" before calling them (for example, by putting
    LC_ALL=C in their environment), or be prepared to parse dates or
    whatever in English, French, Dutch, Japanese, Chinese, Korean,
    Vietnamese, Thai, Russian, ... well, you get the idea. :p  This is
    actually not related to glibc 2.2, but it's a very common problem.
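
For point 3, here is a rough sketch of one way to run a helper program
in the "C" locale and read back its first line of output.  It assumes a
POSIX system with popen(3); run_in_c_locale is just a name I made up for
illustration, and the command string is handed to the shell as-is, so
only pass it commands you trust.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/*
 * Run cmd through the shell with LC_ALL=C and copy the first line of
 * its output (without the newline) into buf.
 * Returns 0 on success, -1 on failure.
 * (run_in_c_locale is a hypothetical helper name.)
 */
int run_in_c_locale(const char *cmd, char *buf, size_t buflen)
{
    char full[512];
    FILE *p;
    int n;

    if (buflen == 0)
        return -1;

    /* The assignment prefix changes the locale of the child only. */
    n = snprintf(full, sizeof full, "LC_ALL=C %s", cmd);
    if (n < 0 || (size_t)n >= sizeof full)
        return -1;

    p = popen(full, "r");
    if (p == NULL)
        return -1;

    if (fgets(buf, (int)buflen, p) == NULL)
        buf[0] = '\0';              /* the command printed nothing */
    buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */

    return pclose(p) == -1 ? -1 : 0;
}
```

Your own process keeps whatever locale the user asked for; only the
child sees LC_ALL=C, so its dates and messages come out in a form that
can be parsed the same way everywhere.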

[Anything else?]

The following is a small program which demonstrates the above points.

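    #include <stdio.h>
    #include <stdlib.h>
    #include <locale.h>
    #include <langinfo.h>
    #include <ctype.h>
    #include <wctype.h>

    int main(void)
    {
        /* The two bytes of the Big5 character 0xA7DA. */
        const char mb[] = "\xA7\xDA";
        const char *loc = setlocale(LC_CTYPE, "");
        wchar_t wc;

        printf("Locale: %s\n", loc ? loc : "(not supported)");
        printf("Encoding: %s\n", nl_langinfo(CODESET));

        /* isprint() classifies one byte, passed as an unsigned char. */
        printf("isprint(0x%X)=%c\n", (unsigned)(unsigned char)mb[0],
               isprint((unsigned char)mb[0]) ? 'y' : 'n');

        /* iswprint() classifies a wide character: convert first. */
        if (mbtowc(&wc, mb, sizeof mb - 1) > 0)
            printf("iswprint(0x%lX)=%c\n", (unsigned long)wc,
                   iswprint((wint_t)wc) ? 'y' : 'n');
        else
            printf("0xA7DA is not a valid character in this locale\n");

        return 0;
    }

With Debian woody's libc6 2.1.94-3 and LC_CTYPE set to "zh_HK", the
first two lines of output are:

    Locale: zh_HK
    Encoding: BIG5HKSCS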
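
The same idea extends to whole strings.  Below is a minimal sketch,
assuming the caller has already called setlocale(LC_CTYPE, ""), that
walks a multibyte string with mbrtowc(3) and tests each character with
iswprint(3).  The name mbs_is_printable is my own invention, not a libc
function.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Return 1 if every character of the multibyte string s is printable
 * in the current LC_CTYPE locale, 0 if some character is not, and -1
 * if s contains an invalid or truncated multibyte sequence.
 * (mbs_is_printable is a hypothetical helper name.)
 */
int mbs_is_printable(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);

    memset(&st, 0, sizeof st);      /* start in the initial state */

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;              /* invalid or incomplete sequence */
        if (n == 0)
            break;                  /* decoded a terminating null */
        if (!iswprint((wint_t)wc))
            return 0;

        s += n;                     /* skip the bytes just consumed */
        left -= n;
    }
    return 1;
}
```

For example, mbs_is_printable("a\tb") returns 0 because the tab is a
control character, while a plain ASCII word returns 1.  What it says
about bytes above 0x7F depends entirely on the locale in effect.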

Happy coding!

[ Feedback is welcome! ]

-- 
  Roger So                                            telnet://e-fever.org
  spacehunt at e-fever dot org                          SysOp, e-Fever BBS
  GnuPG  1024D/98FAA0AD  F2C3 4136 8FB1 7502 0C0C 01B1 0E59 37AC 98FA A0AD


