Re: UTF-8 locales
Hi,
At Tue, 21 Nov 2000 11:46:41 +0700,
Theppitak Karoonboonayanan <thep@links.nectec.or.th> wrote:
> > For conversion from number of characters to number of columns, you
> > will need to use wcwidth() or wcswidth().
>
> I'm interested in this. I used to work the th_TH locale for glibc, and
> I'd like to know how to describe this conversion in the locale definition.
The current woody system seems to be able to count the width of
Thai combining characters in both TIS-620 and UTF-8.
I wrote the following test program.
--------------------------------------
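#define _XOPEN_SOURCE 700 /* makes wcswidth() visible in <wchar.h> */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>
#include <langinfo.h>
int main(void)
{
    char buf1[10000];
    wchar_t buf2[10000];
    size_t l, n;
    setlocale(LC_ALL, ""); /* take the locale from LANG / LC_* */
    printf("encoding = %s\n", nl_langinfo(CODESET));
    if (fgets(buf1, sizeof buf1, stdin) == NULL) return 1;
    l = strlen(buf1);
    if (l && buf1[l-1] == '\n') buf1[l-1] = 0; /* Perl's chomp() */
    printf("string = \"%s\", %lu bytes\n", buf1, (unsigned long)strlen(buf1));
    n = mbstowcs(buf2, buf1, 10000);
    if (n == (size_t)-1) { /* input is not valid in this encoding */
        fprintf(stderr, "invalid multibyte sequence\n");
        return 1;
    }
    printf("wstring = %lu chars\n", (unsigned long)wcslen(buf2));
    /* wcswidth() returns -1 if any character is non-printable */
    printf("width = %d columns\n", wcswidth(buf2, n));
    return 0;
}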
--------------------------------------
Then I ran this program in txiterm with LANG=th_TH, typed some Thai
characters including combining characters, and pressed [Enter].
Next, I ran it in xterm in UTF-8 mode. Since my system has the
ko_KR.UTF-8 locale, I used that (LANG=ko_KR.UTF-8), prepared a Thai
UTF-8 file, and ran the program with its input redirected from the file.
Both seem to work well. Please try it.
I also found that wcwidth() returns zero for a combining (accent-only)
character. I think this is reasonable.
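To see which characters contribute zero columns, the width can also
be summed one character at a time with wcwidth(). The following is only
a rough sketch: columns_by_char() is a hypothetical helper of mine, not
part of the test program above, and its 256-character buffer is an
arbitrary limit chosen for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* makes wcwidth() visible in <wchar.h> */
#endif
#include <stdlib.h>
#include <wchar.h>

/* Harmless redeclaration, in case the feature-test macro above is
 * defined too late to take effect in <wchar.h>. */
int wcwidth(wchar_t wc);

/* Hypothetical helper (an assumption, not from the test program).
 * Converts the multibyte string s, interpreted in the current
 * LC_CTYPE encoding, to wide characters and sums wcwidth() over them.
 * The number of zero-width characters is stored in *zero_width.
 * Returns the column count, or -1 if the conversion fails, the input
 * does not fit the buffer, or a character is non-printable. */
int columns_by_char(const char *s, int *zero_width)
{
    wchar_t wbuf[256];
    size_t n = mbstowcs(wbuf, s, 256);
    size_t i;
    int cols = 0;

    *zero_width = 0;
    if (n == (size_t)-1 || n == 256)
        return -1; /* invalid sequence or input too long */
    for (i = 0; i < n; i++) {
        int w = wcwidth(wbuf[i]);
        if (w < 0)
            return -1; /* non-printable in this locale */
        if (w == 0)
            (*zero_width)++; /* e.g. a combining character */
        cols += w;
    }
    return cols;
}
```

Call it after setlocale(LC_ALL, "") so that the conversion uses the
encoding of the current locale. Given the result above, I would expect
a Thai consonant followed by one combining character to be reported as
a single column with one zero-width character, but that depends on the
locale data installed on the system.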
---
Tomohiro KUBOTA <kubota@debian.org>
http://surfchem0.riken.go.jp/~kubota/