
Re: UTF-8 locales



Hi,

At Tue, 21 Nov 2000 11:46:41 +0700,
Theppitak Karoonboonayanan <thep@links.nectec.or.th> wrote:

> > For conversion from number of characters to number of columns, you 
> > will need to use wcwidth() or wcswidth().
> 
> I'm interested in this. I used to work on the th_TH locale for glibc, and
> I'd like to know how to describe this conversion in the locale definition.

The current woody system seems to be able to count the width of
Thai combining characters in both TIS-620 and UTF-8.
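
For comparison, here is a minimal sketch of the same character-to-column
conversion done one character at a time with mbrtowc() and wcwidth(), so
that the character count and the column count come out of a single pass.
The function name mbs_columns() is hypothetical, and the sketch assumes
that the caller has already selected a locale with setlocale().

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in glibc's <wchar.h> */
#include <locale.h>         /* callers select the locale with setlocale() */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the multibyte string s according to the
 * current LC_CTYPE locale, store the number of characters in *nchars and
 * return the number of terminal columns, or -1 if s contains an invalid
 * or truncated sequence or a non-printable character. */
int mbs_columns(const char *s, size_t *nchars)
{
    mbstate_t st;
    size_t len = strlen(s);
    size_t chars = 0;
    int cols = 0;

    memset(&st, 0, sizeof st);     /* start in the initial shift state */
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;             /* invalid or incomplete sequence */
        int w = wcwidth(wc);
        if (w < 0)
            return -1;             /* non-printable character */
        cols += w;                 /* combining characters add 0 */
        chars++;
        s += n;                    /* advance by the bytes consumed */
        len -= n;
    }
    *nchars = chars;
    return cols;
}
```

In a UTF-8 locale, the string "\xE0\xB8\x81\xE0\xB9\x88" (THAI CHARACTER
KO KAI followed by the combining THAI CHARACTER MAI EK) should count as
2 characters but only 1 column.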

I wrote the following test program.

--------------------------------------
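#define _XOPEN_SOURCE 700   /* needed for wcswidth() in glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>
#include <langinfo.h>

int main(void)
{
  char buf1[10000];
  wchar_t buf2[10000];
  size_t l, n;

  /* Use the locale (and hence the encoding) selected by LANG / LC_* */
  setlocale(LC_ALL, "");
  printf("encoding = %s\n", nl_langinfo(CODESET));

  if (fgets(buf1, sizeof buf1, stdin) == NULL)
    return 1;
  l = strlen(buf1);
  if (l > 0 && buf1[l-1] == '\n') buf1[l-1] = 0; /* Perl's chomp() */

  printf("string = \"%s\", %zu bytes\n", buf1, strlen(buf1));

  /* Convert the multibyte string to wide characters */
  n = mbstowcs(buf2, buf1, sizeof buf2 / sizeof buf2[0]);
  if (n == (size_t)-1) {
    fprintf(stderr, "invalid multibyte sequence\n");
    return 1;
  }
  printf("wstring = %zu chars\n", wcslen(buf2));
  /* wcswidth() returns -1 if any character is non-printable */
  printf("width = %d columns\n", wcswidth(buf2, n));
  return 0;
}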
--------------------------------------

Then I ran this program in txiterm with LANG=th_TH, typed some Thai
characters including combining characters, and pressed [Enter].

Next, I ran this program in xterm in UTF-8 mode.  Since my
system has the ko_KR.UTF-8 locale, I used it (LANG=ko_KR.UTF-8).
I prepared a Thai UTF-8 file and fed it to the program by input
redirection.

Both seem to work well.  Please try it.

I also found that wcwidth() returns zero for a combining (accent-only)
character.  I think this is reasonable.
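
Zero width for combining characters is what lets display code cut or pad
text by columns without separating a mark from its base character.  A
minimal sketch, assuming a wide string already converted with mbstowcs()
under a suitable locale; wcs_fit_columns() is a hypothetical helper, not a
library function:

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in glibc's <wchar.h> */
#include <locale.h>         /* callers select the locale with setlocale() */
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: return how many leading wide characters of ws fit
 * into maxcols terminal columns.  Because wcwidth() is 0 for combining
 * characters, a mark that follows a kept base character is kept too and
 * can never be the character that overflows.  A non-printable character
 * (wcwidth() < 0) stops the scan. */
size_t wcs_fit_columns(const wchar_t *ws, int maxcols)
{
    int cols = 0;
    size_t i;

    for (i = 0; ws[i] != L'\0'; i++) {
        int w = wcwidth(ws[i]);
        if (w < 0 || cols + w > maxcols)
            break;
        cols += w;
    }
    return i;
}
```

A caller would typically convert a line with mbstowcs(), as in the test
program, and then output only the first wcs_fit_columns() characters.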

---
Tomohiro KUBOTA <kubota@debian.org>
http://surfchem0.riken.go.jp/~kubota/


