UTF-8 locales
Hi,
I am interested in support for various character encodings.
I suppose a fair number of developers are interested in UTF-8
support. They are trying to add UTF-8 support to their software,
such as Xterm, GNU roff, and so on.
I believe that UTF-8 support should be implemented using
LOCALE technology, i.e., calling setlocale(LC_ALL, "");, using
wchar_t instead of char, and leaving everything else to the OS.
The advantages of this method are:
- the software will support not only UTF-8 but also many other
  character encodings in the world (including multibyte ones).
  This helps users migrate to UTF-8 smoothly and gradually.
- the software can provide a unified way to determine the
  character encoding to be used, i.e., the LANG variable and so
  on. Otherwise users have to remember a different method to
  enable UTF-8 mode for each piece of software they use. (For
  example, the '-u8' option for Xterm.)
- software which is already written using LOCALE technology
  doesn't need any modification. In other words, such software
  already supports UTF-8.
Note that LOCALE programming is no more difficult or troublesome
than UTF-8 programming.
Solaris follows this model. See
http://docs.sun.com/ab2/coll.651.1/SOLUNICOSUPPT
for details.
However, the current woody system (with locale 2.1.97-1) has only
one UTF-8 locale, ko_KR.utf8. More UTF-8 locales are needed for
this model to work well. Why is ko_KR.utf8 the only one?
---
Tomohiro KUBOTA <kubota@debian.org>
http://surfchem0.riken.go.jp/~kubota/