UTF-8 locales

To: debian-i18n@lists.debian.org
Cc: debian-devel@lists.debian.org
Subject: UTF-8 locales
From: Tomohiro KUBOTA <tkubota@riken.go.jp>
Date: Mon, 13 Nov 2000 21:19:37 +0900
Message-id: <87r94gqd2e.wl@surfchem0.riken.go.jp>

Hi,

I am interested in support for various character codes.

I suppose a fair number of developers are interested in UTF-8
support.  They are trying to add UTF-8 support to their software,
such as Xterm, GNU roff, and so on.

I believe that UTF-8 support should be implemented using
LOCALE technology, i.e., calling setlocale(LC_ALL, ""), using
wchar_t instead of char, and leaving everything else to the OS.
The advantages of this method are:
 - the software will support not only UTF-8 but also many
   other character codes used around the world (including
   multibyte ones).  This helps users migrate to UTF-8 smoothly
   and gradually.
 - the software can provide a unified way to determine the
   character code to be used, i.e., the LANG variable and so on.
   Otherwise users have to remember a different method to enable
   UTF-8 mode for every piece of software they use.  (For
   example, the '-u8' option for Xterm.)
 - software which is already written using LOCALE technology
   doesn't need any modification.  In other words, such software
   already supports UTF-8.
Note that LOCALE programming is no more difficult or troublesome
than UTF-8 programming.

Solaris uses this model.  See
http://docs.sun.com/ab2/coll.651.1/SOLUNICOSUPPT
for details.

However, the current woody system (with locale 2.1.97-1) has only
one UTF-8 locale, ko_KR.utf8.  UTF-8 locales are needed for this
model to work well.  Why is there only this one?

---
Tomohiro KUBOTA <kubota@debian.org>
http://surfchem0.riken.go.jp/~kubota/
