
Re: I/O for different encodings



Hi.

In <E13tzkw-0000yK-00@rakefet>,  on Fri, 10 Nov 2000 00:01:10 +0200,
 Shaul Karl <shaulka@bezeqint.net> wrote:

> > I'm working on a piece of software that will parse textual data (a
> > list of words), conduct some statistical analyses, and spit out more
> > textual data.  I'd like to support multiple languages, maybe even
> > multibyte encodings.  Can someone please point me towards some
> > resources, in particular how to handle text input and output in a
> > language-independent way?  As you can probably guess, I'm new to i18n.

> I'm not sure, but I believe everything is converging on
> Unicode (UTF-8). Therefore, if I were writing such a program, I would make
> it use this encoding.
> As for resources, there is a Unicode HOWTO on the LDP and many other resources 
> on the net.

I think UTF-8 support is a minimum (essential) requirement for i18n,
especially where multibyte encodings are involved.  Some software
claims Unicode support but does not handle multibyte encodings
correctly.  (So such "Unicode-supporting" software cannot handle some
languages, including Japanese.)
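To make the multibyte pitfall concrete, here is a small Python sketch (not part of the original thread; the function names are illustrative only). Japanese text in UTF-8 uses several bytes per character, so code that counts or slices bytes as if they were characters gets wrong lengths or produces invalid output.

```python
def byte_and_char_lengths(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte length, character count) for text."""
    return len(text.encode("utf-8")), len(text)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


word = "日本語"                      # three characters
encoded = word.encode("utf-8")

print(byte_and_char_lengths(word))   # (9, 3): nine bytes, three characters
# Cutting the byte string at an arbitrary offset, as byte-oriented code
# might, leaves an incomplete multibyte sequence.
print(is_valid_utf8(encoded[:4]))    # False
print(is_valid_utf8(encoded[:3]))    # True: the cut falls on a character boundary
```

A program that decodes input into characters before measuring or splitting it avoids both problems.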

Supporting more encodings directly is better than supporting only
Unicode, but we can use a translation filter (such as iconv(1) or
tcs(1)) to convert text data, so supporting Unicode (or, better,
UCS-4) is probably a workable compromise.
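One way to read this compromise: keep a single Unicode representation inside the program and convert only at the edges. The conversion can happen outside the program with a filter (for example `iconv -f EUC-JP -t UTF-8 words.txt`) or inside it, as in the hedged Python sketch below. Here Python's `str`, a sequence of code points, plays roughly the role of the UCS-4 internal form. The function names and the one-word-per-line input format are assumptions for illustration; `"euc_jp"` and `"shift_jis"` are standard Python codec names.

```python
import collections


def word_frequencies(raw: bytes, input_encoding: str) -> collections.Counter:
    """Decode raw input once, then count words as Unicode strings.

    Assumes one word per line, matching the "list of words" in the question.
    """
    text = raw.decode(input_encoding)          # e.g. "euc_jp", "shift_jis", "utf-8"
    words = (line.strip() for line in text.splitlines())
    return collections.Counter(w for w in words if w)


def format_report(counts: collections.Counter, output_encoding: str = "utf-8") -> bytes:
    """Encode the word/count report once, on the way out."""
    lines = [f"{word}\t{n}" for word, n in counts.most_common()]
    return ("\n".join(lines) + "\n").encode(output_encoding)


raw = "東京\n大阪\n東京\n".encode("euc_jp")   # legacy Japanese encoding on input
counts = word_frequencies(raw, "euc_jp")
print(format_report(counts).decode("utf-8"), end="")
```

Because the statistics code only ever sees decoded strings, supporting another input encoding means passing a different codec name (or piping the data through iconv first), not changing the analysis.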

You may also want to read the recent discussion about i18n of groff on
this list in the web archive.  I think it has something useful for
you.

Regards.
-- 
  Taketoshi Sano: <sano@debian.org>,<sano@debian.or.jp>,<kgh12351@nifty.ne.jp>


