
Re: UTF-8, CJK and file size



Hi,

At Thu, 12 Jul 2001 16:44:50 -0700,
phil@bolthole.com wrote:

> That's a trick question, in a way. Because while an English word
> may be anywhere from 1-10 letters, with an average length of maybe 5
> letters, a Japanese "word" has an average of about 2 "letters".
> (Don't forget 1 kanji => 1 word.)
> Except that you then get conjugation, so there may be an extra "letter"
> or two thrown in there.

Why not use existing Debian web pages, man pages, or message catalogs
as test data for this comparison?  It is very easy to convert them
into UTF-8 with iconv(1).  Note that the Japanese man pages and message
catalogs are encoded in EUC-JP.
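
As a rough illustration of that comparison, here is a minimal Python
sketch that converts EUC-JP bytes into UTF-8 and reports both sizes.
The function names and the sample string are my own assumptions, not
taken from any Debian package.  On the command line, something like
"iconv -f EUC-JP -t UTF-8 input > output" should do the same
conversion, where input and output are placeholder file names.

```python
# Minimal sketch: compare the size of Japanese text in EUC-JP and UTF-8.
# eucjp_to_utf8 and size_report are hypothetical helper names.

def eucjp_to_utf8(data):
    """Re-encode EUC-JP bytes as UTF-8 bytes."""
    return data.decode("euc_jp").encode("utf-8")


def size_report(data):
    """Return (size in EUC-JP, size in UTF-8) for the given EUC-JP bytes."""
    converted = eucjp_to_utf8(data)
    return len(data), len(converted)


if __name__ == "__main__":
    # Stand-in sample; a real test would read a man page or a message
    # catalog from disk in binary mode instead.
    sample = "日本語のマニュアル".encode("euc_jp")
    before, after = size_report(sample)
    print("EUC-JP bytes:", before)
    print("UTF-8 bytes: ", after)
```

Kanji and kana from JIS X 0208 take 2 bytes each in EUC-JP and 3 bytes
each in UTF-8, while ASCII text stays at 1 byte per character, so the
growth of a real file depends on how much ASCII it contains.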

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


