Re: UTF-8, CJK and file size
On Thu, Jul 12, 2001 at 11:23:24PM +1000, Drew Parsons wrote:
> ...
> A typical novel in English is maybe 300 pages long. Suppose there's
> about 35 lines per page and 50 odd letters per line (judging from the
> novel I'm currently reading). That's about 500 KB for one novel. How
> many bytes would a Japanese novel take up?
That's a trick question, in a way. While an English word may be
anywhere from 1 to 10 letters, with an average length of maybe 5
letters, a Japanese "word" has an average of about 2 "letters".
(Don't forget that a single kanji can be a whole word.)
Except that you then get conjugation, so there may be an extra "letter"
or two thrown in there.
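So, byte-wise, they should be roughly the same, at least at 2 bytes
per character (UCS-2/UTF-16):
5 English letters == 5 bytes per word average
2.5 Japanese "letters" == 2.5 Unicode characters average == 5 bytes/word av.
In UTF-8, though, each kanji or kana takes 3 bytes, so the Japanese
figure is more like 7.5 bytes/word av.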
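
If you want to check the per-word arithmetic for a particular encoding, here is a minimal Python sketch. The two sample words and the helper name byte_len are my own illustration, not anything from the thread, and a single word pair is obviously not a measurement of a whole novel.

```python
# Sample words are illustrative assumptions, not taken from the thread.
english_word = "novel"   # 5 ASCII letters
japanese_word = "小説"    # 2 kanji, meaning "novel"

def byte_len(text, encoding):
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

# "utf-16-le" is used so that no byte-order mark is counted.
for enc in ("utf-8", "utf-16-le"):
    print(enc, byte_len(english_word, enc), byte_len(japanese_word, enc))
# utf-8 5 6
# utf-16-le 10 4
```

So the answer depends on the encoding: the kanji cost 3 bytes each in UTF-8 but 2 in UTF-16, while the ASCII letters go the other way.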