
UTF-8, CJK and file size



Drew Parsons writes:

> For the Asian CJK characters, however, UTF-8 typically uses 3 bytes
> per character.  This is in contrast to the current national
> encodings which use 2 bytes per character.  The files will,
> therefore, become half again as big in size, and consequently their
> transmission over the internet will take half again as long.

It would be interesting, and relevant, to know the typical size
difference between a gzip'd document in one of the national CJK
encodings and the same document gzip'd as UTF-8.
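One way to get that number is to measure it directly. Below is a minimal
Python sketch. It assumes EUC-JP as the example national encoding (any
codec Python ships, such as shift_jis, gb2312, euc_kr or big5, can be
passed instead), and the function name compare_sizes is hypothetical.
It reports the raw and gzip'd byte counts for the same text in both
encodings.

```python
import gzip


def compare_sizes(text: str, legacy: str = "euc_jp") -> dict:
    """Return raw and gzip'd byte counts for `text` encoded in a legacy
    CJK encoding (EUC-JP by default, an assumption) and in UTF-8."""
    legacy_raw = text.encode(legacy)
    utf8_raw = text.encode("utf-8")
    return {
        "legacy_raw": len(legacy_raw),
        "utf8_raw": len(utf8_raw),
        # mtime=0 keeps the gzip header fixed so results are repeatable
        "legacy_gz": len(gzip.compress(legacy_raw, mtime=0)),
        "utf8_gz": len(gzip.compress(utf8_raw, mtime=0)),
    }


if __name__ == "__main__":
    # Illustrative sample only: 12 kana/kanji characters repeated 50 times.
    # Real documents should be used for a meaningful comparison, since
    # repetitive text compresses far better than ordinary prose.
    sample = "日本語の文書を圧縮する。" * 50
    for key, value in compare_sizes(sample).items():
        print(key, value)
```

For this sample the raw sizes are 1200 bytes (EUC-JP) and 1800 bytes
(UTF-8), which is the 3:2 ratio described above. The interesting figure
is how much of that gap survives compression on real text.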

ttfn/rjk


