
Re: default character encoding for everything in debian

On 2009-08-10, Norbert Preining <preining@logic.at> wrote:
> On Mo, 10 Aug 2009, Roger Leigh wrote:
>> Of course there's a penalty for certain operations.  But UTF-8 is about
>> as compact as an extended encoding is going to get.
> Rubbish. You know why in Japan and other Asian countries UTF8 is not
> so common? Because many of their glyphs need 4 (four!) bytes, while
> for example jis-2022 (AFAIR) is much more compact.
> We are not living in an ASCII world anymore.

Really because of the size?  We are not living in a byte-beancounting
world anymore.  At worst you double the *text* size (we're not talking
about images or anything, which are far larger), going from the 2 bytes
that you need anyway to four.  ISO 2022 also wastes one bit per byte
to be 7-bit safe.  If I read the Wikipedia article correctly, at least
the JP escape sequence only needs to be put into the document once.
(Well, or maybe several times, switching back and forth, if you're
embedding Latin-script words in the text.)
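
The size question can be checked directly.  What follows is only a
minimal sketch: the sample strings and the helper name encoded_sizes
are illustrative assumptions, not anything from this thread, and the
codec names are the ones Python's standard library ships.  For
characters in the Basic Multilingual Plane, which covers everyday
Japanese, UTF-8 uses three bytes each; ISO-2022-JP uses two bytes each
plus its escape sequences.

```python
def encoded_sizes(text, codecs=("utf-8", "iso2022_jp")):
    """Return {codec name: encoded length in bytes} for the given text."""
    return {name: len(text.encode(name)) for name in codecs}


# Illustrative samples (assumptions, not taken from the thread).
kanji_only = "日本語"
mixed = "Debian は自由な OS です"  # forces switches between ASCII and JIS

for label, text in (("kanji only", kanji_only), ("mixed", mixed)):
    print(label, "-", len(text), "chars:", encoded_sizes(text))

# Show the escape sequences that ISO-2022-JP adds around the kanji.
print(kanji_only.encode("iso2022_jp"))
```

On short strings the escape sequences can eat most of the saving; on
long runs of pure Japanese text the ratio approaches 3:2 in favour of
ISO-2022-JP.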

Maybe I'm an ignorant European, but I'm not sure that equation still
holds.  Of course there is a tradeoff in Latin characters being the
privileged few to get a short encoding, but that doesn't make UTF-8
so bad per se that it deserves to be called "rubbish".

Kind regards,
Philipp Kern
