
Re: default character encoding for everything in debian



Hi,

(I would like to see as much UTF-8 support as possible.  These days, it
is not bad.  Try using "sed" with UTF-8.  It works!  Of course, with some
understandable glitches.)

On Mon, Aug 10, 2009 at 08:55:27PM +0200, Norbert Preining wrote:
> On Mo, 10 Aug 2009, Roger Leigh wrote:
> > Of course there's a penalty for certain operations.  But UTF-8 is about
> > as compact as an extended encoding is going to get.
> 
> Rubbish. You know why in Japan and other Asian countries UTF8 is not
> so common? Because many of their glyphs need 4 (four!) bytes, while
> for example jis-2022 (AFAIR) is much more compact.

Hmmm... technically, this is not the best example if you are talking
about size.  We have too many encodings for Japanese, and jis-2022 needs
many ESC codes to switch character sets.
 
> We are not living in an ASCII world anymore.

True.

Our choice of encoding does not have much to do with size.  It is about
inertia and backward compatibility.

FACTS:

Much Japanese e-mail uses jis-2022 for compatibility.  (In the old days,
e-mail was safe only for 7-bit data.)

As far as data size goes, the compact popular ones are EUC (Unix) and
S-JIS (MS systems).  These are still used in web pages etc.  They are as
small as the UTF-16/UCS-2 used internally for much Unicode data.
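
The size comparison can be checked with a rough sketch like the one
below, assuming Python 3 and its standard codec names ("euc_jp",
"shift_jis", and so on).  The sample string is a made-up example with
only common kanji and kana, so other text may give different ratios:

```python
# Sketch: byte counts for one short Japanese string in several encodings.
# The sample string and the list of codecs are choices made for this
# illustration only.
sample = "日本語のテキスト"
codecs_to_try = ["iso2022_jp", "euc_jp", "shift_jis", "utf-16-le", "utf-8"]

# Map each codec name to the encoded length in bytes.
sizes = {name: len(sample.encode(name)) for name in codecs_to_try}

for name, size in sizes.items():
    print(f"{name:12s} {size:3d} bytes for {len(sample)} characters")
```

For this kind of text, EUC, S-JIS and UTF-16 all come out at 2 bytes per
character, while UTF-8 needs 3 bytes per character.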

But please note that newer Macs and XP/Vista/... use Unicode, and I see
that many files can be in UTF-8.  So things are changing.

Osamu

