Re: moving to unicode

To: debian-user@lists.debian.org
Subject: Re: moving to unicode
From: hendrik@topoi.pooq.com
Date: Mon, 6 Feb 2006 11:12:47 -0500
Message-id: <20060206161247.GA2569@topoi.pooq.com>
In-reply-to: <DA430F01FCE3E14EB8DE9D02C35C7FAA01B579@scbu01.cb.i.cz>
References: <DA430F01FCE3E14EB8DE9D02C35C7FAA01B579@scbu01.cb.i.cz>

On Mon, Feb 06, 2006 at 03:01:41PM +0100, Žáček Kryštof wrote:
> I just second this. Only IMO the UCS2 (fixed two bytes per character) would be much more appropriate to a modern UNICODE system. The variable length (2 to 3 bytes ) UTF-8 encoding can marginally save some space (depending on language) but introduces nasty overhead to character handling - even the most trivial string functions have to check for character boundaries (e.g. even detecting the string length itself is not a trivial operation in UTF-8 !!! or having a fixed length buffer you can never tell in advance how many characters will fit into it - it depends on the language again).
> 
> Windows used to have multibyte characters in the past (Win95,98) but luckily managed to get rid of this with Windows NT and higher and now both the kernel and userspace are UCS2. Why should Linux again enter the blind alley of Windows 95?
> 
> Cheers
> Krystof

Have you looked at Unicode lately?  It isn't a sixteen-bit code
anymore. (Was it ever?)  It doesn't fit in two bytes.  If you chop it
to two bytes, you miss the vast majority of traditional Chinese
characters, as well as (I believe) character sets such as Tolkien's
Elvish.
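
To make the two-byte limit concrete, here is a minimal Python sketch. It
checks whether a few sample characters fit in a single fixed two-byte unit
and reports how many bytes UTF-8 and UTF-16 need for each. The helper name
fits_in_ucs2 and the choice of sample characters are my own, purely for
illustration.

```python
# Sketch: which code points survive a fixed two-byte (UCS-2) encoding?
# fits_in_ucs2 and the samples below are illustrative, not from the thread.

BMP_LIMIT = 0xFFFF  # largest code point that fits in one 16-bit unit


def fits_in_ucs2(ch: str) -> bool:
    """Return True if the single character ch has a code point <= U+FFFF."""
    return ord(ch) <= BMP_LIMIT


samples = {
    "Latin z with caron": "\u017e",
    "CJK ideograph inside the 16-bit range": "\u4e2d",
    "CJK Extension B ideograph": "\U00020000",
}

for name, ch in samples.items():
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-be"))  # big-endian, no byte-order mark
    print(
        f"U+{ord(ch):04X} ({name}): "
        f"fits in two bytes: {fits_in_ucs2(ch)}, "
        f"UTF-8 bytes: {utf8_len}, UTF-16 bytes: {utf16_len}"
    )
```

The last sample lies above U+FFFF, so a strict two-byte encoding cannot
represent it at all; UTF-16 has to fall back to a surrogate pair (four
bytes), which makes it a variable-length encoding as well.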

-- hendrik

References:
- RE: moving to unicode
  - From: Žáček Kryštof <Krystof.Zacek@i.cz>