Re: Make Unicode bugs release critical?



On Fri, Feb 11, 2011 at 08:16:54PM -0200, Henrique de Moraes Holschuh wrote:
> On Fri, 11 Feb 2011, Lars Wirzenius wrote:
> > However, I'm curious: is there a lot of software that is broken with
> > Unicode, particularly with the UTF-8 encoding? I can't remember anything
> > much in recent times.
> 
> 2. Anything that cannot deal with Supplementary planes.
> 
>    This includes the use of UCS-2 instead of UTF-16, as it cannot represent
>    the Supplementary planes.  python 3 when not compiled to use UCS-4 memory
>    hog mode is an example, I am told.

Using UCS-2 is hardly better than using ISO-8859-1 or any other ancient
charset.  UTF-16 and UCS-4 can both be memory hogs, which is why UTF-8 is
the one to pick for regular use.  Except in some rare cases (CJK text with
no formatting or markup), it uses less memory, and it can be passed as-is
to POSIX file functions.
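
To put rough numbers on the memory claim, here is a minimal Python sketch
that measures the encoded size of a few strings.  The sample strings and the
helper names (encoded_sizes, has_nul) are made up for illustration only; the
NUL check is there because an embedded zero byte is what would stop a string
from being handed to C-style POSIX calls unchanged.

```python
# Hypothetical sample strings, chosen only for illustration.
samples = {
    "ascii": "Hello, world!",
    "markup_cjk": "<p>日本語</p>",
    "plain_cjk": "日本語のテキスト",
}

def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

def has_nul(text, encoding):
    """True if the encoded form contains a zero byte."""
    return b"\x00" in text.encode(encoding)

for name, text in samples.items():
    print(name, encoded_sizes(text),
          "NUL in utf-8:", has_nul(text, "utf-8"),
          "NUL in utf-16:", has_nul(text, "utf-16-le"))
```

With these samples, only the plain CJK string should come out smaller in
UTF-16 than in UTF-8; the one with markup around it is already smaller in
UTF-8 because the ASCII tags take one byte each.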

Picking a random subset of Unicode is like storing the day of the year in a
one-byte variable because that way you support 70% of uses and it conserves
memory...
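
As a rough illustration of what such a subset loses, the sketch below takes
one code point from a Supplementary plane and shows that UTF-16 needs a
surrogate pair for it, which a fixed 16-bit UCS-2 unit cannot hold.  The
helper names (utf16_units, fits_ucs2) are hypothetical, and it assumes
Python 3.3 or later, where a str always holds full code points.

```python
ch = "\U0001F600"  # U+1F600, a code point outside the Basic Multilingual Plane

def utf16_units(text):
    """Return the 16-bit code units of text's UTF-16 encoding."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

def fits_ucs2(text):
    """True if every code point fits in a single 16-bit unit."""
    return all(ord(c) <= 0xFFFF for c in text)

units = utf16_units(ch)
print([hex(u) for u in units])   # two code units: a surrogate pair
print(fits_ucs2(ch))             # False: not representable in UCS-2
print(fits_ucs2("abc"))          # True
```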

-- 
1KB		// Microsoft corollary to Hanlon's razor:
		//	Never attribute to stupidity what can be
		//	adequately explained by malice.

