Re: Make Unicode bugs release critical?
On Fri, Feb 11, 2011 at 08:16:54PM -0200, Henrique de Moraes Holschuh wrote:
> On Fri, 11 Feb 2011, Lars Wirzenius wrote:
> > However, I'm curious: is there a lot of software that is broken with
> > Unicode, particularly with the UTF-8 encoding? I can't remember anything
> > much in recent times.
>
> 2. Anything that cannot deal with Supplementary planes.
>
> This includes the use of UCS-2 instead of UTF-16, as it cannot represent
> the Supplementary planes. python 3 when not compiled to use UCS-4 memory
> hog mode is an example, I am told.
Using UCS-2 is hardly better than using ISO-8859-1 or any other ancient
charset. Either UTF-16 or UCS-4 can be a memory hog, which is why UTF-8 is
the one to pick for regular use. Except for some rare cases (CJK with no
formatting or markup), it uses less memory, and it can be passed as-is to
POSIX file functions.
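
To put rough numbers on the memory claim, here is a minimal sketch that
measures the encoded size of a few strings. The sample strings and the
helper name encoded_sizes are made up for illustration; only the codec
names are standard Python.

```python
# Encoded size, in bytes, of the same text under three Unicode encodings.
# The "-le" codec variants are used so that no byte-order mark is counted.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(text):
    """Return a dict mapping each encoding name to the byte length of text."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}

# Illustrative samples (assumptions, not taken from any real workload).
samples = {
    "ASCII markup": "<p>hello</p>",
    "CJK, no markup": "日本語のテキスト",
    "supplementary plane": "\U0001F600",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

On these samples, UTF-8 is the smallest for the markup, UTF-16 is the
smallest for the bare CJK string, and all three tie at four bytes for the
single supplementary-plane character.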
Picking a random subset of Unicode is like storing the day of the year in a
one-byte variable because that way you support 70% of uses and it conserves
memory...
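
A rough sketch of what that subset costs in practice, assuming a strict
UCS-2 encoder that stores one 16-bit unit per code point. The function
to_ucs2 is hypothetical, written only to illustrate the limit; it is not
how any particular library behaves.

```python
import struct

def to_ucs2(text):
    """Hypothetical strict UCS-2 encoder: one little-endian 16-bit unit per code point."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Supplementary-plane code points do not fit in 16 bits.
            raise ValueError(f"U+{cp:04X} is outside the BMP; UCS-2 cannot represent it")
        units.append(cp)
    return struct.pack(f"<{len(units)}H", *units)

print(to_ucs2("caf\u00e9"))                   # BMP-only text encodes fine
print("\U0001F600".encode("utf-16-le"))       # UTF-16 uses a surrogate pair (two units)
try:
    to_ucs2("\U0001F600")
except ValueError as err:
    print(err)                                # UCS-2 has no way to express it
```

The one-byte day-of-the-year has the same shape: 256 representable values
out of 366 possible days is about 70%.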
--
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.