Re: default character encoding for everything in debian

Harald Braumann wrote [Tue, Aug 11, 2009 at 01:33:58AM +0200]:
> > There are a lot of users out there that are not willing to pay the
> > price for increased generality.
> Don't you mean s/users/programmers? As a user I don't see what price I
> pay. I only see advantages in having a consistent encoding. Which,
> btw., doesn't have to be UTF-8. In an ideal world every programme would
> adhere to LC_CTYPE. But if the encoding has to be configured then I
> would also prefer UTF-8 as the default.
> Of course, for the programmer there might be a price to pay. And if
> he's not willing to pay it, he can't be forced, anyway.
> Or do you mean the user pays the price, because if the encoding is set
> to UTF-8 then performance would suffer? In that case, I'd love to see
> some real life numbers. I doubt the difference would be noticeable. 

Yes, performance will suffer. We enjoyed many decades of blissfully
ignoring the difference between a character and a byte. So, while
length(str) in any language up to the 1990s was a mere subtraction,
now we must go through the string checking each byte to see if it is a
Unicode marker and subtract the appropriate number of bytes. Also,
for a very long time we didn't really care much what a buffer's
content was: everything could be printed, even if it had control
characters which made you beep (with the occasional control sequence
re-injecting output into the terminal as input). Now... Well, printing
an unprintable string can cause segfaults in some cases.
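
To make the byte-versus-character difference concrete, here is a
minimal sketch in Python. The function name utf8_char_count is made up
for this illustration, and it assumes the input is already valid
UTF-8. It is not how any particular library does it, only the general
idea of walking the bytes and skipping continuation bytes:

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points in (assumed valid) UTF-8 data."""
    # In UTF-8, continuation bytes always look like 10xxxxxx.
    # Every other byte starts a new code point, so counting the
    # non-continuation bytes gives the number of code points.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "señor".encode("utf-8")

byte_len = len(sample)              # number of bytes in the buffer
char_len = utf8_char_count(sample)  # needs a pass over every byte

print(byte_len, char_len)
```

The two numbers differ because "ñ" takes two bytes in UTF-8. Note that
this counts code points, which is still not the same as what a user
would call a character once combining marks are involved.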

Gunnar Wolf • gwolf@gwolf.org • (+52-55)5623-0154 / 1451-2244
