On Tue, 11 Aug 2009 13:28:08 -0500
Gunnar Wolf <gwolf@gwolf.org> wrote:

> Harald Braumann wrote [Tue, Aug 11, 2009 at 01:33:58AM +0200]:
> > > There are a lot of users out there that are not willing to pay the
> > > price for increased generality.
> >
> > Don't you mean s/users/programmers? As a user I don't see what
> > price I pay. I only see advantages in having a consistent encoding.
> > Which, btw., doesn't have to be UTF-8. In an ideal world every
> > programme would adhere to LC_CTYPE. But if the encoding has to be
> > configured then I would also prefer UTF-8 as the default.
> >
> > Of course, for the programmer there might be a price to pay. And if
> > he's not willing to pay it, he can't be forced, anyway.
> >
> > Or do you mean the user pays the price, because if the encoding is
> > set to UTF-8 then performance would suffer? In that case, I'd love
> > to see some real life numbers. I doubt the difference would be
> > noticeable.
>
> Yes, performance will suffer. We enjoyed many decades of blissfully
> ignoring the difference between a character and a byte.

Well, a byte with the most significant bit always set to 0.

> So, while
> length(str) in any language up to the 1990s was a mere subtraction,
> now we must go through the string checking each byte to see if it is a
> Unicode marker and subtract the appropriate number of bytes. Also,
> for a very long time we didn't really care much what was a buffer's
> content -

And in those glorious times, more often than not, unintelligible rubbish
was produced if you happened not to use a language that can be written
in ASCII. But this is beside the point.

I do appreciate that support for different character encodings causes
pain for the programmer. But the original post was about software that
already has support for UTF-8, and whether it wouldn't be a good idea
to configure it this way by default.

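For what it's worth, the scan described in the quoted text can be sketched
in a few lines. This is only an illustration in Python, not code from any
of the software under discussion: the function name utf8_length and the
sample string are made up for the example, and it assumes the input is
valid UTF-8.

```python
def utf8_length(data: bytes) -> int:
    """Count code points in UTF-8 data (hypothetical helper).

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    starts a new code point. Assumes `data` is valid UTF-8.
    """
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


text = "naïve café"                 # sample string, chosen for illustration
encoded = text.encode("utf-8")

byte_count = len(encoded)           # what a byte-oriented length() reports
char_count = utf8_length(encoded)   # one pass over every byte

print(byte_count, char_count)       # prints "12 10"
```

The character count needs a single linear pass over the bytes, so its
cost grows with the length of the string. For pure ASCII input the two
counts are identical.
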
> Everything could be printed, even if it had control
> characters which made you beep (with the occasional control sequence
> re-injecting output into the terminal as input). Now... Well, printing
> an unprintable string can cause segfaults in some cases.

My terminal supports UTF-8. I thought this was no longer an issue.

Cheers,
harry