Re: default character encoding for everything in debian

On Tue, 11 Aug 2009 13:28:08 -0500
Gunnar Wolf <gwolf@gwolf.org> wrote:

> Harald Braumann dijo [Tue, Aug 11, 2009 at 01:33:58AM +0200]:
> > > There are a lot of users out there that are not willing to pay the
> > > price for increased generality.
> > 
> > Don't you mean s/users/programmers? As a user I don't see what
> > price I pay. I only see advantages in having a consistent encoding.
> > Which, btw., doesn't have to be UTF-8. In an ideal world every
> > programme would adhere to LC_CTYPE. But if the encoding has to be
> > configured then I would also prefer UTF-8 as the default.
> > 
> > Of course, for the programmer there might be a price to pay. And if
> > he's not willing to pay it, he can't be forced, anyway.
> > 
> > Or do you mean the user pays the price, because if the encoding is
> > set to UTF-8 then performance would suffer? In that case, I'd love
> > to see some real life numbers. I doubt the difference would be
> > noticeable.
>
> Yes, performance will suffer. We enjoyed many decades of blissfully
> ignoring the difference between a character and a byte. 

Well, a byte with the most significant bit always set to 0.

> So, while
> length(str) in any language up to the 1990s was a mere subtraction,
> now we must go through the string checking each byte to see if it is a
> Unicode marker and subtract the appropriate number of bytes. Also,
> for a very long time we didn't really care much what was a buffer's
> content - 

And in these glorious times more often than not unintelligible
rubbish was produced if you happened to not use a language that can be
written in ASCII. But this is beside the point. I do appreciate that
support for different character encodings causes pain for the
programmer. But the original post was about software that already
has got support for UTF-8 and whether it wouldn't be a good idea to
configure it this way by default.
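
For reference, the length(str) cost quoted above can be sketched as
follows. This is only a minimal illustration in Python, not code from
any of the software under discussion; the function name utf8_length is
made up for the example, and it assumes the input is already valid
UTF-8. It counts code points, which is not necessarily the same as
what a user would perceive as characters.

```python
def utf8_length(data: bytes) -> int:
    """Count code points in a valid UTF-8 byte string.

    Hypothetical helper for illustration only. Continuation bytes have
    the bit pattern 10xxxxxx; every other byte starts a new code point,
    so counting the non-continuation bytes gives the code point count.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


encoded = "año".encode("utf-8")

# Number of bytes: 'a' and 'o' take one byte each, 'ñ' takes two.
byte_count = len(encoded)

# Number of code points: requires a pass over every byte.
char_count = utf8_length(encoded)
```

The scan is linear in the number of bytes, whereas with a single-byte
encoding the length can be read off directly. Whether that difference
is noticeable in practice is exactly the question raised above.
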
> Everything could be printed, even if it had control
> characters which made you beep (with the occasional control sequence
> re-injecting output into the terminal as input). Now... Well, printing
> an unprintable string can cause segfaults in some cases.

My terminal supports UTF-8. I thought that this is not an issue any

