Re: default character encoding for everything in debian

To: Harald Braumann <harry@unheit.net>
Cc: debian-devel@lists.debian.org
Subject: Re: default character encoding for everything in debian
From: Gunnar Wolf <gwolf@gwolf.org>
Date: Tue, 11 Aug 2009 13:28:08 -0500
Message-id: <20090811182808.GE19541@cajita.gateway.2wire.net>
In-reply-to: <20090811013358.65b60b98@sbs173>
References: <200908101309.22076.thomas@koch.ro> <20090810114540.GA13301@puntila.winnegan.fake> <20090811013358.65b60b98@sbs173>

Harald Braumann said [Tue, Aug 11, 2009 at 01:33:58AM +0200]:
> > There are a lot of users out there that are not willing to pay the
> > price for increased generality.
> 
> Don't you mean s/users/programmers? As a user I don't see what price I
> pay. I only see advantages in having a consistent encoding. Which,
> btw., doesn't have to be UTF-8. In an ideal world every programme would
> adhere to LC_CTYPE. But if the encoding has to be configured then I
> would also prefer UTF-8 as the default.
> 
> Of course, for the programmer there might be a price to pay. And if
> he's not willing to pay it, he can't be forced, anyway.
> 
> Or do you mean the user pays the price, because if the encoding is set
> to UTF-8 then performance would suffer? In that case, I'd love to see
> some real life numbers. I doubt the difference would be noticeable. 

Yes, performance will suffer. We enjoyed many decades of blissfully
ignoring the difference between a character and a byte. So, while
length(str) in any language up to the 1990s was a mere subtraction,
now we must go through the string checking each byte to see whether it
belongs to a multibyte sequence and subtract the appropriate number of
bytes. Also, for a very long time we didn't really care much what a
buffer's content was: everything could be printed, even if it had
control characters which made you beep (with the occasional control
sequence re-injecting output into the terminal as input). Now... Well,
printing an unprintable string can cause segfaults in some cases.
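
To make the length point concrete, here is a minimal sketch in Python,
assuming the buffer holds valid UTF-8. The name utf8_length is
hypothetical and only for illustration; it is not how any particular
libc or language runtime does it. The byte count is known up front,
while the character count needs a pass over every byte:

```python
def utf8_length(buf: bytes) -> int:
    """Count code points in a UTF-8 buffer (assumes valid UTF-8 input)."""
    count = 0
    for byte in buf:
        # Continuation bytes have the bit pattern 10xxxxxx; any other
        # byte starts a new character, so only those are counted.
        if (byte & 0xC0) != 0x80:
            count += 1
    return count


sample = "año".encode("utf-8")     # 'ñ' takes two bytes in UTF-8
byte_count = len(sample)           # stored length, no scan needed
char_count = utf8_length(sample)   # requires walking the whole buffer
```

For this sample the two counts differ by one, since only 'ñ' is encoded
in more than one byte. Whether the extra pass is noticeable in real
workloads is exactly the kind of thing that would need measuring.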

-- 
Gunnar Wolf • gwolf@gwolf.org • (+52-55)5623-0154 / 1451-2244
