
Re: Make Unicode bugs release critical? (was: Re: RFA: all my packages)

Josselin Mouette writes ("Re: Make Unicode bugs release critical? (was: Re: RFA: all my packages)"):
> On Monday 14 February 2011 at 12:42 +0000, Ian Jackson wrote:
> > Excellent, I look forward to the removal of python.  I always hated
> > that language anyway.
> From your reply I look more forward to the removal of vm, since it broke
> the Unicode in my original email.

In fact I manually typed "<unicode pound sign>" and deliberately
avoided putting any non-ASCII in my email, to avoid things being even
more confused.

But you are making my argument for me: lots of software has
Unicode-handling bugs.  If we make them all release critical we might
as well give up and go home.

Regarding the specifics, which we don't really need to go into in
much detail:

> > $ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"' | cat 
> > Traceback (most recent call last):
> >   File "<string>", line 1, in <module>
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
> > position 0: ordinal not in range(128)
> > $
> You must specify the encoding of your data in your bitstreams. I agree
> this is inconvenient (and one of the things I dislike in Python), but it
> is: 
>      1. completely independent of the locale (UTF8 or not) 
>      2. easy to work with once you understand how encodings in Python
>         work 

The fact that naive Python programs work (honouring LC_CTYPE as they
should) unless you pipe their output to something is clearly a bug.
The fact that it's a specification bug doesn't mean it's not a bug.
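To illustrate the failure mode in isolation: when Python 2's stdout is a
pipe it has no encoding set, so print falls back to the ASCII codec, which
cannot represent U+00A3.  A minimal sketch, written so that it also runs on
Python 3 (the helper name try_encode is made up for illustration):

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

pound = u"\u00a3"
ascii_result = try_encode(pound, "ascii")  # fails: ordinal not in range(128)
utf8_result = try_encode(pound, "utf-8")   # succeeds
print(repr(ascii_result), repr(utf8_result))
```

The character itself is fine; only the choice of codec differs between the
terminal case and the pipe case.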

Non-naive programs contain something like the snippet below, which I
include so people who find this thread know that there is an answer.

>      3. much better in Python 3.

Yes, it's fixed in Python 3.


# For fuck's sake!
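import codecs
import locale
import sys

def fix_stdout():
    # Wrap stdout so unicode output is encoded in the locale's encoding,
    # even when stdout is a pipe.
    sys.stdout = codecs.EncodedFile(sys.stdout, locale.getpreferredencoding())
    # Skip EncodedFile's decode step so unicode strings pass straight
    # through to the encoder.
    def null_decode(input, errors='strict'):
        return input, len(input)
    sys.stdout.decode = null_decode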
# From
#  http://ewx.livejournal.com/457086.html?thread=3016574
# lightly modified.
# See also Debian #415968.
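
The same idea as fix_stdout above can be sketched without replacing
sys.stdout: wrap a byte stream in a writer that encodes explicitly.  This is
only an illustration, not the code from the linked post; the function name
make_text_writer and the in-memory buffer standing in for a pipe are
assumptions made for the example.

```python
import codecs
import io
import locale

def make_text_writer(byte_stream, encoding=None):
    """Wrap byte_stream so that text written to it is encoded explicitly."""
    if encoding is None:
        # Same source of truth as fix_stdout: the locale's preferred encoding.
        encoding = locale.getpreferredencoding()
    return codecs.getwriter(encoding)(byte_stream)

# An in-memory byte buffer stands in for a piped stdout here.
buf = io.BytesIO()
out = make_text_writer(buf, "utf-8")
out.write(u"\u00a3\n")
written = buf.getvalue()
```

In a real program the byte stream would be sys.stdout.buffer on Python 3 (or
sys.stdout itself on Python 2), and the encoding argument would normally be
left to default to the locale's.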
