Re: OT: Python

To: debian-devel@lists.debian.org
Subject: Re: OT: Python
From: Russ Allbery <rra@debian.org>
Date: Mon, 14 Feb 2011 13:11:04 -0800
Message-id: <87lj1ijp93.fsf@windlord.stanford.edu>
In-reply-to: <19801.23455.536473.211939@chiark.greenend.org.uk> (Ian Jackson's message of "Mon, 14 Feb 2011 16:43:11 +0000")
References: <1297375750-sup-7355@gillespie.rupamsunyata.org> <20110211000216.GG8747@onerussian.com> <20110211084733.GA30787@angband.pl> <20110211183343.GP12557@sym.noone.org> <1297676104.3044.218.camel@meh> <19801.8997.829350.140559@chiark.greenend.org.uk> <20110214131425.GA4744@jwilk.net> <20110214133736.GB6167@ikki.ethgen.ch> <20110214143302.GA6400@jwilk.net> <19801.18743.486394.290910@chiark.greenend.org.uk> <20110214162139.GF6167@ikki.ethgen.ch> <19801.23455.536473.211939@chiark.greenend.org.uk>

Ian Jackson <ijackson@chiark.greenend.org.uk> writes:
> Klaus Ethgen writes:

>> No, it is not. 00a3 is just not a utf-8 character, it is unicode. To
>> get a correct utf-8 character you need to print \x{c2a3} and then
>> isutf8 is happy.

> When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> characters to stdout should use UTF-8.  That's what LC_CTYPE means.

Perl is specifically documented not to do this, for backward-compatibility
reasons.  In Perl, which is the language I know best, you have to decode
input and encode output explicitly if you want UTF-8 handling.

windlord:~> env LC_CTYPE=en_US.UTF-8 perl -e 'print "\x{00a3}\n"'
<glyph for mangled Unicode character>
windlord:~> env LC_CTYPE=en_US.UTF-8 perl -MEncode -e 'print encode("utf-8", "\x{00a3}\n")'
<proper Unicode pound sign>

See perlunicode(1).  There are a variety of reasons for this that turn out
to be fairly good ones if you don't want to badly break a bunch of
existing Perl scripts that were dealing with, for example, binary data.
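
The byte-level distinction behind the two commands above can be checked
outside Perl as well.  The following is only a minimal sketch in Python 3,
not anything taken from perlunicode; the helper names encode_for_output and
is_valid_utf8 are made up for illustration.  It shows that U+00A3 is a code
point rather than a byte sequence, that encoding it as UTF-8 yields the two
bytes C2 A3, and that the single byte A3 (the Latin-1 form) is not valid
UTF-8 on its own.

```python
# U+00A3 (POUND SIGN) is a code point, not a sequence of bytes.
pound = "\u00a3"


def encode_for_output(text: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: explicitly turn text into bytes before output."""
    return text.encode(encoding)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_bytes = encode_for_output(pound)               # two bytes: C2 A3
latin1_bytes = encode_for_output(pound, "latin-1")  # one byte: A3

print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))
print(latin1_bytes.hex(), is_valid_utf8(latin1_bytes))
```

A lone A3 byte arriving at a UTF-8 terminal would be consistent with the
mangled glyph in the first command, whereas C2 A3 is what the explicit
encode() call in the second command produces.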

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>
