Re: new locale: de-utf?
- To: Matt Zimmerman <mdz@debian.org>, debian-devel@lists.debian.org
- Subject: Re: new locale: de-utf?
- From: "Martin v. Löwis" <martin@v.loewis.de>
- Date: Fri, 09 May 2003 17:36:40 +0200
- Message-id: <3EBBCB08.8010509@v.loewis.de>
- In-reply-to: <20030509143015$5aa4@gated-at.bofh.it>
- References: <20030508203012$0619@gated-at.bofh.it> <20030509015006$561d@gated-at.bofh.it> <20030509022009$6b4b@gated-at.bofh.it> <20030509101008$427d@gated-at.bofh.it> <20030509143015$5aa4@gated-at.bofh.it>
Matt Zimmerman wrote:
> Since this thread seems to have gotten the attention of folks who would
> know, could someone tell me how to deal with #180823?
What is apt_listchanges trying to do here? It appears that it somehow
wants to find a Japanese codec, because it has somewhere seen the
encoding EUC-JP. There is a JapaneseCodecs package out there for Python;
if that was installed, it would have found the japanese codec.
The second error is more puzzling. What is the specific data it tries to
process in this very moment? From the third report, it appears that the
culprit is "subject". Where does that come from? If you get a
UnicodeError, it has worked as designed: The string you are passing is
invalid UTF-8, so you should not put it into a header claiming that it
is UTF-8.
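The failure mode can be reproduced in a few lines. This is only a sketch in present-day Python 3, and both the sample text and the helper name is_valid_utf8 are made up for illustration; it shows that bytes written in EUC-JP do not decode as UTF-8, which is the condition a UnicodeError reports.

```python
# Hypothetical sample: a Japanese string encoded as EUC-JP bytes.
raw = "こんにちは".encode("euc_jp")

def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeError:
        # The bytes are not UTF-8, so they must not be labelled as such.
        return False
    return True

print(is_valid_utf8(raw))
print(is_valid_utf8("Löwis".encode("utf-8")))
```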
I suggest you drop locale.getlocale()[0], and do just
nl_langinfo(CODESET) to find out what the user's codeset is. You can
then catch LookupError, and work around the problem by converting to
ASCII (i.e. replacing all non-ASCII bytes with question marks):
    from email.Header import Header
    from locale import nl_langinfo, CODESET

    try:
        subject = Header(subject, nl_langinfo(CODESET))
    except LookupError:
        # work around Python bug: no codec available for this codeset
        subject = unicode(subject, 'latin-1').encode('ascii', 'replace')
        subject = Header(subject, 'us-ascii')
    except UnicodeError:
        # bogus input data, complain to user
        raise
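For readers on Python 3, the same fallback might look roughly like the following. It is a sketch under assumptions, not the apt-listchanges code: make_subject_header and its codeset parameter are hypothetical names, and the raw subject is assumed to arrive as a bytes object.

```python
import locale
from email.header import Header

def make_subject_header(raw_subject, codeset=None):
    """Build a Subject header from a byte string (hypothetical helper)."""
    if codeset is None:
        # Ask the C library for the user's codeset (Unix only).
        codeset = locale.nl_langinfo(locale.CODESET)
    try:
        return Header(raw_subject, codeset)
    except LookupError:
        # No codec for this codeset: degrade to ASCII, replacing every
        # non-ASCII byte with a question mark.
        text = raw_subject.decode("latin-1").encode("ascii", "replace")
        return Header(text, "us-ascii")
    # A UnicodeError (bogus input data) is deliberately left to propagate.
```

Passing the codeset explicitly, as in make_subject_header(b"caf\xe9", "x-no-such-codec"), exercises the LookupError branch without depending on the current locale.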
You should file a Python bug: While the UnicodeError is intentional and
indicates bogus input data, the LookupError is not. If no codec is
found, Header.py should just assume that the byte string is in the
declared encoding.
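The proposed behaviour could be sketched as below, again in Python 3. This is not the actual Header.py code: header_word is a hypothetical helper, and it ignores the length limit that RFC 2047 places on encoded-words. It validates the bytes only when a codec is available, and otherwise trusts the declared charset.

```python
import base64
import codecs

def header_word(raw, charset):
    """Return an RFC 2047 'B' encoded-word for raw bytes (hypothetical)."""
    try:
        codecs.lookup(charset)
    except LookupError:
        # No codec installed: assume the bytes are in the declared charset.
        pass
    else:
        # Codec available: validate, raising UnicodeError on bogus input.
        raw.decode(charset)
    payload = base64.b64encode(raw).decode("ascii")
    return "=?%s?b?%s?=" % (charset, payload)
```

With this approach a missing codec no longer blocks sending the mail, while invalid data is still rejected whenever it can be checked.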
Regards,
Martin