
Re: new locale: de-utf?



Matt Zimmerman wrote:
Since this thread seems to have gotten the attention of folks who would
know, could someone tell me how to deal with #180823?

What is apt_listchanges trying to do here? It appears that it wants to look up a Japanese codec, because it has seen the encoding EUC-JP somewhere. There is a JapaneseCodecs package out there for Python; if that were installed, the lookup would find the Japanese codec.

The second error is more puzzling. What specific data is it trying to process at that moment? From the third report, it appears that the culprit is "subject". Where does that come from? If you get a UnicodeError, it has worked as designed: the string you are passing is invalid UTF-8, so you should not put it into a header claiming that it is UTF-8.

I suggest you drop locale.getlocale()[0], and just call nl_langinfo(CODESET) to find out what the user's codeset is. You can then catch LookupError, and work around the problem by converting to
ASCII (i.e. replacing all non-ASCII bytes with a question mark):

   from email.Header import Header
   from locale import nl_langinfo, CODESET

   try:
      subject = Header(subject, nl_langinfo(CODESET))
   except LookupError:
      # work around Python bug: no codec found for the user's codeset
      subject = unicode(subject, 'latin-1').encode('ascii','replace')
      subject = Header(subject, 'us-ascii')
   except UnicodeError:
      # bogus input data, complain to user
      raise
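
The snippet above is Python 2. As a rough sketch only, the same fallback might look like this on Python 3, assuming the subject arrives as a byte string. The function name make_subject_header and its codeset parameter are made up for illustration and are not part of apt-listchanges or of the email package; nl_langinfo is also not available on every platform (e.g. Windows).

```python
import locale
from email.header import Header


def make_subject_header(raw_subject, codeset=None):
    """Build a Subject header from bytes (hypothetical helper).

    codeset defaults to the user's locale codeset, as suggested above.
    """
    if codeset is None:
        # Ask the C library for the codeset of the current locale.
        codeset = locale.nl_langinfo(locale.CODESET)
    try:
        # Header decodes the bytes with the declared codeset.
        return Header(raw_subject, codeset)
    except LookupError:
        # No codec is known for this codeset: fall back to ASCII,
        # replacing every non-ASCII byte with a question mark.
        ascii_subject = raw_subject.decode('latin-1').encode('ascii', 'replace')
        return Header(ascii_subject, 'us-ascii')
    # A UnicodeError is deliberately not caught: it means the input
    # is not valid in the declared codeset.
```

With an unknown codeset, make_subject_header(b'Gr\xfc\xdfe', 'no-such-codec') falls back to the ASCII text "Gr??e", while bytes that are invalid in a known codeset still raise UnicodeError so the caller can complain about the input.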

You should file a Python bug: While the UnicodeError is intentional and indicates bogus input data, the LookupError is not. If no codec is found, Header.py should just assume that the byte string is in the declared encoding.

Regards,
Martin


