Re: new locale: de-utf?

To: Matt Zimmerman <mdz@debian.org>, debian-devel@lists.debian.org
Subject: Re: new locale: de-utf?
From: "Martin v. Löwis" <martin@v.loewis.de>
Date: Fri, 09 May 2003 17:36:40 +0200
Message-id: <3EBBCB08.8010509@v.loewis.de>
In-reply-to: <20030509143015$5aa4@gated-at.bofh.it>
References: <20030508203012$0619@gated-at.bofh.it> <20030509015006$561d@gated-at.bofh.it> <20030509022009$6b4b@gated-at.bofh.it> <20030509101008$427d@gated-at.bofh.it> <20030509143015$5aa4@gated-at.bofh.it>

Matt Zimmerman wrote:

Since this thread seems to have gotten the attention of folks who would
know, could someone tell me how to deal with #180823?

What is apt_listchanges trying to do here? It appears that it somehow wants to find a Japanese codec, because it has somewhere seen the encoding EUC-JP. There is a JapaneseCodecs package out there for Python; if that was installed, it would have found the Japanese codec.

The second error is more puzzling. What is the specific data it tries to process in this very moment? From the third report, it appears that the culprit is "subject". Where does that come from? If you get a UnicodeError, it has worked as designed: The string you are passing is invalid UTF-8, so you should not put it into a header claiming that it is UTF-8.
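To make the difference between the two failures concrete, here is a minimal sketch that tells them apart. It is written for current Python 3 rather than the Python of this thread, `classify` is a name made up for illustration, and `x-no-such-codec` stands in for an encoding with no installed codec (EUC-JP ships with modern Python, so it would not reproduce the problem there).

```python
import codecs


def classify(data, encoding):
    """Return 'no-codec', 'bad-data' or 'ok' for a byte string and its declared encoding."""
    try:
        # Fails with LookupError when no codec is registered under this name.
        codecs.lookup(encoding)
    except LookupError:
        return "no-codec"
    try:
        # Fails with UnicodeDecodeError (a UnicodeError) when the bytes
        # are not valid in the declared encoding.
        data.decode(encoding)
    except UnicodeError:
        return "bad-data"
    return "ok"


if __name__ == "__main__":
    print(classify(b"Gr\xfc\xdfe", "utf-8"))            # Latin-1 bytes declared as UTF-8
    print(classify(b"subject", "x-no-such-codec"))      # codec not installed
    print(classify("Grüße".encode("utf-8"), "utf-8"))   # consistent data
```

The first case is the caller's problem (the data does not match its label); the second is an installation or library problem, which is why the two deserve different handling.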

I suggest you drop locale.getlocale()[0], and do just nl_langinfo(CODESET) to find out what the user's codeset is. You can then catch LookupError, and work around the problem by converting to ASCII (i.e. replacing all non-ASCII bytes with question marks):

   from email.Header import Header
   from locale import nl_langinfo, CODESET

   try:
      subject = Header(subject, nl_langinfo(CODESET))
   except LookupError:
      # work around Python bug: no codec installed for the user's codeset
      subject = unicode(subject, 'latin-1').encode('ascii','replace')
      subject = Header(subject, 'us-ascii')
   except UnicodeError:
      # bogus input data, complain to user
      raise

You should file a Python bug: While the UnicodeError is intentional and indicates bogus input data, the LookupError is not. If no codec is found, Header.py should just assume that the byte string is in the declared encoding.
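For readers on current Python 3, a rough equivalent of the workaround might look like the sketch below. The names `user_codeset` and `make_subject` are made up for illustration, and the sketch assumes that `email.header.Header` still raises LookupError when handed bytes in a charset it has no codec for; that behaviour is worth verifying on the Python version actually in use.

```python
import locale
from email.header import Header


def user_codeset():
    """Return the codeset of the user's locale (POSIX only: needs nl_langinfo)."""
    try:
        # Without this call the process stays in the "C" locale.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # unsupported locale in the environment; keep the default
    return locale.nl_langinfo(locale.CODESET)


def make_subject(raw, codeset):
    """Build a Subject header from bytes, degrading to ASCII if no codec exists."""
    try:
        return Header(raw, codeset)
    except LookupError:
        # No codec for this codeset: replace every non-ASCII byte with '?'.
        # latin-1 maps each byte to exactly one character, so nothing is lost
        # before the replacement step.
        text = raw.decode("latin-1").encode("ascii", "replace").decode("ascii")
        return Header(text, "us-ascii")
    # UnicodeError is deliberately not caught: it means the input does not
    # match its declared encoding and the caller should be told.


if __name__ == "__main__":
    print(make_subject(b"Gr\xfc\xdfe", "x-no-such-codec"))
```

A caller would typically pass `user_codeset()` as the second argument; it is kept as a parameter here so the fallback can be exercised without depending on the environment.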


Regards,
Martin
