Issues when reading mailboxes from alioth-lists.debian.net
Hi,
in the teammetrics project I'm trying to parse mailboxes. This worked
with Python2, but after porting the code to Python3 I get some encoding
troubles. A specific one seems to be an error in the mailbox module.
Please run the attached script test_mbox, which downloads one of the
critical mbox files from alioth-lists.debian.net and calls the also
attached simple Python3 script, which ends in:
Traceback (most recent call last):
  File "./test_mbox.py", line 6, in <module>
    if mbox_file.items() != []:
  File "/usr/lib/python3.8/mailbox.py", line 132, in items
    return list(self.iteritems())
  File "/usr/lib/python3.8/mailbox.py", line 125, in iteritems
    value = self[key]
  File "/usr/lib/python3.8/mailbox.py", line 73, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python3.8/mailbox.py", line 781, in get_message
    msg.set_from(from_line[5:].decode('ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 37: ordinal not in range(128)
Exit code: 1
IMHO it is a bug if those mailboxes can't be read. Am I missing
something?
Kind regards
Andreas.
--
http://fam-tille.de
#!/bin/sh
wget https://alioth-lists.debian.net/pipermail/pkg-java-maintainers/2020-May.txt.gz
gunzip 2020-May.txt.gz
python3 test_mbox.py
#!/usr/bin/python3
import mailbox
mbox_file = mailbox.mbox('2020-May.txt')
if mbox_file.items() != []:
    print("OK")
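The traceback shows that the mailbox module decodes the "From " separator line of each message as ASCII, so a single non-ASCII byte in such a line is enough to trigger the error. One possible workaround, sketched here under that assumption, is to copy the mbox file first and replace non-ASCII bytes in the separator lines with "?". The function name sanitize_from_lines and the output file name used below are hypothetical, and this only sidesteps the problem; it does not fix the mailbox module itself.

```python
#!/usr/bin/python3
import mailbox


def sanitize_from_lines(src_path, dst_path):
    """Copy an mbox file, replacing non-ASCII bytes in "From " separator
    lines with "?". All other lines are copied unchanged.

    Hypothetical helper for illustration; not part of the mailbox module.
    """
    # A separator line is a line starting with "From " that is either the
    # first line of the file or preceded by an empty line.
    previous_blank = True
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        for line in src:
            if previous_blank and line.startswith(b'From '):
                line = bytes(b if b < 128 else ord('?') for b in line)
            previous_blank = line in (b'\n', b'\r\n')
            dst.write(line)
```

With this helper, the test script could call sanitize_from_lines('2020-May.txt', '2020-May.ascii.txt') and then open '2020-May.ascii.txt' with mailbox.mbox() as before. The sender shown in the separator line loses its non-ASCII characters, but the message headers and bodies are left untouched.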