
Issues when reading mailboxes from alioth-lists.debian.net



Hi,

in the teammetrics project I'm trying to parse mailboxes.  This worked
with Python 2, but after porting the code to Python 3 I get some encoding
trouble.  One specific problem seems to be an error in the mailbox module.
Please run the attached script test_mbox, which downloads one of the
problematic mbox files from alioth-lists.debian.net and calls the
simple Python 3 script (also attached), which ends in:

Traceback (most recent call last):
  File "./test_mbox.py", line 6, in <module>
    if mbox_file.items() != []:
  File "/usr/lib/python3.8/mailbox.py", line 132, in items
    return list(self.iteritems())
  File "/usr/lib/python3.8/mailbox.py", line 125, in iteritems
    value = self[key]
  File "/usr/lib/python3.8/mailbox.py", line 73, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python3.8/mailbox.py", line 781, in get_message
    msg.set_from(from_line[5:].decode('ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 37: ordinal not in range(128)
Exit code:   1 

IMHO it is a bug if those mailboxes can't be read.  Am I missing
something?

Kind regards

       Andreas.

-- 
http://fam-tille.de
#!/bin/sh

wget https://alioth-lists.debian.net/pipermail/pkg-java-maintainers/2020-May.txt.gz
gunzip 2020-May.txt.gz

python3 test_mbox.py
#!/usr/bin/python3

import mailbox

mbox_file = mailbox.mbox('2020-May.txt')
if mbox_file.items() != []:
    print("OK")
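
A possible workaround, offered as a sketch under assumptions and not
verified against the archive above: the traceback shows the failure in
get_message(), where the "From " separator line is decoded as ASCII.
mailbox.mbox.get_bytes() returns the raw message without that separator
line, so parsing its result with email.message_from_bytes() should skip
the decode step.  The helper name read_messages is hypothetical, and the
sketch assumes the non-ASCII byte sits only in the separator line.

```python
import email
import mailbox


def read_messages(path):
    """Return all messages in an mbox file as email.message.Message objects.

    Hypothetical helper: it bypasses mbox.get_message(), which decodes the
    "From " separator line as ASCII, by fetching the raw bytes instead.
    """
    box = mailbox.mbox(path, create=False)
    try:
        # get_bytes() omits the "From " separator line by default, so the
        # non-ASCII bytes in that line are never decoded.
        return [email.message_from_bytes(box.get_bytes(key))
                for key in box.keys()]
    finally:
        box.close()
```

With this, read_messages('2020-May.txt') could stand in for
mbox_file.items() in the test script.  The trade-off is that the
envelope sender and date from the separator line are dropped, and the
results are plain Message objects, not mailbox.mboxMessage instances.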
