Control: severity -1 important

Hi,

David Bremner <bremner@debian.org> (2019-07-19):
> The following script segfaults if python3-apt is installed, but
> completes if not. Replacing lzma.open with open (and replacing
> Sources.xz with Sources) also makes the segfault go away. It seems to
> be the same with python3-apt 1.8.4. I didn't check the python2 version
> because lzma is (afaik) python3 only.
>
> #!/usr/bin/python3
> from debian.deb822 import Sources
> import lzma
>
> with lzma.open('Sources.xz', mode='rb') as f:
>     for src in Sources.iter_paragraphs(f):
>         package_name = src.get('Package')
>         version = src.get('Version')

This isn't my first attempt at dealing with .xz files using python3-apt,
and I've never managed to get something to work without resorting to
temporary, uncompressed files…

Initial code was:

    import gzip
    import apt_pkg
    with gzip.open('Packages.gz') as f:
        tf = apt_pkg.TagFile(f)
        for stanza in tf:
            do_something_with(stanza)

which should be replaceable with the following given the documentation
of all relevant modules:

    import lzma
    import apt_pkg
    with lzma.open('Packages.xz') as f:
        tf = apt_pkg.TagFile(f)
        for stanza in tf:
            do_something_with(stanza)

Using lzma.LZMAFile(), toying with text vs. binary mode, encoding, bytes
flag, etc. didn't help…

Today I had a few more minutes to spend on this, so here's a little
debugging session. My main system is still bullseye, but the same tests
in a bookworm chroot fail the same way. Depending on the input data,
I'm seeing various expressions of the same bug: some include a SIGSEGV,
some don't.
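
For reference, the temporary-file workaround mentioned above can be
sketched roughly as follows. This is only an illustration, not code from
the original report: decompress_to_tempfile() and iter_stanzas() are
hypothetical helper names, and the sketch assumes apt_pkg.TagFile copes
with a regular, uncompressed on-disk file object, which is what the
workaround relies on.

```python
import lzma
import shutil
import tempfile


def decompress_to_tempfile(xz_path):
    """Decompress xz_path into a named temporary file and return it.

    Hypothetical helper. The caller owns the returned file object;
    closing it deletes the temporary file.
    """
    tmp = tempfile.NamedTemporaryFile(prefix='apt-index-')
    with lzma.open(xz_path, 'rb') as src:
        # Stream the decompressed data so large indices are never
        # held in memory as a whole.
        shutil.copyfileobj(src, tmp)
    tmp.flush()
    tmp.seek(0)
    return tmp


def iter_stanzas(xz_path):
    """Yield stanzas from an xz-compressed index via a temporary file.

    Hypothetical helper. apt_pkg is imported lazily so that
    decompress_to_tempfile() stays usable without python3-apt.
    """
    import apt_pkg  # provided by python3-apt

    with decompress_to_tempfile(xz_path) as tmp:
        # Assumption: TagFile reads a plain, uncompressed file object fine.
        yield from apt_pkg.TagFile(tmp)
```

With python3-apt installed, "for stanza in iter_stanzas('Packages.xz')"
would then stand in for the lzma.open() loop above, at the cost of
writing the whole uncompressed index to disk once.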

Here's some sample data:

    # Real files, SIGSEGV (archived suite == those files won't
    # change over time, other indices would do just fine):
    wget http://archive.debian.org/debian/dists/stretch/main/binary-amd64/Packages.gz
    wget http://archive.debian.org/debian/dists/stretch/main/binary-amd64/Packages.xz

    # Smaller stanzas, different errors
    printf "Key1: Short1\nKey2: Short2\n\nKey3: SlightlyLonger1\nKey4: SlightlyLonger2\n\n" > Test
    gzip -k -f Test
    xz -k -f Test

Trying to understand why the lzma case was failing, I tried digging into
apt_pkg.TagFile's internal data, leading to the bug-932491-a.py test
case you'll find attached.

Running it against the Test{.gz,.xz} pair gives:

    $ ./bug-932491-a.py Test
    gz == xz: True
    gz: section 1 size: 26
    gz: section 1 keys: ['Key1', 'Key2']
    gz: section 2 size: 44
    gz: section 2 keys: ['Key3', 'Key4']
    Traceback (most recent call last):
      File "/path/to/bug-932491-a.py", line 33, in <module>
        tf_xz.step()
    apt_pkg.Error: E:Unable to parse package file (1)

Running it against the Packages{.gz,.xz} pair gives:

    $ ./bug-932491-a.py Packages
    gz == xz: True
    gz: section 1 size: 1281
    gz: section 1 keys: ['Package', 'Version', 'Installed-Size', 'Maintainer', 'Architecture', 'Depends', 'Pre-Depends', 'Description', 'Homepage', 'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 'Size', 'MD5sum', 'SHA256']
    gz: section 2 size: 585
    gz: section 2 keys: ['Package', 'Version', 'Installed-Size', 'Maintainer', 'Architecture', 'Pre-Depends', 'Suggests', 'Description', 'Homepage', 'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 'Size', 'MD5sum', 'SHA256']
    xz: section 1 size: 163530
    Segmentation fault

See how crazy the size of the first section is…

The stacktrace can be huge, and this should be easily reproducible so
I'm not attaching anything else, but here's where things explode:

    Program received signal SIGSEGV, Segmentation fault.
    TagSecKeys (Self=<apt_pkg.TagSection at remote 0xb94980>, Args=Args@entry=()) at python/tag.cc:284
    284           Py_DECREF(Obj);
    (gdb) l
    279           const char *End = Start;
    280           for (; End < Stop && *End != ':'; End++);
    281
    282           PyObject *Obj;
    283           PyList_Append(List,Obj = PyString_FromStringAndSize(Start,End-Start));
    284           Py_DECREF(Obj);
    285        }
    286        return List;
    287     }
    288
    (gdb) p List
    $1 = []
    (gdb) p Obj
    $2 = 0x0

In other words, Obj is NULL at that point, and Py_DECREF() doesn't
tolerate a NULL pointer.

I was mentioning different expressions… Let's see what happens with the
approach I was starting from, using a for loop on the TagFile object,
against the Packages{.gz,.xz} pair again. The bug-932491-b.py test case
implements a demo using gzip then lzma, printing a dot for each
iteration, showing that the lzma problem shows up on the very first
iteration:

    $ ./bin/bug-932491-b.py Packages
    gz packages: 50771
    .Traceback (most recent call last):
      File "/path/to/bug-932491-b.py", line 27, in <module>
        xz_packages.append(stanza['Package'])
                           ~~~~~~^^^^^^^^^^^
    KeyError: 'Package'

Since we're only getting xz files for some suites already, it would be
best if they were manageable through python3-apt…


Cheers,
-- 
Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant

#!/usr/bin/python3
"""
Test case for #932491, version a
"""

import gzip
import lzma
import sys
import apt_pkg

root = sys.argv[1]

# Check data decompression works fine:
with gzip.open(f'{root}.gz') as gz:
    gz_text = gz.read()
with lzma.open(f'{root}.xz') as xz:
    xz_text = xz.read()
print(f'gz == xz: {gz_text == xz_text}')

# Perform 2 manual steps with gz:
with gzip.open(f'{root}.gz') as gz:
    tf_gz = apt_pkg.TagFile(gz)
    tf_gz.step()
    print(f'gz: section 1 size: {tf_gz.section.bytes()}')
    print(f'gz: section 1 keys: {tf_gz.section.keys()}')
    tf_gz.step()
    print(f'gz: section 2 size: {tf_gz.section.bytes()}')
    print(f'gz: section 2 keys: {tf_gz.section.keys()}')

# Perform 2 manual steps with xz:
with lzma.open(f'{root}.xz') as xz:
    tf_xz = apt_pkg.TagFile(xz)
    tf_xz.step()
    print(f'xz: section 1 size: {tf_xz.section.bytes()}')
    print(f'xz: section 1 keys: {tf_xz.section.keys()}')
    tf_xz.step()
    print(f'xz: section 2 size: {tf_xz.section.bytes()}')
    print(f'xz: section 2 keys: {tf_xz.section.keys()}')

#!/usr/bin/python3
"""
Test case for #932491: version b
"""

import gzip
import lzma
import sys
import apt_pkg

root = sys.argv[1]

# Start a loop:
gz_packages = []
with gzip.open(f'{root}.gz') as gz:
    tf_gz = apt_pkg.TagFile(gz)
    for stanza in tf_gz:
        gz_packages.append(stanza['Package'])
print(f'gz packages: {len(gz_packages)}')

# Start a loop:
xz_packages = []
with lzma.open(f'{root}.xz') as xz:
    tf_xz = apt_pkg.TagFile(xz)
    for stanza in tf_xz:
        print('.', end='')
        xz_packages.append(stanza['Package'])
print()
print(f'xz packages: {len(xz_packages)}')