
Bug#932491: python3-apt: segfault reading from lzma stream



Control: severity -1 important

Hi,

David Bremner <bremner@debian.org> (2019-07-19):
> The following script segfaults if python3-apt is installed, but
> completes if not. Replacing lzma.open with open (and replacing
> Sources.xz with Sources) also makes the segfault go away.  It seems to
> be the same with python3-apt 1.8.4. I didn't check the python2 version
> because lzma is (afaik) python3 only.
> 
> #!/usr/bin/python3
> from debian.deb822 import Sources
> import lzma
> 
> with lzma.open('Sources.xz', mode='rb') as f:
>     for src in Sources.iter_paragraphs(f):
>         package_name = src.get('Package')
>         version = src.get('Version')

This isn't my first attempt at dealing with .xz files using python3-apt,
and I've never managed to get something to work without resorting to
temporary, uncompressed files…

Initial code was:

    import gzip

    import apt_pkg

    with gzip.open('Packages.gz') as f:
        tf = apt_pkg.TagFile(f)
        for stanza in tf:
            do_something_with(stanza)

which should be replaceable with the following given the documentation
of all relevant modules:

    import lzma

    import apt_pkg

    with lzma.open('Packages.xz') as f:
        tf = apt_pkg.TagFile(f)
        for stanza in tf:
            do_something_with(stanza)

Using lzma.LZMAFile(), toying with text vs. binary mode, encoding, bytes
flag, etc. didn't help…
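
For reference, the temporary-file workaround mentioned above can be
sketched roughly as follows. The helper names are made up for this
illustration, and the apt_pkg part assumes that TagFile is happy with an
ordinary uncompressed file object, which is the only case reported as
working here:

```python
import lzma
import shutil
import tempfile


def decompress_xz_to_tempfile(xz_path):
    """Decompress xz_path into a temporary file and return that file.

    The returned file object is open, positioned at offset 0, and is
    deleted automatically once it is closed.
    """
    tmp = tempfile.NamedTemporaryFile(prefix='uncompressed-')
    with lzma.open(xz_path, 'rb') as src:
        shutil.copyfileobj(src, tmp)
    tmp.flush()
    tmp.seek(0)
    return tmp


def iter_xz_package_names(xz_path):
    """Yield the Package field of each stanza (hypothetical helper)."""
    import apt_pkg  # python3-apt; imported lazily so the rest runs without it

    with decompress_xz_to_tempfile(xz_path) as plain:
        for stanza in apt_pkg.TagFile(plain):
            yield stanza['Package']
```

The cost is one extra copy of the uncompressed data on disk for as long
as the file object stays open.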


Today I had a few more minutes to spend on this, so here's a little
debugging session. My main system is still bullseye, but the same tests
fail the same way in a bookworm chroot.

Depending on the input data, I'm seeing various expressions of the same
bug: some include a SIGSEGV, some don't.

Here's some sample data:

    # Real files, SIGSEGV (archived suite == those files won't
    # change over time, other indices would do just fine):
    wget http://archive.debian.org/debian/dists/stretch/main/binary-amd64/Packages.gz
    wget http://archive.debian.org/debian/dists/stretch/main/binary-amd64/Packages.xz

    # Smaller stanzas, different errors
    printf "Key1: Short1\nKey2: Short2\n\nKey3: SlightlyLonger1\nKey4: SlightlyLonger2\n\n" > Test
    gzip -k -f Test
    xz -k -f Test
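
The small Test{,.gz,.xz} files can also be generated from Python with
only the standard library. This is a sketch equivalent in content to the
printf/gzip/xz commands above; the function name and the SAMPLE constant
are made up for the illustration:

```python
import gzip
import lzma
from pathlib import Path

# Same two stanzas as the printf command: each stanza ends with a blank line.
SAMPLE = (
    b"Key1: Short1\nKey2: Short2\n\n"
    b"Key3: SlightlyLonger1\nKey4: SlightlyLonger2\n\n"
)


def write_sample_files(directory, name='Test'):
    """Write name, name.gz and name.xz into directory; return the paths."""
    base = Path(directory) / name
    gz_path = base.with_name(name + '.gz')
    xz_path = base.with_name(name + '.xz')
    base.write_bytes(SAMPLE)
    with gzip.open(gz_path, 'wb') as gz:
        gz.write(SAMPLE)
    with lzma.open(xz_path, 'wb') as xz:
        xz.write(SAMPLE)
    return base, gz_path, xz_path
```

Without their blank separator lines, the two stanzas are 26 and 44 bytes
long, which are the section sizes reported for the gz case in this
message.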

Trying to understand why the lzma case was failing, I tried digging into
apt_pkg.TagFile's internal data, leading to the bug-932491-a.py test
case you'll find attached.

Running it against the Test{.gz,.xz} pair gives:

    $ ./bug-932491-a.py Test
    gz == xz: True
    gz: section 1 size: 26
    gz: section 1 keys: ['Key1', 'Key2']
    gz: section 2 size: 44
    gz: section 2 keys: ['Key3', 'Key4']
    Traceback (most recent call last):
      File "/path/to/bug-932491-a.py", line 33, in <module>
        tf_xz.step()
    apt_pkg.Error: E:Unable to parse package file  (1)

Running it against the Packages{.gz,.xz} pair gives:

    $ ./bug-932491-a.py Packages
    gz == xz: True
    gz: section 1 size: 1281
    gz: section 1 keys: ['Package', 'Version', 'Installed-Size', 'Maintainer', 'Architecture', 'Depends', 'Pre-Depends', 'Description', 'Homepage', 'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 'Size', 'MD5sum', 'SHA256']
    gz: section 2 size: 585
    gz: section 2 keys: ['Package', 'Version', 'Installed-Size', 'Maintainer', 'Architecture', 'Pre-Depends', 'Suggests', 'Description', 'Homepage', 'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 'Size', 'MD5sum', 'SHA256']
    xz: section 1 size: 163530
    Segmentation fault

See how crazy the size of the first section is…
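
One possible explanation for such a size, which nothing in this message
confirms, is that the consumer reads from the file descriptor behind the
Python object instead of calling its read() method. What can be checked
from the Python side alone is that fileno() on an lzma.open() object
returns the descriptor of the underlying compressed file, so anything
reading that descriptor directly sees raw xz data. A minimal sketch (the
function name is hypothetical, and os.pread() is Unix-only):

```python
import lzma
import os

# Every .xz stream starts with these six bytes.
XZ_MAGIC = b'\xfd7zXZ\x00'


def bytes_seen_via_fileno(xz_path, count=6):
    """Return the first count bytes readable from the descriptor that an
    lzma.open() file object exposes through fileno()."""
    with lzma.open(xz_path, 'rb') as f:
        # pread() reads at an absolute offset and leaves the file
        # position of the descriptor untouched.
        return os.pread(f.fileno(), count, 0)


def bytes_seen_via_read(xz_path, count=6):
    """Return the first count bytes as delivered by the read() method."""
    with lzma.open(xz_path, 'rb') as f:
        return f.read(count)
```

On an .xz file, the first function returns XZ_MAGIC while the second
returns the start of the decompressed text.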

The stacktrace can be huge, and this should be easily reproducible so
I'm not attaching anything else, but here's where things explode:

    Program received signal SIGSEGV, Segmentation fault.
    TagSecKeys (Self=<apt_pkg.TagSection at remote 0xb94980>, Args=Args@entry=()) at python/tag.cc:284
    284	      Py_DECREF(Obj);
    (gdb) l
    279	      const char *End = Start;
    280	      for (; End < Stop && *End != ':'; End++);
    281	
    282	      PyObject *Obj;
    283	      PyList_Append(List,Obj = PyString_FromStringAndSize(Start,End-Start));
    284	      Py_DECREF(Obj);
    285	   }
    286	   return List;
    287	}
    288	
    (gdb) p List
    $1 = []
    (gdb) p Obj
    $2 = 0x0
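
Obj being NULL means the string constructor on line 283 failed, and
Py_DECREF() on a NULL pointer is what crashes. As for why construction
failed, here is a guess only: assuming that on Python 3 this call ends
up decoding the key as UTF-8, and assuming the buffer still held
compressed data, decoding would be expected to fail. The sketch below
only shows the Python-level fact behind that guess, namely that xz
output is not valid UTF-8 (the function name is made up):

```python
import lzma


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


plain = b"Package: example\nVersion: 1.0\n\n"
# lzma.compress() produces the .xz container format by default; its first
# byte is 0xFD, which can never appear in valid UTF-8.
compressed = lzma.compress(plain)

print(is_valid_utf8(plain))       # True
print(is_valid_utf8(compressed))  # False
```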


I was mentioning different expressions… Let's see what happens with the
approach I was starting from, using a for loop on the TagFile object,
against the Packages{.gz,.xz} pair again. The bug-932491-b.py test case
implements a demo using gzip then lzma, printing a dot for each
iteration, showing that the lzma problem shows up on the very first
iteration:

    $ ./bin/bug-932491-b.py Packages
    gz packages: 50771
    .Traceback (most recent call last):
      File "/path/to/bug-932491-b.py", line 27, in <module>
        xz_packages.append(stanza['Package'])
                           ~~~~~~^^^^^^^^^^^
    KeyError: 'Package'

Since some suites already ship only xz files, it would be best if those
could be handled through python3-apt…


Cheers,
-- 
Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant

#!/usr/bin/python3
"""
Test case for #932491, version a
"""
import gzip
import lzma
import sys

import apt_pkg

root = sys.argv[1]

# Check data decompression works fine:
with gzip.open(f'{root}.gz') as gz:
    gz_text = gz.read()
with lzma.open(f'{root}.xz') as xz:
    xz_text = xz.read()
print(f'gz == xz: {gz_text == xz_text}')

# Perform 2 manual steps with gz:
with gzip.open(f'{root}.gz') as gz:
    tf_gz = apt_pkg.TagFile(gz)
    tf_gz.step()
    print(f'gz: section 1 size: {tf_gz.section.bytes()}')
    print(f'gz: section 1 keys: {tf_gz.section.keys()}')
    tf_gz.step()
    print(f'gz: section 2 size: {tf_gz.section.bytes()}')
    print(f'gz: section 2 keys: {tf_gz.section.keys()}')

# Perform 2 manual steps with xz:
with lzma.open(f'{root}.xz') as xz:
    tf_xz = apt_pkg.TagFile(xz)
    tf_xz.step()
    print(f'xz: section 1 size: {tf_xz.section.bytes()}')
    print(f'xz: section 1 keys: {tf_xz.section.keys()}')
    tf_xz.step()
    print(f'xz: section 2 size: {tf_xz.section.bytes()}')
    print(f'xz: section 2 keys: {tf_xz.section.keys()}')

#!/usr/bin/python3
"""
Test case for #932491: version b
"""
import gzip
import lzma
import sys

import apt_pkg

root = sys.argv[1]

# Start a loop:
gz_packages = []
with gzip.open(f'{root}.gz') as gz:
    tf_gz = apt_pkg.TagFile(gz)
    for stanza in tf_gz:
        gz_packages.append(stanza['Package'])
print(f'gz packages: {len(gz_packages)}')

# Start a loop:
xz_packages = []
with lzma.open(f'{root}.xz') as xz:
    tf_xz = apt_pkg.TagFile(xz)
    for stanza in tf_xz:
        print('.', end='')
        xz_packages.append(stanza['Package'])
print()
print(f'xz packages: {len(xz_packages)}')
