
Bug#742885: python-apt: Does not support LFS .deb packages on 32-bit systems



On Fri, Mar 28, 2014 at 03:02:23PM +0100, Guillem Jover wrote:
> Package: python-apt
> Version: 0.9.3.4
> Severity: normal
> Control: block -1 by 742882
> 
> Hi!
> 
> python-apt also seems to have issues with LFS .deb files. Some cannot be
> fixed until apt provides the fixed interfaces though, here's the issues
> I've spotted by code staring, but there might be others:

Short summary: While support for archives larger than 4 GiB will be added,
there will be no support for reading members larger than 4 GiB into
memory (the extractdata stuff) on 32-bit systems.
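
For a python-apt user on a 32-bit system this roughly means: streaming
members to disk will work for arbitrarily large .debs, only reading a
whole member into a Python bytes object will not. A hedged sketch of
that split, assuming the apt_inst.DebFile API with its data attribute
and the extractall()/extractdata() methods (the file names are made up):

import apt_inst

# Hypothetical .deb whose data.tar contains a member larger than 4 GiB.
deb = apt_inst.DebFile("huge.deb")

# Extracting to disk streams the data in small chunks, so it is
# expected to work once the LFS support discussed below is in place.
deb.data.extractall("/tmp/huge-unpacked")

# Reading a > 4 GiB member into memory cannot fit into a 32-bit
# address space; this is the case that will keep failing (details below).
#data = deb.data.extractdata("usr/share/huge/blob.bin")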

> 
>  - arfile.cc:ararchive_extractdata() will try to slurp the entire ar
>    member into memory, on 32-bit systems for members > 4 GiB they will
>    not fit in the address space (not sure how this method is being used
>    though).

The whole point of extractdata() is to read the entire member into
memory, and in practice it is only used for debian-binary. So I think
we only need to add something like this:

--- a/python/arfile.cc
+++ b/python/arfile.cc
@@ -179,6 +179,10 @@ static PyObject *ararchive_extractdata(PyArArchiveObject *self, PyObject *args)
         PyErr_Format(PyExc_LookupError,"No member named '%s'",name.path);
         return 0;
     }
+    if (member->Size > SIZE_MAX) {
+        PyErr_Format(PyExc_LookupError,"Member '%s' is too large to read into memory",name.path);
+        return 0;
+    }
     if (!self->Fd.Seek(member->Start))
         return HandleErrors();

Although perhaps another exception type would fit better.
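
At the Python level a caller of extractdata() would then simply get an
exception for such members. A minimal sketch of the resulting
behaviour, assuming the apt_inst.ArArchive API and that the check keeps
raising LookupError (as said, the exception type may still change):

import apt_inst

ar = apt_inst.ArArchive("huge.deb")
try:
    # debian-binary is tiny, so the usual use of extractdata() is fine.
    print(ar.extractdata("debian-binary"))
    # A member that does not fit into the 32-bit address space would
    # hit the new size check instead of exhausting the address space.
    blob = ar.extractdata("data.tar.gz")
except LookupError as error:
    # Raised for missing members and, with the patch above, also for
    # members too large to read into memory on a 32-bit system.
    print("cannot read member into memory:", error)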


>  - arfile.cc:_extract() uses short size and read variable, will truncate
>    on assignment and on cast.

We can fix this.

diff --git a/python/arfile.cc b/python/arfile.cc
index 3284ff7..0d51545 100644
--- a/python/arfile.cc
+++ b/python/arfile.cc
@@ -221,14 +221,14 @@ static PyObject *_extract(FileFd &Fd, const ARArchive::Member *member,
     // Read 4 KiB from the file, until all of the file is read. Deallocated
     // automatically when the function returns.
     SPtrArray<char> value = new char[4096];
-    unsigned long size = member->Size;
-    unsigned long read = 4096;
+    unsigned long long size = member->Size;
+    unsigned long long read = 4096;
     while (size > 0) {
         if (size < read)
             read = size;
         if (!Fd.Read(value, read, true))
             return HandleErrors();
-        if (write(outfd, value, read) != (signed)read)
+        if (write(outfd, value, read) != (signed long long)read)
             return PyErr_SetFromErrnoWithFilename(PyExc_OSError, outfile);
         size -= read;
     }


(The change to read is not strictly needed, because read will always be
 <= 4096, but it is better for consistency.)
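
To see what the unsigned long truncation would do: on a 32-bit system
unsigned long is 32 bits wide, so a member size above 4 GiB silently
wraps around on assignment. A quick illustration of the arithmetic
(the 5 GiB size is just an example):

member_size = 5 * 1024**3                # 5368709120 bytes
truncated = member_size & 0xFFFFFFFF     # what a 32-bit unsigned long keeps
print(member_size, truncated)            # 5368709120 1073741824
# The old loop would thus copy only 1 GiB and stop; with
# unsigned long long the full size is preserved.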

>  - tarfile.cc:PyDirStream::Process() Size and Pos arguments.
>  - tarfile.cc:PyDirStream::DoItem() allocates the size of the member,
>    which might not fit in the address space on 32-bit systems.

This is similar to the extractdata() case in ArArchive. I'll probably
include the following fix, which simply returns None to Python if we
cannot read the data into memory. It will only become fully effective
with the next APT ABI break, once the DirStream interface passes
64-bit Size and Pos arguments to Process(). It still needs some
adjustments for the case that member != NULL, as we need to raise an
exception there (that is the extractdata() case, like in ArArchive,
so we should throw an exception for consistency).

diff --git a/python/tarfile.cc b/python/tarfile.cc
index 7f65a5c..6a1858a 100644
--- a/python/tarfile.cc
+++ b/python/tarfile.cc
@@ -71,6 +71,13 @@ public:
 bool PyDirStream::DoItem(Item &Itm, int &Fd)
 {
     if (!member || strcmp(Itm.Name, member) == 0) {
+        // If the item is larger than we can read, we will not read it,
+        // but rather return None to Python.
+        if (Itm.Size > SIZE_MAX) {
+            delete[] copy;
+            copy = NULL;
+            copy_size = 0;
+        }
         // Allocate a new buffer if the old one is too small.
-        if (copy == NULL || copy_size < Itm.Size) {
+        else if (copy == NULL || copy_size < Itm.Size) {
             delete[] copy;
@@ -87,7 +94,8 @@ bool PyDirStream::DoItem(Item &Itm, int &Fd)
 bool PyDirStream::Process(Item &Itm,const unsigned char *Data,
                           unsigned long Size,unsigned long Pos)
 {
-    memcpy(copy + Pos, Data,Size);
+    if (copy != NULL)
+        memcpy(copy + Pos, Data,Size);
     return true;
 }
 
@@ -98,7 +106,13 @@ bool PyDirStream::FinishedFile(Item &Itm,int Fd)
         return true;
 
     Py_XDECREF(py_data);
-    py_data = PyBytes_FromStringAndSize(copy, Itm.Size);
+    if (copy != NULL) {
+        py_data = PyBytes_FromStringAndSize(copy, Itm.Size);
+    } else {
+        // The item was too large to load into memory, so pass None.
+        Py_INCREF(Py_None);
+        py_data = Py_None;
+    }
 
     if (!callback)
         return true;
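
For Python callers this means a TarFile.go() callback has to be
prepared to receive None instead of the member data for such members
on 32-bit systems. A rough usage sketch, assuming the go() callback
signature (member, data) and the behaviour of the draft patch above:

import apt_inst

def callback(member, data):
    if data is None:
        # With the patch above, members that do not fit into the 32-bit
        # address space arrive as None instead of bytes.
        print("skipping oversized member:", member.name, member.size)
        return
    print(member.name, len(data))

apt_inst.DebFile("huge.deb").data.go(callback)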

-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.

Please do not top-post if possible.

