[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#872334: apt-utils: apt-ftparchive does not correctly cache filesizes for packages > 4GB



Package: apt-utils
Version: 1.4.7
Severity: normal
Tags: patch lfs

[ This bug was originally found in and reported to Ubuntu, but appears to
affect Debian as well.  Please see also:
https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1710911 ]

apt-ftparchive is a utility for, among other things, generating a Packages file
from a set of .deb packages.

Because generating Packages files for a large directory tree of .deb packages
is expensive, it can cache the properties of .deb packages it has already
inspected in a Berkeley database file.

Historically, apt-ftparchive stored the size of a .deb package as a 32-bit
unsigned integer in network byte-order in this cache. This field was later
enlarged to 64-bits - which caused other problems; see: LP #1274466.

However, even after this integer field was enlarged, apt-ftparchive continued
to use the htonl() and ntohl() libc functions to convert file-sizes to and from
network byte-order when reading and writing to its cache. These functions
unconditionally emit 32-bit unsigned integers, which means that apt-ftparchive
remained unable to correctly record the file-sizes for packages > 32bits (i.e.
> 4GB).

Consequently, if apt-ftparchive is asked (with caching enabled) to generate a
Packages file for a .deb package larger than 4GB, it will produce a Packages
file with the correct Size: field the first time, but with incorrect Size:
fields subsequently.

I have developed a small patch which replaces the use of the ntohl() family of
functions with suitable replacements from <endian.h>. This produces correct
output on new installations.

However, caution is necessary: the existing code is incorrectly storing the 32
least significant bits of a 64-bit number in the upper 32-bits of a 64-bit
field, in big-endian byte order. The application of this patch will cause new
values to be stored correctly, but in a binary-incompatible way with existing
caches.

For example, a package of size 7162161474 bytes will today have the following
sequence of bytes stored in its cache entry:

\xaa\xe5\xe9\x42\x00\x00\x00\x00

(When re-read, this will produce a file-size value of 7162161474 mod 32bits,
i.e. 2867194178.)

With this patch applied, apt-ftparchive will correctly store this entry:

\x00\x00\x00\x01\xaa\xe5\xe9\x42

However, this correct cache entry, when interpreted by the current broken code,
will return a file-size of 1 byte. Worse, the existing broken entry will be
interpreted by my fixed code as containing the value 12314505225791602688.

It would be good to have this patch, or some derivative of it, applied to the
main APT code-base. Before this can happen, however, some mechanism to detect
and correct broken cache entries will be needed if we are to avoid a repeat of
LP #1274466.

I had initially thought that this could be done by checking the trailing four
bytes of the 64-bit filesize field; if there were all zeroes, then the cache
entry could be considered broken.  However, this will will misbehave if the
corrected code is asked to record information for a file that is an exact
integer multiple of 4GB. You could stat the file on the filesystem in this case
to resolve the ambiguity.

Alternatively, you could modify the data-structure above to make it unambiguous
what kind of record is being read:

* A flag bit could be used in Flags;
* An explicit version field could be added to the end of this struct.
* The use of the cache to store package file sizes could be discontinued in
favour of always statting the file-size directly;
* The packed data-structure could be discarded completely, in favour of storing
different pieces of metadata in independent DB keys.



-- System Information:
Debian Release: 9.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.9.0-3-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8), LANGUAGE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages apt-utils depends on:
ii  apt             1.4.7
ii  libapt-inst2.0  1.4.7
ii  libapt-pkg5.0   1.4.7
ii  libc6           2.24-11+deb9u1
ii  libdb5.3        5.3.28-12+b1
ii  libgcc1         1:6.3.0-18
ii  libstdc++6      6.3.0-18

apt-utils recommends no packages.

apt-utils suggests no packages.

-- no debconf information
Author: David McBride <dwm37@cam.ac.uk>
Date:   Tue Aug 15 15:30:18 2017 +0100

    Fix truncation of 64-bit file-sizes caused by htonl()
    
    The htonl() function is defined as returning uint32_t results.
    This was fine for our purposes when file-sizes were stored in our
    cache as 32-bit unsigned integers - but this was changed long ago
    to 64-bits.  The continued use of htonl() causes only the least
    significant 32-bits of the file size to be stored, which is fine
    for files < 4GB in size - but breaks for values greater than this.
    
    Fix this problem by replaxing the use of htonl() and ntohl() with
    the appropriate functions from <endian.h>.
    
    NOTE: You may need to discard any existing cache database file
          after applying this change, as previous code may not have been
          storing file-size bitsequences correctly.

diff a/ftparchive/cachedb.cc b/ftparchive/cachedb.cc
--- a/ftparchive/cachedb.cc
+++ b/ftparchive/cachedb.cc
@@ -23,7 +23,7 @@
 #include <apt-pkg/gpgv.h>
 #include <apt-pkg/hashes.h>
 
-#include <netinet/in.h>       // htonl, etc
+#include <endian.h>
 #include <ctype.h>
 #include <stddef.h>
 #include <sys/stat.h>
@@ -185,7 +185,7 @@ bool CacheDB::GetFileStat(bool const &doStat)
                            _("Failed to stat %s"),FileName.c_str());
    }
    CurStat.FileSize = St.st_size;
-   CurStat.mtime = htonl(St.st_mtime);
+   CurStat.mtime = htobe32(St.st_mtime);
    CurStat.Flags |= FlSize;
    
    return true;
@@ -263,8 +263,8 @@ bool CacheDB::GetCurStat()
          return _error->Error("Cache record size mismatch (%ul)", Data.size);
       }
 
-      CurStat.Flags = ntohl(CurStat.Flags);
-      CurStat.FileSize = ntohl(CurStat.FileSize);
+      CurStat.Flags = be32toh(CurStat.Flags);
+      CurStat.FileSize = be64toh(CurStat.FileSize);
    }      
    return true;
 }
@@ -524,12 +524,12 @@ bool CacheDB::Finish()
       return true;
 
    // Write the stat information
-   CurStat.Flags = htonl(CurStat.Flags);
-   CurStat.FileSize = htonl(CurStat.FileSize);
+   CurStat.Flags = htobe32(CurStat.Flags);
+   CurStat.FileSize = htobe64(CurStat.FileSize);
    InitQueryStats();
    Put(&CurStat,sizeof(CurStat));
-   CurStat.Flags = ntohl(CurStat.Flags);
-   CurStat.FileSize = ntohl(CurStat.FileSize);
+   CurStat.Flags = be32toh(CurStat.Flags);
+   CurStat.FileSize = be64toh(CurStat.FileSize);
 
    return true;
 }

Reply to: