Re: Problem making large .deb files
On Mon, 2016-05-23 at 12:22:47 +0200, Guillem Jover wrote:
> Yesterday I had an inspiration for some crazy proposal related to the
> PAX stuff. :) We could switch from an ar container to an uncompressed
> PAX container, which has no limits. To preserve backward compatibility
> at least when it comes to detecting that this is a .deb format, we could
> use the first PAX header name field to store the ar magic and first ar
> header and contents. The PAX header's filename is supposed to be
> ignored anyway by archivers supporting the PAX format, but it might
> be used as the extracted filename by ones that do not.
> The nice thing is that the ustar header has a name field which is 100
> chars long, and the ar magic + entire header is 68 chars long; both
> start at offset 0, and both are ASCII. Of course this might actually
> confuse file detectors and archiving tools quite a bit, but it seems
> like an interesting hack. :)
> (This reminded me of the multiple executable formats in the same file
> hacks. :)
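For what it's worth, the 68-char figure can be double-checked with
printf(1). This is only a sketch: the field widths are the usual ar
member header ones (name 16, mtime 12, uid 6, gid 6, mode 8, size 10,
plus the 2-byte terminator), and the timestamp and mode are simply the
values used in the PoC; the variable names are made up for the example.

```shell
# Build the ar global magic followed by the first member header.
# Command substitution strips trailing newlines, so a sentinel "X" is
# appended and removed again to keep the final "\n" of the header.
hdr=$(printf '!<arch>\n%-16s%-12s%-6s%-6s%-8s%-10s`\nX' \
      debian-binary 1444848908 0 0 100644 4)
hdr=${hdr%X}

# Everything is ASCII, so the character count equals the byte count.
len=${#hdr}
echo "ar magic + header: $len bytes (ustar name field: 100 bytes)"
# prints: ar magic + header: 68 bytes (ustar name field: 100 bytes)
```

That is 8 bytes of global magic plus a 60-byte member header, which
leaves 32 bytes of the name field unused.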
And as a PoC, here's some craziness:
$ mkdir -p meta fsys/dir
$ touch meta/control
$ touch fsys/dir/test
$ tar --create --xz --file meta.tar.xz -C meta .
$ tar --create --xz --file fsys.tar.xz -C fsys .
debian-binary 1444848908 0 0 100644 4 \`
$ tar --create --file test-v3.deb --blocking-factor=4 --format=posix \
--pax-option globexthdr.name="$magic",dpkg.format=3.0 \
$ file test-v3.deb
test-v3.deb: POSIX tar archive
$ file -e tar test-v3.deb
test-v3.deb: Debian binary package (format 3.0)
$ dpkg-deb -I test-v3.deb
dpkg-deb: error: archive is format version 3.0; get a newer dpkg-deb
So, one problem is that by default file(1) does get confused: it reports
a plain POSIX tar archive unless the tar test is excluded with -e tar.
Another problem is that PAX is still (AFAIK) not a very widespread
format (?), although it is pretty standard. :) Concocting the above
should also be possible on conformant systems with a pax(1) utility.
Another issue is that due to the additional padding and extra tar header
entries needed by the PAX format, a fairly empty package becomes way
fatter. For an archive with 2 members we need 1 global extended header,
2 file extended headers, 2 normal headers, plus the data blocks for each
extended header, and the data for each file member rounded up to the next
512-byte block, plus 2 extra zero-filled blocks at the end; that makes
12 blocks, or a minimum of 6 KiB.
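Spelled out as shell arithmetic, the block count looks like this. It is
a sketch under the assumption that every extended header record and
every member fits in a single 512-byte data block (so it is a lower
bound for non-empty members); the variable names are only illustrative.

```shell
block=512          # tar block size in bytes
global_hdrs=1      # global extended header
ext_hdrs=2         # one extended header per member
member_hdrs=2      # one regular ustar header per member
ext_data=$((global_hdrs + ext_hdrs))   # one data block per extended header
member_data=2      # one data block per (small, non-empty) member
trailer=2          # zero-filled end-of-archive blocks

blocks=$((global_hdrs + ext_hdrs + member_hdrs + ext_data + member_data + trailer))
bytes=$((blocks * block))
echo "$blocks blocks = $bytes bytes"
# prints: 12 blocks = 6144 bytes
```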
So a minimal PAX archive would weigh around 6 KiB, while a minimal format
2.0 weighs less than 200 bytes. This could be relevant for udebs, but we
could make dpkg-deb only use format 3.0 when needed, or udebs could be
forced to use --deb-format=2.0. In any case this seems insignificant for
the overall archive, let's say with
60k packages / arch * 12 arches * 6 KiB / package =~ 4.1 GiB
of total overhead.
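To redo the estimate with other assumptions, the multiplication can be
done with shell arithmetic. The package and architecture counts are the
rough guesses from above, not measured values, and the variable names
are made up for the example.

```shell
packages=60000     # packages per architecture (rough guess)
arches=12          # number of architectures (rough guess)
per_pkg_kib=6      # minimal PAX container size per package, in KiB

total_kib=$((packages * arches * per_pkg_kib))
# Integer arithmetic only, so compute tenths of a GiB to keep one decimal.
tenths=$((total_kib * 10 / (1024 * 1024)))
echo "$total_kib KiB =~ $((tenths / 10)).$((tenths % 10)) GiB"
# prints: 4320000 KiB =~ 4.1 GiB
```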
I've updated the <https://wiki.debian.org/Teams/Dpkg/TimeTravelFixes>
page with the new proposals, and will add a link to this post.