Re: .deb format: let's use 0.939, zstd, drop bzip2
On Wed, 2019-05-08 at 19:38:26 +0200, Adam Borowski wrote:
> First, the 0.939 format, as described in "man deb-old". While still being
> accepted by dpkg, it had been superseded before even the very first stable
> release. Why? It has at least two upsides over 2.0:
I'll try to disentangle the discussion and address this first. Some of
what I'm going to write has already been written in the thread, but
I'm just going to condense it, give it some additional context, and
lay down the direction I'd like to go in.
To recap, format 0.93x has multiple problems:
- Cannot be handled with stock tools.
- Not easily extensible.
- Bad data alignment.
- Bad compression support.
- Bad tool coverage (see below).
I don't think it's correct that most tools support that format. From the
list of programs that I've tracked that handle .deb directly, I'd even
say almost none do <https://wiki.debian.org/Teams/Dpkg/DebSupport>. And
that list does not even include the many projects/programs outside Debian
that handle .deb archives directly.
The size limit is indeed a problem, and was already known and tracked
in deb(5) and <https://wiki.debian.org/Teams/Dpkg/TimeTravelFixes>, see
the “.deb size limit” item there, and then later discussed in
while I think the workarounds I listed there are probably still valid
in most cases, if this is affecting people then prioritizing a fix for
it now would be good.
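For context on where the limit comes from: in the common ar(5) layout every
member is preceded by a 60-byte header of fixed-width ASCII fields, and the
size field is 10 decimal digits wide, so no single member (e.g. data.tar) can
exceed 9999999999 bytes, roughly 9.3 GiB. A minimal Python sketch of that
header, assuming the common layout with zeroed mtime/uid/gid (the helper
names are made up for illustration; this is not how dpkg-deb writes it):

```python
import struct

# Common ar(5) member header: 60 bytes of fixed-width ASCII fields
# (name, mtime, uid, gid, mode, size, terminator).
AR_MAGIC = b"!<arch>\n"
AR_HEADER = struct.Struct("16s12s6s6s8s10s2s")

# The size field holds at most 10 decimal digits.
AR_MAX_MEMBER_SIZE = 10**10 - 1


def make_member_header(name: bytes, size: int) -> bytes:
    """Build a member header (hypothetical helper), refusing sizes that do not fit."""
    if size > AR_MAX_MEMBER_SIZE:
        raise ValueError(f"member size {size} does not fit in the ar size field")
    return AR_HEADER.pack(
        name.ljust(16),            # space-padded member name
        b"0".ljust(12),            # mtime (zeroed for the sketch)
        b"0".ljust(6),             # uid
        b"0".ljust(6),             # gid
        b"100644".ljust(8),        # octal mode
        str(size).encode("ascii").ljust(10),  # decimal size, the limiting field
        b"`\n",                    # fixed header terminator
    )


def member_size(header: bytes) -> int:
    """Parse the decimal size back out of a 60-byte member header."""
    fields = AR_HEADER.unpack(header)
    if fields[6] != b"`\n":
        raise ValueError("bad ar member header terminator")
    return int(fields[5].decode("ascii"))
```

With this, `AR_MAX_MEMBER_SIZE / 2**30` comes out at about 9.31, which is the
per-member ceiling any ar-based .deb is stuck with.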
The crazy idea I came up with at the time was to use a dual-format PAX+ar
container (that would embed the ar(5) header in the first PAX name entry).
This would make old tools at least detect this is a .deb package, with a
higher major version.
But I guess I was never sold on it either, and thinking about it, the
tradeoff does not really look very good. file(1) does not even recognize
it out-of-the-box as a .deb, so at best we'd get a nicer error
message from some of the tools handling .debs, while all of them need to
be updated anyway to support any new format. It also destroys some of the
nice properties of the 2.x format, namely:
- Not requiring special tools to build/extract.
- Not using a non-widespread format (PAX).
Getting rid of ar(5) would also make the format more portable, as the
ar(5) format does change depending on the Unix system! Even besides the
main common format and its BSD and GNU variants, there are other
wildly different layouts. It would also mean we do not need binutils
to analyze them when there is no dpkg-deb around.
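To give a flavour of that variation, even the 16-byte member name field is
interpreted differently: GNU ar terminates short names with "/", keeps long
names in a "//" string table referenced as "/offset", and uses "/" for its
symbol table, while BSD ar writes "#1/len" and stores the real name at the
start of the member data. A rough Python sketch covering only these main
variants (the function name is hypothetical, and other layouts are not
handled at all):

```python
def classify_ar_name(field: bytes):
    """Interpret a 16-byte ar(5) member name field for the common, GNU and BSD variants.

    Returns (kind, value) where value is the plain name, a GNU string-table
    offset, a BSD in-data name length, or None.
    """
    name = field.decode("ascii").rstrip(" ")
    if name == "/":
        return ("gnu-symbol-table", None)
    if name == "//":
        return ("gnu-long-name-table", None)
    if name.startswith("#1/") and name[3:].isdigit():
        # BSD: the real name is the first N bytes of the member data.
        return ("bsd-long-name", int(name[3:]))
    if name.startswith("/") and name[1:].isdigit():
        # GNU: offset into the "//" long-name string table.
        return ("gnu-long-name", int(name[1:]))
    if name.endswith("/"):
        # GNU: short names carry a trailing slash.
        return ("gnu-short-name", name[:-1])
    # Common format: the name is just space-padded.
    return ("plain-name", name)
```

Any program reading .deb files directly has to cope with at least part of
this, which is one more reason a tar container is simpler to support.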
For the same reason using PAX would probably be a bad idea, as it's a
format that has unfortunately not really caught on, it takes more
space due to the additional headers, and we do not really need xattrs
in the container. I had gone for it for its unlimited-length metadata, but
since dpkg 1.18.24 that should not be an issue, as I implemented GNU
large file metadata support, which means we have pretty much "unlimited"
length metadata, and I'd say its encoding is more widespread than PAX
(for example star supports it).
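The "only when necessary" nature of these extensions is easy to see with
Python's tarfile module, used here purely as an illustration and not as
dpkg's implementation: a short member name fits in the plain 100-byte name
field and produces a single header, while a longer one gets an extra
"././@LongLink" header (typeflag "L") in GNU format, or an "x" extended
header in PAX format. The helper names below are hypothetical:

```python
import io
import tarfile

BLOCK = 512


def header_typeflags(data: bytes) -> list:
    """Walk the raw 512-byte headers of a small tar archive, returning each typeflag."""
    flags = []
    off = 0
    while off + BLOCK <= len(data):
        block = data[off:off + BLOCK]
        if block == b"\0" * BLOCK:  # end-of-archive marker
            break
        flags.append(block[156:157])
        # Octal ASCII size field at offset 124 (fine for these small members).
        size = int(block[124:136].rstrip(b"\0 ") or b"0", 8)
        # Skip this header plus its data, rounded up to whole blocks.
        off += BLOCK + -(-size // BLOCK) * BLOCK
    return flags


def write_member(fmt: int, name: str) -> bytes:
    """Write one empty regular file called `name` using the given tar format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    return buf.getvalue()


long_name = "usr/share/doc/" + "x" * 150
print(header_typeflags(write_member(tarfile.GNU_FORMAT, "debian-binary")))  # [b'0']
print(header_typeflags(write_member(tarfile.GNU_FORMAT, long_name)))        # [b'L', b'0']
print(header_typeflags(write_member(tarfile.PAX_FORMAT, long_name)))        # [b'x', b'0']
```

In both cases a stock reader gets the full name back; the difference is only
in which extension header carries it.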
So I think Andrej is spot on, and we should just switch from ar(5)
to tar(5) as the container, but not to PAX, just using the GNU extensions we
already support, which would only be used when necessary. And we should ignore
any crazy idea of embedding an ar header inside the first member, as
that will just complicate matters and become cruft once we have switched.
So given that we'd need to modify any program handling .debs directly
anyway, I'd go for the most straightforward and simple of the options.
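To make the proposal a bit more concrete, here is a rough sketch of writing
and reading a tar-container .deb with nothing but a stock tar implementation
(Python's tarfile here). It assumes, purely hypothetically, that the members
keep the same names and order as in the current 2.0 ar layout; the real
layout will be whatever the deb(5) change specifies, and the function names,
version string and payloads are placeholders:

```python
import io
import tarfile

# Hypothetical member names/order, borrowed from the current 2.0 ar layout.
MEMBER_NAMES = ("debian-binary", "control.tar.xz", "data.tar.zst")


def build_tar_deb(version: bytes, control_tar: bytes, data_tar: bytes) -> bytes:
    """Pack the three members into a plain tar container.

    GNU format: extension headers are only emitted if a member needs them.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name, payload in zip(MEMBER_NAMES, (version, control_tar, data_tar)):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_tar_deb(blob: bytes) -> dict:
    """Read the members back, rejecting anything not in the expected order."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        names = tar.getnames()
        if tuple(names) != MEMBER_NAMES:
            raise ValueError(f"unexpected member layout: {names}")
        return {name: tar.extractfile(name).read() for name in names}
```

Since the container is a plain tar archive, a stock tar(1) can list or
extract it as well, without needing dpkg-deb or binutils.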
I'll send the actual diff for deb(5) that I've got here tomorrow, but
otherwise, if there are no major concerns, I'd like to start adding
support for this in dpkg 1.20.x.