Re: .deb format: let's use 0.939, zstd, drop bzip2



On Fri, May 10, 2019 at 05:18:18AM +0200, Guillem Jover wrote:
> On Wed, 2019-05-08 at 19:38:26 +0200, Adam Borowski wrote:
> > First, the 0.939 format, as described in "man deb-old".  While still being
> > accepted by dpkg, it had been superseded before even the very first stable
> > release.  Why?  It has at least two upsides over 2.0:

> To recap, format 0.93x has multiple problems:
> 
>   - Cannot be handled with stock tools.

"ar" is an obscure historical thing, akin to "cpio" or such.  It is used
deep within the format, but I wouldn't call this part an upside at all.
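
For reference, the two containers can be told apart from their first few
bytes.  A minimal sketch, assuming the layouts described in deb(5) (an ar
archive starting with the "!<arch>" magic) and deb-old(5) (a version line such
as "0.939000", then a line with the decimal length of the control tarball).
The function names are made up for illustration and are not dpkg's:

```python
import io

AR_MAGIC = b"!<arch>\n"  # global header of an ar archive, used by format 2.0


def deb_container(stream):
    """Guess the .deb container type from the first 8 bytes of a binary stream."""
    head = stream.read(8)
    if head == AR_MAGIC:
        return "2.0 (ar)"
    if head.startswith(b"0.93"):
        return "0.93x (old)"
    return "unknown"


def old_control_length(stream):
    """Read the two text lines that open an old-format archive.

    Returns (version string, length in bytes of the first gzipped tarball).
    """
    version = stream.readline().rstrip(b"\n").decode("ascii")
    length = int(stream.readline().rstrip(b"\n"))
    return version, length


if __name__ == "__main__":
    # Hypothetical in-memory samples, not real packages.
    print(deb_container(io.BytesIO(b"!<arch>\ndebian-binary   ")))
    print(old_control_length(io.BytesIO(b"0.939000\n1234\n")))
```

Everything after those two lines in the old format is just the two compressed
tarballs back to back, which is why no ar-aware tool is involved there.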

>   - Not easily extensible.

Huh?  Seems exactly as extensible as 2.0 (where all deployed extensions went
to control.tar).

>   - Bad data alignment.

Yeah, but it's still faster than 2.0.  And I don't expect decompressors to
care about alignment.  It might matter for "cat", though -- but I don't imagine
there are many uncompressed archives.  Heck, if we were to start 3.0, I'd
recommend lz4 instead.

>   - Bad compression support.

Trivial to add.

>   - Bad tool coverage (see below).
> 
> I don't think it's correct that most tools support that format, from the
> list of programs that I've tracked that handle .deb directly, I'd even
> say almost none do <https://wiki.debian.org/Teams/Dpkg/DebSupport>.

Most of those have no business looking at the format's details, just the
payload.

> The crazy idea I came up with at the time was to use a dual-format PAX+ar
> container (that would embed the ar(5) header in the first PAX name entry).
> This would make old tools at least detect this is a .deb package, with a
> higher major version.
[...]
> So I think Andrej is on the spot, and we should just switch from ar(5)
> to tar(5) as the container

I would heavily advise against archive-in-archive.  Especially not tar, with
its block madness.  The blocks disappear when compressed but you're not
going to compress the outer layer.  Also, you can't shed the outer layer of
tar without a filter.
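
To put rough numbers on the block overhead, here is a back-of-the-envelope
sketch.  It assumes plain ustar framing (a 512-byte header per member, data
padded to a multiple of 512, two zero blocks at the end) and the classic ar
framing (8-byte magic, 60-byte header per member, data padded to an even
length).  The member sizes are invented, and real tar writers may pad further
to a full record (10240 bytes with GNU tar's default blocking factor):

```python
TAR_BLOCK = 512   # tar header size and data padding granularity
AR_MAGIC_LEN = 8  # length of the "!<arch>\n" global header
AR_HEADER = 60    # per-member ar header


def tar_archive_bytes(payloads):
    """Size of an uncompressed tar holding members of the given sizes."""
    total = 0
    for size in payloads:
        padded = -(-size // TAR_BLOCK) * TAR_BLOCK  # round up to a full block
        total += TAR_BLOCK + padded
    return total + 2 * TAR_BLOCK  # end-of-archive marker


def ar_archive_bytes(payloads):
    """Size of an ar archive holding members of the given sizes."""
    return AR_MAGIC_LEN + sum(AR_HEADER + s + (s % 2) for s in payloads)


if __name__ == "__main__":
    # Hypothetical outer members: debian-binary, control.tar.*, data.tar.*
    members = [4, 1000, 100000]
    raw = sum(members)
    print("tar overhead:", tar_archive_bytes(members) - raw)
    print("ar overhead: ", ar_archive_bytes(members) - raw)
```

The absolute numbers are small either way; the point is that the padding sits
in the uncompressed outer layer, where nothing squeezes it out again.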

According to the benchmarks I just posted, even a processor that is less than
1/3 loaded is already bottlenecked on passing data from layer to layer.  I
tried a zero-copy implementation with libarchive's callbacks, but it doesn't
seem to help:

gzip, median of 101, libarchive implementation:
0.93: real 0.97 user 14.74
2.0:  real 0.99 user 15.89

> I'll propose an actual diff I've got here of deb(5) tomorrow, but
> otherwise if there are no great concerns, I'd like to start adding
> support for this for dpkg 1.20.x.

Let's not be hasty -- unlike 0.93, which has existing (if spotty) support,
a complete format break should be better researched.  Ansgar's concerns, for
example, should at least be considered.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Did ya know that typing "test -j8" instead of "ctest -j8"
⢿⡄⠘⠷⠚⠋⠀ will make your testsuite pass much faster, and fix bugs?
⠈⠳⣄⠀⠀⠀⠀
