
Re: The archive now supports xz compression



On Thu, Aug 11, 2011 at 05:12:36PM +0200, Ansgar Burchardt wrote:
> Hi,
> 
> The archive software now accepts packages using xz for compression in
> addition to gzip and bzip2 for both source and binary packages.

Hurray!

> please only use xz (or bzip2 for that matter) if your
> package really profits from its usage (for example, it provides a
> significant space saving). While those methods may compress better they
> often use more CPU time to do so and a very small decrease in package
> size is hardly worth the extra effort placed on slower systems.

This is very bad advice.

Do you remember my joke package "goodbye" from less than two weeks ago?  If
you compare its optimized[1] debhelper-less, dpkg-less code to standard "slow"
debhelper, debhelper costs you 2.6 seconds per package[2].

In that time you can compress 8MB (and decompress that in 0.09s).

Thus, if you care about amounts of CPU time that small, you may as well go
through the whole archive replacing debhelper with "goodbye", for all
packages smaller than that 8MB.

The cost is roughly linear, too -- so compressing a 20KB package costs about
nothing.
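
To put a number on "about nothing", here is a back-of-the-envelope sketch in
Python.  It assumes the 8MB-in-2.6s figure above and strictly linear scaling;
the function name is made up for illustration and nothing here is a new
measurement.

```python
# Figures taken from the message: roughly 8 MB can be compressed in the
# 2.6 s that debhelper costs per package.
DEBHELPER_OVERHEAD_S = 2.6
MB_COMPRESSED_IN_THAT_TIME = 8.0


def xz_cost_seconds(size_kb):
    """Estimated compression time, ASSUMING cost is linear in input size."""
    throughput_kb_per_s = MB_COMPRESSED_IN_THAT_TIME * 1024 / DEBHELPER_OVERHEAD_S
    return size_kb / throughput_kb_per_s


if __name__ == "__main__":
    # A 20 KB package, a 1 MB package, and the 8 MB break-even point.
    for size_kb in (20, 1024, 8192):
        print(f"{size_kb:>5} KB: ~{xz_cost_seconds(size_kb):.4f} s")
```

Under those assumptions a 20KB package costs a few milliseconds, which is
far below the debhelper overhead nobody complains about.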

And what do we gain in return?  A massive decrease of the archive size --
both disk space and bandwidth.  Regular packages compress twice as well with
xz as with gzip; with incompressible data in the mix, the average was 2/3 for
amd64 CD1.
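
If you want to check the size difference on your own data, something along
these lines should do.  It is only a sketch using Python's standard gzip and
lzma modules; the sample payload and the helper names are invented for
illustration, and the ratio you get depends entirely on what you feed it
(swap in the bytes of a real data.tar to get a meaningful number).

```python
import gzip
import lzma


def sample_payload():
    """Hypothetical stand-in for package contents: repetitive text."""
    return b"".join(
        b"Package: example-%d\nPriority: optional\n" % i for i in range(20000)
    )


def compressed_sizes(data):
    """Return (gzip_size, xz_size) in bytes for the given bytes object."""
    gz = gzip.compress(data, compresslevel=9)
    xz = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)
    return len(gz), len(xz)


if __name__ == "__main__":
    data = sample_payload()
    gz_size, xz_size = compressed_sizes(data)
    print(f"raw:  {len(data)} bytes")
    print(f"gzip: {gz_size} bytes")
    print(f"xz:   {xz_size} bytes ({xz_size / gz_size:.2f} of the gzip size)")
```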


Thus, I'd strongly recommend just compressing everything with xz, on all
architectures.  Preferably, as a default in dpkg-dev.


> Think of both user systems and the Debian buildds which will waste more
> time - an especially bad problem on slower architectures.

The gain is especially meaningful for slower architectures, as they tend to
have less disk space and slower network links (arm tends to be used in
phones).  No extra memory is needed -- decompression is not done in parallel
with memory-hungry stages of dpkg's work.  The decompression, merely 2.5
times slower than with gzip, is a tiny fraction of what dpkg takes.
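
The decompression side can be timed the same way.  This is a rough sketch
with Python's gzip and lzma modules rather than dpkg itself, so it says
nothing about dpkg's other stages; the payload and function names are
assumptions of mine, and the ratio will vary with the data and the machine.

```python
import gzip
import lzma
import time


def time_decompress(decompress, blob, repeats=5):
    """Best-of-N wall-clock time in seconds for one decompression call."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Hypothetical payload; replace with real package contents if available.
    data = b"".join(b"usr/share/doc/example/file-%d\n" % i for i in range(200000))
    gz_blob = gzip.compress(data)
    xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)

    t_gz = time_decompress(gzip.decompress, gz_blob)
    t_xz = time_decompress(lzma.decompress, xz_blob)
    print(f"gzip: {t_gz:.4f} s  xz: {t_xz:.4f} s")
    if t_gz > 0:
        print(f"xz/gzip decompression time ratio: {t_xz / t_gz:.2f}")
```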


> Please remember that packages in the base system[1] (and dependencies)
> *must* currently use gzip as otherwise debootstrap will be unable to
> install a system.
>
> 1: Meaning everything with Priority: required.

I'm not as strongly opinionated here, but I guess decreasing the size of d-i
images would be a huge win as well.



Meow!

[1]. Three calls to system() could be avoided for little effort and two more
for substantial effort -- and all of them could be turned into execve(), but
otherwise, it's pretty much as fast as you can get.  With package build time
below a single disk seek, further optimizations are pointless.  The point of
"goodbye" was abuse of policy and buzzword compliance, not speed.

[2]. Debhelper costs are surprisingly constant, even between a "copy a
single file" package and full-blown autoconf+C ones.

-- 
1KB		// Yo momma uses IPv4!

