
Re: The archive now supports xz compression



On Sun, Aug 14, 2011 at 09:19:02PM +0200, Lucas Nussbaum wrote:
> On 11/08/11 at 19:52 +0000, Philipp Kern wrote:
> > On 2011-08-11, Adam Borowski <kilobyte@angband.pl> wrote:
> > >> Think of both user systems and the Debian buildds which will waste more
> > >> time - an especially bad problem on slower architectures.
> > > The gain is especially meaningful for slower architectures, as they tend to
> > > have less disk space and slower network links (arm tends to be used in
> > > phones).  No extra memory is needed -- decompression is not done in parallel
> > > with memory-hungry stages of dpkg's work.  The decompression, merely 2.5
> > > times slower than with gzip, is a tiny fraction of what dpkg takes.
> > 
> > It takes a lot longer to compress on slower architectures (i.e. on the
> > buildds), though.  You could've built a whole package in that time.  (Resorting
> > to your style of argument.)
> 
> Wouldn't it be better to get more buildds for those archs, then?
> That would be a totally appropriate use of Debian money...

While having more buildds is always nice, it doesn't seem to me that they
would be necessary just to switch to xz.

I gathered some data:

* A repack of the whole archive (amd64+all main, ~40GB) took close to three
  hours on a 6xPhenomII 2.8GHz box (ar p|gzip/bzip2 -d|xz).

  Does anyone have an estimate of how many core-hours an archive rebuild
  on such a machine would take?  Folks on IRC quoted numbers like "340",
  "240 on a very fast box" and "more like 1500", which are too divergent
  for my liking.  The first number, 340, would mean that switching to xz
  exclusively would increase average build time by ~5% (roughly 18
  core-hours of repacking against 340 core-hours of building).

* An armel Cortex-A8 at 600MHz runs xz consistently 12.1 times slower than
  one core of the above box (tested on both a big compressible and a big
  incompressible file); that's ~2.6 times slower per MHz.

  Glancing at the build logs of a few randomly chosen packages, I see armel
  builds taking 16.9, 13.1, 18.8 and 15.1 times longer, respectively.  I'm
  not sure what the actual speeds of the buildds are, but since whole builds
  slow down more than xz does (12.1 times), it looks like armel would be
  affected by less than the above 5%.

* A year ago, I repacked CD1, and .xz took 66% of the space needed by .gz.
  This time, on the whole archive, the gains are somewhat smaller: 72%.  I
  guess that CD1 is code-heavy while packages of lower priorities tend to
  have more data.

  Raw data: http://angband.pl/tmp/rexz/gzip.gz and
            http://angband.pl/tmp/rexz/bzip2.gz
  (these numbers are data.tar.* alone)

  For an empty package, .xz is bigger (180%: 36 vs 20 bytes).
  Packages with sizes <1000 bytes:  85%.
  Packages with sizes <10000 bytes: 76%.

* Compression time seems to be linear for all sizes that can be measured
  without tricks.

* It is possible to repack .deb files after they are built.

* Busybox (and thus d-i) can be compiled with xz support.  Size-wise it's
  a clear gain (dpkg.deb saves 883008 bytes, apt.deb 751967 ...).  This is
  the only place where the memory cost could possibly matter, though.

* Other distributions that could run debootstrap have all since switched to
  xz[1], so it's mandatory there already.  A possible concern would be
  debootstrapping from an outdated install of one of those.
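
A rough sketch for redoing the arithmetic behind the percentages above, in
Python.  It assumes all 6 cores were busy for the whole repack and counts
the entire repack (gzip/bzip2 decompression included) as xz cost, so it
errs on the high side; it also assumes the four sampled armel slowdowns
are representative and that the buildds are comparable to the tested
Cortex-A8, neither of which is certain.

```python
# Rough check of the build-time figures quoted in this mail.
# Assumptions: all 6 cores were busy for the whole repack, and the whole
# repack (decompression included) counts as xz cost, so this overestimates.

REPACK_HOURS = 3.0        # "close to three hours" for the whole archive
CORES = 6                 # 6x Phenom II at 2.8GHz
FAST_MHZ = 2800
ARMEL_MHZ = 600           # Cortex-A8
ARMEL_XZ_SLOWDOWN = 12.1  # xz on armel vs one Phenom II core
# Assumed representative: armel build-time ratios from four sampled logs.
ARMEL_BUILD_SLOWDOWNS = [16.9, 13.1, 18.8, 15.1]


def xz_overhead(rebuild_core_hours: float) -> float:
    """Repack core-hours as a fraction of a full archive rebuild."""
    return REPACK_HOURS * CORES / rebuild_core_hours


def per_mhz_slowdown() -> float:
    """armel xz slowdown after normalising for clock speed."""
    return ARMEL_XZ_SLOWDOWN * ARMEL_MHZ / FAST_MHZ


def armel_overhead(rebuild_core_hours: float) -> float:
    """Scale the overhead by how xz slows down relative to whole builds.

    Assumes the armel buildds behave like the tested Cortex-A8.
    """
    mean_build = sum(ARMEL_BUILD_SLOWDOWNS) / len(ARMEL_BUILD_SLOWDOWNS)
    return xz_overhead(rebuild_core_hours) * ARMEL_XZ_SLOWDOWN / mean_build


for estimate in (240, 340, 1500):  # the numbers quoted on IRC
    print(f"{estimate:>5} core-hours: +{xz_overhead(estimate):.1%} "
          f"(armel: +{armel_overhead(estimate):.1%})")
print(f"per-MHz slowdown: {per_mhz_slowdown():.2f}x")
```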

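The repack from the first point (ar p | gzip/bzip2 -d | xz), applied to an
already built .deb, could look roughly like this in Python.  It is only a
sketch of the idea, not what dpkg-deb does internally: the ar reader and
writer are minimal hand-rolled ones (no GNU long-name tables), only a
data.tar.gz or data.tar.bz2 member is touched, and control.tar.* is left
alone.  Since only the compression of the tar stream changes, the shipped
files, and so the md5sums in control.tar.*, stay the same.

```python
import bz2
import gzip
import io
import lzma

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60

# Each member is (name, original 60-byte header, payload).
Member = tuple[bytes, bytes, bytes]


def read_ar(blob: bytes) -> list[Member]:
    """Split a plain ar archive (as used by .deb) into its members."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members: list[Member] = []
    pos = len(AR_MAGIC)
    while pos < len(blob):
        header = blob[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Names are space-padded; GNU ar style may also end them with '/'.
        name = header[0:16].rstrip(b" ").rstrip(b"/")
        size = int(header[48:58])
        start = pos + HEADER_LEN
        members.append((name, header, blob[start:start + size]))
        pos = start + size + (size % 2)  # payloads are padded to even length
    return members


def write_ar(members: list[Member]) -> bytes:
    """Serialise members, keeping mtime/uid/gid/mode from the old headers."""
    out = io.BytesIO()
    out.write(AR_MAGIC)
    for name, header, data in members:
        if len(name) > 16:
            raise ValueError("long member names are not supported")
        size = str(len(data)).encode()
        out.write(name.ljust(16) + header[16:48] + size.ljust(10) + b"`\n")
        out.write(data)
        if len(data) % 2:
            out.write(b"\n")
    return out.getvalue()


# Decompressors for the old data.tar.* variants mentioned in the mail.
DECOMPRESSORS = {b"data.tar.gz": gzip.decompress,
                 b"data.tar.bz2": bz2.decompress}


def repack_deb(deb: bytes, preset: int = 6) -> bytes:
    """Return the .deb with data.tar.gz/.bz2 re-compressed as data.tar.xz."""
    members = []
    for name, header, data in read_ar(deb):
        decompress = DECOMPRESSORS.get(name)
        if decompress is not None:
            data = lzma.compress(decompress(data),
                                 format=lzma.FORMAT_XZ, preset=preset)
            name = b"data.tar.xz"
        members.append((name, header, data))
    return write_ar(members)
```

For a file on disk (the name is just an example), something like
Path("foo.deb").write_bytes(repack_deb(Path("foo.deb").read_bytes())),
with Path from pathlib, would do.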

There seems to be a lot of confusion, with questions like "do we have any
guideline for the sort of space savings which justify using xz?".  The d-d-a
post in particular seems to suggest that only big packages should be
switched; my data suggests that switching many small packages is not
significantly different from switching a single big one.

Thus, I'd say it'd be simpler to just switch everything.
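
A sketch for checking the small-vs-big point against per-package sizes.
It doesn't parse the raw data files linked above, whose exact layout isn't
described here; it just takes (gz_size, xz_size) pairs.  Weighting by
bytes (what matters for mirror space) and bucketing by the gzip size are
both assumptions, not necessarily how the percentages above were computed.

```python
from typing import Iterable, Optional


def xz_to_gz_ratio(pairs: Iterable[tuple[int, int]],
                   below: Optional[int] = None) -> float:
    """Total xz bytes divided by total gzip bytes over a set of packages.

    pairs: (gz_size, xz_size) of each package's data.tar.* member.
    below: if given, only count packages whose gzip size is under this
           many bytes (bucketing by gzip size is an assumption).
    """
    gz_total = xz_total = 0
    for gz_size, xz_size in pairs:
        if below is None or gz_size < below:
            gz_total += gz_size
            xz_total += xz_size
    if gz_total == 0:
        raise ValueError("no packages in this size range")
    return xz_total / gz_total


def report(pairs: list[tuple[int, int]]) -> None:
    """Print the byte-weighted ratio for the same buckets as in the mail."""
    for label, limit in (("<1000 bytes", 1000), ("<10000 bytes", 10000),
                         ("all", None)):
        print(f"{label:>13}: {xz_to_gz_ratio(pairs, limit):.0%}")
```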


[1]. Among major ones, it seems only Gentoo ships non-xz images, but xz is
included in their "system set" (our "essential").

-- 
1KB		// Yo momma uses IPv4!
