On Sun, Aug 14, 2011 at 09:19:02PM +0200, Lucas Nussbaum wrote:
> On 11/08/11 at 19:52 +0000, Philipp Kern wrote:
> > On 2011-08-11, Adam Borowski <firstname.lastname@example.org> wrote:
> > >> Think of both user systems and the Debian buildds which will waste more
> > >> time - an especially bad problem on slower architectures.
> > > The gain is especially meaningful for slower architectures, as they tend to
> > > have less disk space and slower network links (arm tends to be used in
> > > phones). No extra memory is needed -- decompression is not done in parallel
> > > with memory-hungry stages of dpkg's work. The decompression, merely 2.5
> > > times slower than with gzip, is a tiny fraction of what dpkg takes.
> >
> > It takes a lot longer to compress on slower architectures (i.e. on the
> > buildds), though. You could've built a whole package in that time. (Resorting
> > to your style of argument.)
>
> Wouldn't it be better to get more buildds for those archs, then?
> That would be a totally appropriate use of Debian money...

While having more buildds is always nice, it doesn't seem to me that they
would be necessary just to switch to xz.

I gathered some data:

* A repack of the whole archive (amd64+all main, ~40GB) took close to three
  hours on a 6xPhenomII 2.8GHz box (ar p|gzip/bzip2 -d|xz). Does anyone
  have an estimate of how many core-hours an archive rebuild on such a
  machine would take? Folks on IRC quoted numbers like "340", "240 on a
  very fast box", "more like 1500" -- too divergent for my liking. The
  first number, 340, would mean switching to xz exclusively would increase
  average build time by ~5%.

* armel Cortex-A8 600MHz does xz consistently 12.1 times slower than one
  core of the above box (on a big compressible and a big uncompressible
  file), that's ~2.6 times slower per-MHz. Glancing at build logs of a few
  randomly chosen packages, I see armel builds taking respectively 16.9,
  13.1, 18.8 and 15.1 times longer. Not sure what the actual speeds of the
  buildds are, but it looks like armel would be affected by less than the
  above 5%.

* A year ago, I repacked CD1; .xz took 66% of the space needed by .gz.
  This time, on the whole archive, gains are somewhat smaller: 72%. I
  guess that CD1 is code-heavy while packages of lower priorities tend to
  have more data.

  Raw data: http://angband.pl/tmp/rexz/gzip.gz and
  http://angband.pl/tmp/rexz/bzip2.gz
  (these numbers are data.tar.* alone)

  An empty package is bigger (180%: 36 vs 20 bytes).
  Packages with sizes <1000 bytes: 85%.
  Packages with sizes <10000 bytes: 76%.

* Compression time seems to be linear for all sizes that can be measured
  without tricks.

* It is possible to repack .deb files after they are built.

* Busybox (and thus d-i) can be compiled with xz support. Size-wise it's a
  clear gain (dpkg.deb saves 883008 bytes, apt.deb 751967 ...). This is
  the only place where the memory cost could possibly matter, though.

* Other distributions that could run debootstrap have all since switched
  to xz, so it's mandatory there already. A possible concern would be
  debootstrapping from an outdated install of those.

There seems to be a lot of confusion like "do we have any guideline for the
sort of space savings which justify using xz?". The d-d-a post in
particular seems to suggest only big packages should be switched; my data
suggests that switching many small packages is not significantly different
from switching a single big one. Thus, I'd say it'd be simpler to just
switch everything.

Among major distributions, it seems only Gentoo ships non-xz images, but xz
is included in their "system set" (our "essential").

--
1KB		// Yo momma uses IPv4!
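
The overhead estimates in the message above can be re-derived with a few lines of arithmetic. The sketch below is only an illustration: the constants are the figures quoted in the message, while the variable names, the reading of "three hours on a 6-core box" as 18 core-hours, and the use of a plain average over the four sampled armel build slowdowns are assumptions, not something the message states.

```python
# Figures quoted in the message. Names are made up for this sketch.
REPACK_WALL_HOURS = 3.0       # "close to three hours" for the whole archive
REPACK_CORES = 6              # 6-core Phenom II, 2.8GHz
REBUILD_CORE_HOURS = 340.0    # lowest of the divergent IRC estimates
ARMEL_XZ_SLOWDOWN = 12.1      # armel xz time relative to one amd64 core
AMD64_MHZ = 2800.0
ARMEL_MHZ = 600.0
ARMEL_BUILD_SLOWDOWNS = [16.9, 13.1, 18.8, 15.1]  # sampled build logs

# Assumption: all six cores were busy for the full three hours, so this is
# an upper bound on the core-hours spent repacking.
repack_core_hours = REPACK_WALL_HOURS * REPACK_CORES

# Extra build time on amd64 if every package were compressed with xz,
# expressed as a fraction of the estimated archive rebuild time.
amd64_overhead = repack_core_hours / REBUILD_CORE_HOURS

# Per-MHz slowdown of xz on armel: scale the raw slowdown by the clock ratio.
per_mhz = ARMEL_XZ_SLOWDOWN * ARMEL_MHZ / AMD64_MHZ

# Assumption: a plain mean of the four samples represents armel builds.
mean_build_slowdown = sum(ARMEL_BUILD_SLOWDOWNS) / len(ARMEL_BUILD_SLOWDOWNS)

# On armel xz slows down by 12.1x, but the rest of the build slows down by
# more, so the relative share of xz in the total shrinks.
armel_overhead = amd64_overhead * ARMEL_XZ_SLOWDOWN / mean_build_slowdown

print(f"amd64 overhead: {amd64_overhead:.1%}")
print(f"armel xz slowdown per MHz: {per_mhz:.1f}x")
print(f"armel overhead: {armel_overhead:.1%}")
```

With the 340 core-hour estimate this gives roughly 5% on amd64 and about 4% on armel, consistent with the "less than the above 5%" claim. Plugging in 1500 core-hours instead makes the overhead correspondingly smaller.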