[proposal] use xz compression for Debian package by default
Hi,
At DebConf12 I talked about xz compression for Debian packages (*).
As the next step, I'd like to propose using xz by default, backed by the
results of some experiments.
*) http://penta.debconf.org/dc12_schedule/events/930.en.html
------------------------------------------------------------------------------
test environment (armel)
------------------------------------------------------------------------------
I used a Netgear ReadyNAS Duo v2, an armel machine, for this test.
See http://www.netgear.com/home/products/storage/prosumer/rnd2000.aspx
Its system is based on Debian Squeeze, so the results should be the same on Debian :)
# uname -a
Linux nas-A0-96-88 2.6.31.8.duov2 #1 Mon May 14 18:35:20 HKT 2012 armv5tel GNU/Linux
# cat /proc/cpuinfo
Processor : Feroceon 88FR131 rev 1 (v5l)
BogoMIPS : 1599.07
Features : swp half thumb fastmult edsp
CPU implementer : 0x56
CPU architecture: 5TE
CPU variant : 0x2
CPU part : 0x131
CPU revision : 1
Hardware : Feroceon-KW
Revision : 0000
Serial : 0000000000000000
# free
total used free shared buffers cached
Mem: 246820 139216 107604 0 2760 57460
-/+ buffers/cache: 78996 167824
Swap: 524268 360 523908
For the test I used the libreoffice-core package (about 35MB), which currently
uses bz2 for package compression, and openclipart-png (an old version, about 600MB).
------------------------------------------------------------------------------
results1 (libreoffice-core)
------------------------------------------------------------------------------
Okay? Here we go...
# du -m *
1 control.tar.gz
35 data.tar.bz2
38 data.tar.gz
24 data.tar.xz
1 debian-binary
35 libreoffice-core_3.5.4-7_armel.deb
# time gzip -d data.tar.gz
real 0m7.253s
user 0m4.980s
sys 0m1.070s
# time bzip2 -dfk data.tar.bz2
real 0m45.256s
user 0m42.320s
sys 0m2.000s
# time xz -dfk data.tar.xz
real 0m11.443s
user 0m9.710s
sys 0m1.450s
                                              size    decomp-time
without compression                         : 141MB   -
Default compression (gzip -9)               : 38MB    7.3s
Package option (bzip2 -9)                   : 35MB    45.3s
xz (--arm --check=crc32 --lzma2=dict=64KiB) : 24MB    11.4s
   (--arm --check=crc32 --lzma2=dict=1MiB)  : 22MB    11.0s
   (--arm --lzma2=dict=64KiB)               : 24MB    12.5s
   (--arm --lzma2=dict=1MiB)                : 22MB    12.0s
   (--lzma2=dict=64KiB)                     : 27MB    12.8s
   (--lzma2=dict=1MiB)                      : 25MB    12.3s
------------------------------------------------------------------------------
results2 (openclipart-png, a huge arch:all package)
------------------------------------------------------------------------------
# du -m *
(snip)
# time gzip -d data.tar.gz
# time bzip2 -dfk data.tar.bz2
# time xz -dfk data.tar.xz
(snip)
                                        size    decomp-time
without compression                   : 632MB   -
Default compression (gzip -9)         : 607MB   48.7s
bzip2 compression (bzip2 -9)          : 611MB   6m52s
xz (--check=crc32 --lzma2=dict=64KiB) : 604MB   2m09s
   (--check=crc32 --lzma2=dict=1MiB)  : 601MB   2m12s
   (--lzma2=dict=64KiB)               : 604MB   2m12s
   (--lzma2=dict=1MiB)                : 601MB   2m11s
------------------------------------------------------------------------------
results3 (libreoffice-core on an amd64 machine)
------------------------------------------------------------------------------
armel vs Intel Core i3 2.90GHz -> decompression is almost 5x faster than on
armel. The size is about 10% larger (25MB with --x86 vs 22MB with --arm).
                               size    decomp-time
xz (--x86 --lzma2=dict=1MiB) : 25MB    2.7s
------------------------------------------------------------------------------
conclusion (half)
------------------------------------------------------------------------------
We should at least use xz compression instead of bzip2. bzip2 is harmful for
compressing Debian packages (it is by far the slowest to extract and never the
smallest), so we should drop support for it, which also makes checking easier.
Using xz is
- smaller than gz and bz2: it cuts about 1/3 of the size
- faster than bz2 and not much slower than gz (on armel, at least):
  about 1.5 times slower than gzip
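The two ratios above can be rechecked from the libreoffice-core numbers in results1. This is only a small sketch; the helper names pct_saved and slowdown are made up for illustration and the inputs are the rounded values from the table.

```sh
#!/bin/sh
# pct_saved OLD_MB NEW_MB: percentage of size saved going from OLD to NEW
pct_saved() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (old - new) / old * 100 }'
}

# slowdown BASE_S OTHER_S: how many times slower OTHER is than BASE
slowdown() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.2f\n", other / base }'
}

pct_saved 38 24      # xz (64KiB dict) vs gzip -9  -> 37
pct_saved 35 24      # xz (64KiB dict) vs bzip2 -9 -> 31
slowdown 7.3 11.4    # xz vs gzip decompression    -> 1.56
```

So "1/3 smaller" and "1.5 times slower" are fair roundings of the measured values.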
gzip or xz?
- 1/3 smaller = less download time/traffic and a smaller repository
- 1.5 times slower = more extraction time when a package is installed
-> average download rate = about 600KB/s
-> downloading 35MB = 60 sec
               24MB = 40 sec -> diff = 20 sec saved
   + 4 sec (extraction) - 20 sec (download) = -16 sec (if you use xz)
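The estimate above can be redone without the intermediate rounding. The sketch below assumes 1MB = 1024KB and uses a hypothetical helper net_seconds; the 600KB/s rate is the same assumption as in the text.

```sh
#!/bin/sh
# net_seconds OLD_MB NEW_MB RATE_KBPS OLD_EXTRACT_S NEW_EXTRACT_S
# Prints the net change in seconds (negative = the new format is faster overall).
net_seconds() {
    awk -v old="$1" -v new="$2" -v rate="$3" -v t_old="$4" -v t_new="$5" 'BEGIN {
        saved = (old - new) * 1024 / rate   # download seconds saved
        extra = t_new - t_old               # extra extraction seconds
        printf "%.1f\n", extra - saved
    }'
}

# 35MB .deb today vs 24MB with xz, gzip 7.3s vs xz 11.4s extraction
net_seconds 35 24 600 7.3 11.4    # -> -14.7
```

That is about 15 seconds saved, in line with the rounded -16 sec above. With a faster download rate the saving shrinks, so the break-even point depends on the network.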
------------------------------------------------------------------------------
conclusion (rest)
------------------------------------------------------------------------------
I recommend using xz ***by default*** (with appropriate options), not only on
i386/amd64 but on ANY architecture. The increase in extraction time is offset
by the decrease in download time, and extraction is only one part of
installation; as Mike Hommey suggested, "I/O is still more time consuming than
CPU", so the worst that happens is higher CPU usage.
We know some packages are better off with gzip, but they are exceptions. xz is
the best choice for the remaining 99.99% of packages. We can deal with such
exceptions by specifying gzip for them (e.g. openclipart-png).
*** what's the best compression option for the default? ***
low CPU                : --check=crc32                -> -10% time
low memory             : --lzma2=dict=64KiB (or -0)   -> uses 100KiB mem
average CPU/memory     : --lzma2=dict=8MiB (= -6 = default)
use arch optimization? : Yes, if we can (*)           -> -10% size
*** how to find the appropriate compression level (1, 6 or 9) for xz? ***
build your package with each option :-)
I've proposed a tiny hack for debhelper: when an environment variable is set,
it builds the package with each compression option - gz, 1, 6, 9, 1e, 6e and 9e.
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686048
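Without that debhelper hack, a rough comparison can also be done by hand on an already extracted data.tar. The following is only a sketch and not the patch from the bug report; compare_levels is a made-up helper name, and it prints the compressed size in bytes for gzip -9 and for each xz level.

```sh
#!/bin/sh
# compare_levels TARFILE [LEVEL...]
# Without explicit levels it tries 1, 6, 9, 1e, 6e and 9e.
# Note: compressing with -9 / -9e needs several hundred MiB of memory.
compare_levels() {
    tarfile=$1
    shift
    [ "$#" -gt 0 ] || set -- 1 6 9 1e 6e 9e
    printf '%-4s %s\n' gz "$(gzip -9 -c "$tarfile" | wc -c | tr -d ' ')"
    for level in "$@"; do
        printf '%-4s %s\n' "$level" "$(xz -"$level" -c "$tarfile" | wc -c | tr -d ' ')"
    done
}

# usage: compare_levels data.tar
#        compare_levels data.tar 1 6
```

This only compares sizes; extraction time on the target machine still has to be measured separately, e.g. with "time xz -dfk" as in the results above.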
------------------------------------------------------------
*) tiny shell sketch
arch=$(dpkg-architecture -qDEB_HOST_ARCH)
# pick the xz BCJ filter option matching the target architecture
case "$arch" in
  arm|armel|armhf|aarch64)   on_arch=--arm ;;      # maybe
  powerpc|ppc64|powerpcspe)  on_arch=--powerpc ;;  # maybe
  sparc|sparc64)             on_arch=--sparc ;;    # maybe
  ia64)                      on_arch=--ia64 ;;
  i386|amd64)                on_arch=--x86 ;;
  *)                         on_arch= ;;           # other arch: no filter
esac
--
Regards,
Hideki Yamane henrich @ debian.or.jp/org
http://wiki.debian.org/HidekiYamane