
[proposal] use xz compression for Debian package by default



Hi,

 At DebConf12, I talked about xz compression for Debian packages(*).
 Now I'd like to talk about the next step: a suggestion to use xz, with results
 from some experiments.


 *) http://penta.debconf.org/dc12_schedule/events/930.en.html

------------------------------------------------------------------------------
test environment (armel)
------------------------------------------------------------------------------

 I used a Netgear ReadyNAS Duo v2, an armel machine, for this test.
 See http://www.netgear.com/home/products/storage/prosumer/rnd2000.aspx
 It is based on Debian Squeeze, so the results should be the same on Debian :)


# uname -a
Linux nas-A0-96-88 2.6.31.8.duov2 #1 Mon May 14 18:35:20 HKT 2012 armv5tel GNU/Linux

# cat /proc/cpuinfo
Processor	: Feroceon 88FR131 rev 1 (v5l)
BogoMIPS	: 1599.07
Features	: swp half thumb fastmult edsp 
CPU implementer	: 0x56
CPU architecture: 5TE
CPU variant	: 0x2
CPU part	: 0x131
CPU revision	: 1

Hardware	: Feroceon-KW
Revision	: 0000
Serial		: 0000000000000000

# free 
             total       used       free     shared    buffers     cached
Mem:        246820     139216     107604          0       2760      57460
-/+ buffers/cache:      78996     167824
Swap:       524268        360     523908


 For the test I used the libreoffice-core package (about 35MB), which currently
 uses bz2 for package compression, and openclipart-png (an old version, about
 600MB).


------------------------------------------------------------------------------
results1 (libreoffice-core)
------------------------------------------------------------------------------

 Okay? Here we go...

# du -m *
1	control.tar.gz
35	data.tar.bz2
38	data.tar.gz
24	data.tar.xz
1	debian-binary
35	libreoffice-core_3.5.4-7_armel.deb

# time gzip -d data.tar.gz 

real	0m7.253s
user	0m4.980s
sys	0m1.070s

# time bzip2 -dfk data.tar.bz2 

real	0m45.256s
user	0m42.320s
sys	0m2.000s

# time xz -dfk data.tar.xz 

real	0m11.443s
user	0m9.710s
sys	0m1.450s

                                                       size    decomp-time
 without compression                                 : 141MB    -
 Default compression(gzip -9)                        :  38MB    7.3s
 Package option (bzip2 -9)                           :  35MB   45.3s
 xz (--arm --check=crc32 --lzma2=dict=64KiB)         :  24MB   11.4s
    (--arm --check=crc32 --lzma2=dict=1MiB)          :  22MB   11.0s
    (--arm --lzma2=dict=64KiB)                       :  24MB   12.5s
    (--arm --lzma2=dict=1MiB)                        :  22MB   12.0s
    (--lzma2=dict=64KiB)                             :  27MB   12.8s
    (--lzma2=dict=1MiB)                              :  25MB   12.3s


------------------------------------------------------------------------------
results2 (openclipart-png, an arch:all and huge package)
------------------------------------------------------------------------------

# du -m *
(snip)

# time gzip -d data.tar.gz 
# time bzip2 -dfk data.tar.bz2 
# time xz -dfk data.tar.xz 
(snip)

                                                       size    decomp-time
 without compression                                 : 632MB    -
 Default compression(gzip -9)                        : 607MB   48.7s
 bzip2 compression (bzip2 -9)                        : 611MB   6m52s
 xz (--check=crc32 --lzma2=dict=64KiB)               : 604MB   2m09s
    (--check=crc32 --lzma2=dict=1MiB)                : 601MB   2m12s
    (--lzma2=dict=64KiB)                             : 604MB   2m12s
    (--lzma2=dict=1MiB)                              : 601MB   2m11s


------------------------------------------------------------------------------
results3 (libreoffice-core on an amd64 machine)
------------------------------------------------------------------------------

  armel vs Intel Core i3 2.90GHz -> almost 5x faster than armel; size is about 10% larger.

                                                       size    decomp-time
  xz  (--x86 --lzma2=dict=1MiB)                     :  25MB     2.7s

  

------------------------------------------------------------------------------
conclusion (half)
------------------------------------------------------------------------------
 We should use xz compression instead of bzip2, at least. bzip2 is harmful for
 compressing Debian packages, so we should drop support for it to make checks
 easier.

 Using xz is
   - smaller than gz and bz2: it can cut about 1/3 of the size
   - faster than bz2 and not much slower than gz (on armel, at least):
     about 1.5 times slower than gzip
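
 The "1/3" and "1.5 times" figures can be checked against the libreoffice-core
 table with a little arithmetic. This is only a minimal sketch, assuming awk is
 available; the function names pct_smaller and times_slower are made up for
 this illustration and are not part of any tool.

```shell
# pct_smaller OLD NEW: how much smaller NEW is than OLD, in percent.
pct_smaller() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (old - new) * 100 / old }'
}

# times_slower BASE OTHER: ratio of two decompression times (OTHER / BASE).
times_slower() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1f\n", other / base }'
}

pct_smaller 35 24       # bz2 -> xz, sizes in MB from the table
pct_smaller 38 24       # gz  -> xz, sizes in MB from the table
times_slower 7.3 11.4   # gzip vs xz decompression on armel, in seconds
```

 With the measured numbers this comes out at roughly 31% and 37% smaller, and
 about 1.6 times slower, which is close to the rounded figures.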

 gzip or xz?
   - 1/3 smaller      = less download time/traffic and a smaller repository
   - 1.5 times slower = extraction takes longer when the package is installed

   -> average download rate = about 600KB/s
   -> download 35MB = 60 sec
               24MB = 40 sec -> diff = 20 sec

     + 4 sec (extraction) - 20 sec (download) = -16 sec (if you use xz)
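
 The same trade-off can be written as a small calculation, so that it can be
 repeated for other package sizes and download rates. This is a rough sketch
 only: net_change is a hypothetical helper name, and it assumes awk is
 available and that 1 MB = 1000 KB.

```shell
# net_change OLD_MB NEW_MB RATE_KB_S OLD_DECOMP_S NEW_DECOMP_S
# Prints the change in total (download + decompress) time in seconds.
# A negative value means the new format is faster overall.
# Assumption: 1 MB = 1000 KB, which is a simplification.
net_change() {
    awk -v old="$1" -v new="$2" -v rate="$3" -v d_old="$4" -v d_new="$5" 'BEGIN {
        saved = (old - new) * 1000 / rate   # download time saved, in seconds
        extra = d_new - d_old               # extra decompression time, in seconds
        printf "%.1f\n", extra - saved
    }'
}

# libreoffice-core on armel: 35MB -> 24MB at 600KB/s, gzip 7.3s vs xz 11.4s
net_change 35 24 600 7.3 11.4
```

 With the unrounded measurements the saving is a little over 14 seconds; the
 -16 sec above comes from rounding the download times to 60 and 40 seconds.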


------------------------------------------------------------------------------
conclusion (rest)
------------------------------------------------------------------------------
 I recommend using xz ***by default*** (with appropriate options), not only on
 i386/amd64 but on ALL architectures. The increased extraction time is offset
 by the decreased download time, and extraction is only one part of
 installation; as Mike Hommey suggested, "I/O is still more time consuming
 than CPU", and the cost is nothing worse than high CPU usage.

 We know some packages are better off with gzip, but they are exceptions. xz
 is the best choice for the remaining 99.99% of packages. We can deal with
 such exceptions by specifying gzip for them (e.g. openclipart-png).


 *** what's the best compression option for the default? ***

  low CPU                : --check=crc32               -> -10% time
  low memory             : --lzma2=dict=64KiB (or -0)  -> use 100KiB mem
  average CPU/memory     : --lzma2=dict=8MiB  (= -6 = default)
  use arch optimization? : Yes, if we can (*)          -> -10% size


 *** how to find the appropriate compression level (1, 6 or 9) for xz? ***

  build your package with each option :-)

  I've proposed a tiny hack for debhelper: when an environment variable is set,
  it builds the package with each compression option - gz, 1, 6, 9, 1e, 6e
  and 9e.
  See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686048
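
  Without that hack, a similar comparison can be done by hand with dpkg-deb.
  The following is only a sketch and not the debhelper change from the bug
  report: print_builds is a hypothetical helper, debian/mypackage and
  mypackage are placeholder names, and it assumes a dpkg-deb that supports
  -Zxz and -Sextreme. It only prints the commands, so nothing is built until
  the output is piped to sh.

```shell
# print_builds PKGDIR NAME: print one dpkg-deb command per variant
# (gzip, then xz levels 1, 6, 9, each with and without the extreme strategy).
# Nothing is executed here; pipe the output to sh to actually build.
print_builds() {
    pkgdir=$1
    name=$2
    echo "dpkg-deb -Zgzip -b $pkgdir ${name}_gz.deb"
    for level in 1 6 9; do
        echo "dpkg-deb -Zxz -z$level -b $pkgdir ${name}_xz$level.deb"
        echo "dpkg-deb -Zxz -z$level -Sextreme -b $pkgdir ${name}_xz${level}e.deb"
    done
}

# Placeholder arguments: an unpacked package tree and an output name prefix.
print_builds debian/mypackage mypackage
```

  After building, compare the resulting files with "du -m *" as in the results
  sections.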


------------------------------------------------------------
 *) tiny shell sketch

# Pick the xz BCJ filter option that matches the target architecture.
arch=$(dpkg-architecture -qDEB_HOST_ARCH)

case "$arch" in
    arm|armel|armhf|aarch64)    on_arch=--arm ;;      # maybe
    powerpc|ppc64|powerpcspe)   on_arch=--powerpc ;;  # maybe
    sparc|sparc64)              on_arch=--sparc ;;    # maybe
    ia64)                       on_arch=--ia64 ;;
    i386|amd64)                 on_arch=--x86 ;;
    *)                          on_arch= ;;           # no arch filter
esac


-- 
Regards,

 Hideki Yamane     henrich @ debian.or.jp/org
 http://wiki.debian.org/HidekiYamane

