
some statistics on using bz2 for packages



I took the (immense amount of) time to do some real testing on using bz2
for packages. My test case is a local mirror of the potato binary-sparc
tree. Note that this is a flat tree: no symlinks to slink, and no
binary-all. It's all in one tree (however, I do not mirror
*/kernel-source-* or */kernel-image-*, so the measurements here exclude
some of the larger packages, meaning the results would likely have been
better had they been included).

Here is the output of `du -sk' (sizes in kilobytes) for the entire tree
and its subdirectories (sections):

1162439 binary-sparc
2406    binary-sparc/Packages
682     binary-sparc/Packages.gz
1       binary-sparc/Release
13374   binary-sparc/admin
10892   binary-sparc/base
5635    binary-sparc/comm
215017  binary-sparc/devel
170773  binary-sparc/doc
89502   binary-sparc/editors
2990    binary-sparc/electronics
45035   binary-sparc/games
40357   binary-sparc/graphics
1257    binary-sparc/hamradio
42870   binary-sparc/interpreters
45695   binary-sparc/libs
24604   binary-sparc/mail
59691   binary-sparc/math
26643   binary-sparc/misc
35231   binary-sparc/net
10560   binary-sparc/news
8285    binary-sparc/oldlibs
3781    binary-sparc/otherosfs
3192    binary-sparc/shells
20943   binary-sparc/sound
53484   binary-sparc/tex
80614   binary-sparc/text
21764   binary-sparc/utils
34259   binary-sparc/web
92901   binary-sparc/x11

I then wrote a simple script that converted all of the packages to bz2
(using -9 compression, which takes 3700k of memory to decompress). To be
fair, I did not compress base, since it most likely will not be
compressed with bz2 on the archive. I also did not compress the Packages*
files. Here is the outcome:

1056029 binary-sparc
2406    binary-sparc/Packages
682     binary-sparc/Packages.gz
1       binary-sparc/Release
10921   binary-sparc/admin
10892   binary-sparc/base
4952    binary-sparc/comm
176709  binary-sparc/devel
162243  binary-sparc/doc
82456   binary-sparc/editors
2854    binary-sparc/electronics
41959   binary-sparc/games
38262   binary-sparc/graphics
1272    binary-sparc/hamradio
38604   binary-sparc/interpreters
42533   binary-sparc/libs
20861   binary-sparc/mail
53901   binary-sparc/math
24377   binary-sparc/misc
32262   binary-sparc/net
9710    binary-sparc/news
7561    binary-sparc/oldlibs
3505    binary-sparc/otherosfs
3085    binary-sparc/shells
19144   binary-sparc/sound
49224   binary-sparc/tex
75305   binary-sparc/text
20484   binary-sparc/utils
31332   binary-sparc/web
88531   binary-sparc/x11

As you can see, some 106 megs were saved, roughly 9% of the original
size. I then went back and recompressed the packages, still using bzip2,
but with -4 (which only uses 1700k to decompress, compared to gzip's
1300k for -9), and the savings came to about 8% (some 96 megs).
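
The totals can be checked with a few lines of Python. This is only a
sketch: the numbers are copied from the two `du -sk' listings above, and
the helper name savings() is made up for illustration. Measured against
the original size the -9 saving works out to about 9.2%; measured against
the new, smaller size it is about 10%.

```python
# Totals in KB, copied from the two `du -sk' listings above.
before_kb = 1162439   # binary-sparc with the original gzip packages
after_kb = 1056029    # binary-sparc with bzip2 -9 (base, Packages* untouched)


def savings(before, after):
    """Return (KB saved, percent of the original size saved)."""
    saved = before - after
    return saved, 100.0 * saved / before


saved_kb, pct = savings(before_kb, after_kb)
print("total: saved %d KB (%.1f%% of the original)" % (saved_kb, pct))

# A few per-section figures from the same listings: (before, after).
sections = {
    "devel": (215017, 176709),
    "doc": (170773, 162243),
    "hamradio": (1257, 1272),   # this section actually grew slightly
}
for name, (b, a) in sorted(sections.items()):
    s, p = savings(b, a)
    print("%-10s %7d KB  %5.1f%%" % (name, s, p))
```

The per-section lines show that the gain is uneven: devel shrinks far
more than doc, and hamradio ends up a few KB larger.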

Summary: if we were to multiply this over several archs and 2
distributions, _and_ source, we would see huge savings. Not to mention,
this would bring the binary-sparc distribution down to just 2 CDs in size
(remember, all of the tools and the boot disks are included). IMO, we
only need to use -4 as the standard for compressing packages; source can
use -9 with no problems, since source isn't unpacked in bulk like
packages are. This also gets rid of the "small systems can't use it"
problem (I still need to do some time tests).
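
One rough way to run such time tests is sketched below in Python, using
its standard bz2 and zlib modules. This is not the script used for the
numbers above; the measure() helper and the sample payload are
hypothetical, and a real test would feed in the contents of actual
packages. It only times decompression and compares sizes; it does not
measure memory use.

```python
import bz2
import time
import zlib


def measure(data, level):
    """Compress data with bzip2 at the given level.

    Returns (compressed size in bytes, seconds taken to decompress).
    """
    packed = bz2.compress(data, level)
    start = time.perf_counter()
    unpacked = bz2.decompress(packed)
    elapsed = time.perf_counter() - start
    assert unpacked == data  # the round trip must be lossless
    return len(packed), elapsed


# Hypothetical sample payload standing in for real package contents.
sample = b"some repetitive package contents\n" * 50000

# zlib uses the same deflate algorithm as gzip; shown for comparison.
print("deflate -9: %d bytes" % len(zlib.compress(sample, 9)))
for level in (4, 9):
    size, secs = measure(sample, level)
    print("bzip2 -%d: %d bytes, %.4f s to decompress" % (level, size, secs))
```

Timings from a sketch like this depend heavily on the machine and the
data, so they would need to be repeated on a small system to say anything
about the "small systems" case.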

Ben

