some statistics on using bz2 for packages
I took the (immense amount of) time to do some real testing on using bz2
for packages. My test case is a local mirror of the potato binary-sparc
tree. Note that this is a flat tree, with no symlinks to slink and no
binary-all; it's all in one tree. However, I do not mirror
*/kernel-source-* or */kernel-image-*, so the measurements here exclude
some of the larger packages; including them would have made for a better
test.
Here is the output of `du -sk' (sizes in kilobytes) for the entire tree
and its subdirectories (sections):
1162439 binary-sparc
2406 binary-sparc/Packages
682 binary-sparc/Packages.gz
1 binary-sparc/Release
13374 binary-sparc/admin
10892 binary-sparc/base
5635 binary-sparc/comm
215017 binary-sparc/devel
170773 binary-sparc/doc
89502 binary-sparc/editors
2990 binary-sparc/electronics
45035 binary-sparc/games
40357 binary-sparc/graphics
1257 binary-sparc/hamradio
42870 binary-sparc/interpreters
45695 binary-sparc/libs
24604 binary-sparc/mail
59691 binary-sparc/math
26643 binary-sparc/misc
35231 binary-sparc/net
10560 binary-sparc/news
8285 binary-sparc/oldlibs
3781 binary-sparc/otherosfs
3192 binary-sparc/shells
20943 binary-sparc/sound
53484 binary-sparc/tex
80614 binary-sparc/text
21764 binary-sparc/utils
34259 binary-sparc/web
92901 binary-sparc/x11
I then wrote a simple script that converted all of the packages to bz2
(using -9 compression, which takes 3700k of memory to decompress). To be
fair, I did not compress base, since it will most likely not be
compressed with bz2 on the archive. I also did not compress the Packages*
files. Here is the outcome:
1056029 binary-sparc
2406 binary-sparc/Packages
682 binary-sparc/Packages.gz
1 binary-sparc/Release
10921 binary-sparc/admin
10892 binary-sparc/base
4952 binary-sparc/comm
176709 binary-sparc/devel
162243 binary-sparc/doc
82456 binary-sparc/editors
2854 binary-sparc/electronics
41959 binary-sparc/games
38262 binary-sparc/graphics
1272 binary-sparc/hamradio
38604 binary-sparc/interpreters
42533 binary-sparc/libs
20861 binary-sparc/mail
53901 binary-sparc/math
24377 binary-sparc/misc
32262 binary-sparc/net
9710 binary-sparc/news
7561 binary-sparc/oldlibs
3505 binary-sparc/otherosfs
3085 binary-sparc/shells
19144 binary-sparc/sound
49224 binary-sparc/tex
75305 binary-sparc/text
20484 binary-sparc/utils
31332 binary-sparc/web
88531 binary-sparc/x11
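For anyone who wants to check the arithmetic, the two listings can be
compared with a few lines of Python. This is only a sketch: the figures are
copied from the `du -sk' output above, and the names (SIZES_KB, savings) are
made up for illustration.

```python
# Sizes in KB copied from the two `du -sk` listings:
# (section, size with gzip, size with bzip2 -9)
SIZES_KB = [
    ("admin", 13374, 10921), ("base", 10892, 10892),
    ("comm", 5635, 4952), ("devel", 215017, 176709),
    ("doc", 170773, 162243), ("editors", 89502, 82456),
    ("electronics", 2990, 2854), ("games", 45035, 41959),
    ("graphics", 40357, 38262), ("hamradio", 1257, 1272),
    ("interpreters", 42870, 38604), ("libs", 45695, 42533),
    ("mail", 24604, 20861), ("math", 59691, 53901),
    ("misc", 26643, 24377), ("net", 35231, 32262),
    ("news", 10560, 9710), ("oldlibs", 8285, 7561),
    ("otherosfs", 3781, 3505), ("shells", 3192, 3085),
    ("sound", 20943, 19144), ("tex", 53484, 49224),
    ("text", 80614, 75305), ("utils", 21764, 20484),
    ("web", 34259, 31332), ("x11", 92901, 88531),
]
TOTAL_BEFORE_KB = 1162439  # top-level binary-sparc line, gzip
TOTAL_AFTER_KB = 1056029   # top-level binary-sparc line, bzip2 -9


def savings(before, after):
    """Return (KB saved, percent saved) for one before/after pair."""
    saved = before - after
    return saved, 100.0 * saved / before


total_saved, total_pct = savings(TOTAL_BEFORE_KB, TOTAL_AFTER_KB)
print("whole tree: %d KB saved (%.1f%%)" % (total_saved, total_pct))

# Section with the largest relative saving.
best = max(SIZES_KB, key=lambda row: savings(row[1], row[2])[1])

# Sections that got larger after conversion.
grew = [name for name, before, after in SIZES_KB if after > before]

for name, before, after in SIZES_KB:
    saved, pct = savings(before, after)
    print("%-13s %7d KB saved (%5.1f%%)" % (name, saved, pct))
```

The per-section lines make it easy to see which sections benefit most and
whether any section ends up larger than before.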
As you can see, some 106 megs were saved, roughly 9%. I then went back and
recompressed these, still using bzip2, but with -4 (which only uses 1700k
to decompress, compared to gzip's 1300k for -9), and the savings came to
about 8% (some 96 megs).
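The conversion script itself is not shown in this post, so the following is
only a guess at its core step: taking a gzip-compressed stream and
recompressing it with bzip2 at a chosen level. It uses Python's standard
gzip and bz2 modules; the function name recompress_gz_to_bz2 is
hypothetical, the sample data is synthetic, and pulling the compressed
members out of a .deb and putting them back is deliberately left out.

```python
import bz2
import gzip


def recompress_gz_to_bz2(gz_bytes, level=9):
    """Decompress a gzip stream and recompress it with bzip2.

    `level` is the bzip2 block-size setting (1-9), i.e. the -1 .. -9 flag.
    """
    raw = gzip.decompress(gz_bytes)
    return bz2.compress(raw, compresslevel=level)


# Synthetic stand-in for a package payload (not real package data).
sample = b"Package: example\nArchitecture: sparc\n" * 2000
gz = gzip.compress(sample, compresslevel=9)

for level in (4, 9):
    bz = recompress_gz_to_bz2(gz, level)
    # The round trip must give back exactly the original bytes.
    assert bz2.decompress(bz) == sample
    print("bzip2 -%d: %d bytes (gzip -9: %d bytes)" % (level, len(bz), len(gz)))
```

Running the same loop over real package payloads with levels 4 and 9 would
be one way to reproduce the -4 versus -9 comparison above.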
Summary: if we were to multiply this over several archs and 2
distributions, _and_ source, we would see huge savings. Not to mention,
this would bring the binary-sparc distribution down to just 2 CDs
(remember that all of the tools and the boot disks are included). IMO, we
only need to use -4 as the standard for compressing packages; source can
use -9 with no problems, since source packages aren't unpacked in bulk
like binary packages are. This also gets rid of the "small systems can't
use it" problem (I still need to do some timing tests).
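The memory figures quoted for -4 and -9 match the rule of thumb in the bzip2
manual page: decompression needs about 100k plus four times the block size,
where the block size is the level times 100k (or about 2.5 times the block
size with the -s option). A small sketch of that rule, with a made-up
function name:

```python
def bzip2_decompress_kb(level, small=False):
    """Approximate bzip2 decompression memory in KB, per the man page rule.

    level: block-size flag, 1-9 (block size = level * 100k).
    small: True models `bzip2 -s`, which trades speed for less memory.
    """
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    # 4 x block size normally, 2.5 x block size with -s.
    per_level_kb = 250 if small else 400
    return 100 + per_level_kb * level


for level in (4, 9):
    print("-%d: about %dk to decompress" % (level, bzip2_decompress_kb(level)))
```

These are estimates from the documented formula, not measurements from the
test described here.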
Ben