
Re: [debian-knoppix] A little "tweak" for better compression, saves > 2% space



Hello Christian,

On Sun, Feb 15, 2004 at 12:42:39AM +0100, Christian Leber wrote:
> Hello everyone,
> 
> a few days ago when working on a little modification of Knoppix
> i had an idea:
> 
> Why not using better algorithms for the gz compression?

Question: Have you looked at and understood the algorithm? I looked at
the source and am still a bit puzzled. It seems to try gzip compression
several times with different strategies and then take the smallest
output as the final result. Your add-on does the same, just with 7zip
and the gzip "best compression", the latter being the current default.
Just curious.
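
(A minimal sketch of the "try several strategies, keep the smallest
output" idea described above. It uses Python's standard zlib module
rather than the advancecomp/7-zip code, and the function names
compress_with and smallest_compression are made up for illustration.
It is not the actual advfs implementation.)

```python
import zlib

BLOCK_SIZE = 65536  # block size mentioned in the thread

# Deflate strategies exposed by Python's zlib module. Which strategies
# the real tool tries is an assumption made for this sketch.
STRATEGIES = (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_HUFFMAN_ONLY)


def compress_with(block: bytes, strategy: int) -> bytes:
    """Compress one block at level 9 using the given deflate strategy."""
    comp = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                            zlib.DEF_MEM_LEVEL, strategy)
    return comp.compress(block) + comp.flush()


def smallest_compression(block: bytes) -> bytes:
    """Compress the block once per strategy and keep the shortest result."""
    candidates = [compress_with(block, s) for s in STRATEGIES]
    return min(candidates, key=len)


if __name__ == "__main__":
    block = b"knoppix " * (BLOCK_SIZE // 8)
    best = smallest_compression(block)
    # Every candidate is an ordinary zlib stream, so the reader side
    # does not need to know which strategy produced it.
    assert zlib.decompress(best) == block
    print(len(block), "->", len(best), "bytes")
```

Because every candidate is a valid deflate stream, choosing the
smallest one per block changes nothing on the decompression side.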

> Therefore I took cloop-2.01 and advancecomp-1.9 and hacked
> it together.
> (btw. the extract thing in cloop is broken)

Apparently nobody (including me) noticed, because most are using the
cloop module for decompression. ;-) But I will have a look at
extract_compressed_fs again.

> (advancecomp uses parts from the great 7-zip to achieve this)
> (I saw that with the new functions some blocks got bigger,
> so I just use both ways on every block and compare the size)
> (If it will be used I might clean it up etc. it's really just
> done to get it working, didn't know if it's better to send the
> advancecomp maintainer a little patch or Klaus a MB sized one)
> 
> It saves > 2%, I guess this also applies for a full CD
> sized filesystem. (blocksize was 65536)

That's great, 2% is a lot of space for additional programs when you
consider that the total size of the uncompressed image is about 2GB.
Plus, better compression reduces physical reads (even if that only
means an overall performance increase of 2%). The thing I like best is
that the file format of the image, and the data within, stays compatible
with the old gzip format, so we really only need to change the
compressor.

> A little bit smaller filesystem I used for my tests because I
> had it handy:
> 
> core:/space# du -sh KNOPPIX
> 1.1G    KNOPPIX
> core:/space# ls -lh KNOPPIX_*
> -rw-r--r--    1 root     root         375M Feb 13 02:16 KNOPPIX_advfs
> -rw-r--r--    1 root     root         386M Feb 13 00:23 KNOPPIX_normal
> 
> I added a MD5 sum file for every file in the filesystem, mounted it
> and checked it, it just worked, I also took the time, but there is
> no performance decrease.
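
(For reference, the saving implied by the two file sizes in the listing
above can be worked out as follows. The sizes are taken as the rounded
megabyte values printed by ls -lh, so the result is only approximate.)

```python
# Sizes as printed by `ls -lh` in the quoted test, in megabytes (rounded).
normal_mb = 386  # KNOPPIX_normal, gzip "best compression"
advfs_mb = 375   # KNOPPIX_advfs, best of gzip and 7-zip per block

saved_mb = normal_mb - advfs_mb
saving_pct = 100.0 * saved_mb / normal_mb

print(f"saved {saved_mb} MB ({saving_pct:.2f}% of the gzip-only image)")
# prints: saved 11 MB (2.85% of the gzip-only image)
```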

Due to the way the gzip algorithm works, decompression performance is
largely independent of the compression ratio (as opposed to bzip2,
where decompression is much slower).

> The only negative thing is the time it takes for compressing.
> On my box (2000 MHz) it was going up from 9 min to about 70 min.

The code may need some streamlining, and the compression options are
probably not optimal for 64k blocks yet, so the compression time MAY
be affected if we tweak the available compression parameters some more.

Also, in my opinion, it makes sense to make create_compressed_fs
cluster-capable, since compressing independent blocks is an easily
parallelizable task. So, you would just run the compression on a
cluster of machines running Mosix or a similar distributed computing
environment.
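
(A rough sketch of why independent blocks parallelize so easily. It
runs on a single machine with a thread pool instead of a Mosix cluster,
uses Python's standard zlib instead of the real compressor, and the
names split_blocks, compress_block and compress_parallel are invented
for illustration.)

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

BLOCK_SIZE = 65536  # block size mentioned in the thread


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the image into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_block(block: bytes) -> bytes:
    """Compress a single block; no state is shared between blocks."""
    return zlib.compress(block, 9)


def compress_parallel(data: bytes, workers: int = 4) -> List[bytes]:
    """Compress all blocks concurrently and return them in original order."""
    blocks = split_blocks(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the block
        # index can be written out without any extra bookkeeping.
        return list(pool.map(compress_block, blocks))


if __name__ == "__main__":
    image = bytes(range(256)) * 1024  # 4 blocks of 64k
    compressed = compress_parallel(image)
    assert b"".join(zlib.decompress(c) for c in compressed) == image
    print(len(compressed), "blocks compressed")
```

Since no block depends on any other, the only coordination needed is
keeping the results in order. On a real cluster the worker pool would
be replaced by whatever job distribution the environment provides.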

> To use it, just run ./configure; make  then you can use advfs instead of
> create_compressed_fs.

Tried it, works for me.

> To Klaus:
> Perhaps it's early enough for the real Knoppix 3.4,

Should be. We still have a week or so before I have to produce the
final CeBIT 3.4 release, so there is still time to experiment with the
compression tool a little.

Regards
-Klaus Knopper
_______________________________________________
debian-knoppix mailing list
debian-knoppix@linuxtag.org
http://mailman.linuxtag.org/mailman/listinfo/debian-knoppix

