
Re: Solving the compression dilemma when rsync-ing Debian versions

>>>>> " " == Richard Atterer <deb-devel@list.atterer.net> writes:

     > On Fri, Jan 12, 2001 at 09:20:38PM +0100, Jean-loup Gailly
     > wrote:
    >> I am "upstream" and I do want to make gzip rsync-friendly by
    >> default (without even a --rsync option) since the cost in
    >> compression ratio is negligible.

     > This patch keeps getting more and more interesting! Up to now,
     > based on the short description by Martijn van Oosterhout, I was
     > under the impression that "rsyncability" was only possible if
     > the compressed old version of the data was still
     > available. However, the above remark sounds to me as if it can
     > be achieved even without that.

     > Just how does it work, pray tell?  Is the patch and/or a more
     > detailed description available somewhere?

From time to time gzip will flush the dictionary and start with a
clean slate.

The trick now is to make this happen at special points in the file
that don't change when the file is altered. To do this, the rolling
checksum (Adler-32) is computed over a sliding 4K window and, whenever
the result equals a magic value (0), a flush is forced.

This forced flush happens at effectively random places and not too
often (it increases linux.tar.gz by ~3%). The flush depends not on the
position in the file but on the data being compressed. So when two
files contain a few K of identical data, they will both hit a flush at
the same point within it. When the file is altered at the front, gzip
will still flush the dictionary at the same places towards the end, so
the compressed files will match at the end.
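The mechanism can be sketched in a few lines of Python. This is a
simplified illustration, not the actual patch: a plain rolling sum
stands in for the rolling Adler-32, and WINDOW, MASK, and MAGIC are
hypothetical parameter choices (the mail only specifies a 4K block and
a magic value of 0).

```python
import random

WINDOW = 4096   # window size; the mail says a 4K block
MAGIC = 0       # the magic value the checksum must hit
MASK = 0x0FFF   # hypothetical: test only the low 12 bits, so a hit
                # occurs roughly once every 4K bytes on average

def flush_points(data):
    """Return the offsets at which a dictionary flush would be forced.

    A rolling sum over the last WINDOW bytes stands in for the rolling
    checksum described above; a flush is triggered whenever the masked
    sum equals MAGIC.  Sliding the window costs O(1) per byte.
    """
    s = 0
    points = []
    for i, b in enumerate(data):
        s += b                      # byte enters the window
        if i >= WINDOW:
            s -= data[i - WINDOW]   # byte leaves the window
        if i >= WINDOW - 1 and (s & MASK) == MAGIC:
            points.append(i + 1)    # flush after this byte
    return points

# Demo: prepend 6 bytes to a "file".  Flush points that lie past the
# first window of unchanged data reappear at the same spots, merely
# shifted by 6 -- so the compressed tails can match again.
random.seed(0)
original = bytes(random.randrange(256) for _ in range(200_000))
edited = b"patch!" + original
p_orig = set(flush_points(original))
p_edit = set(flush_points(edited))
```

Because each flush decision looks only at the last WINDOW bytes, every
flush point in the unchanged tail of `original` shows up in `edited`
shifted by exactly the length of the inserted prefix, which is what
lets rsync resynchronize on the compressed data.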

Does that explain how it works?
