
Re: Solving the compression dilemma when rsync-ing Debian versions

>>>>> " " == Richard Atterer <deb-devel@list.atterer.net> writes:

     > On Fri, Jan 12, 2001 at 09:20:38PM +0100, Jean-loup Gailly
     > wrote:
    >> I am "upstream" and I do want to make gzip rsync-friendly by
    >> default (without even a --rsync option) since the cost in
    >> compression ratio is negligible.

     > This patch keeps getting more and more interesting! Up to now,
     > based on the short description by Martijn van Oosterhout, I was
     > under the impression that "rsyncability" was only possible if
     > the compressed old version of the data was still
     > available. However, the above remark sounds to me as if it can
     > be achieved even without that.

     > Just how does it work, pray tell?  Is the patch and/or a more
     > detailed description available somewhere?

From time to time gzip will flush the dictionary and start with a
clean slate.

The trick now is to make this happen at special points in the file
that don't change when the file is altered. To do this, the rolling
checksum (Adler-32) is computed over a sliding 4K window and, whenever
the result equals a magic value (0), a flush is forced.

This forced flush happens at effectively random places and not too
often (it increases linux.tar.gz by ~3%). The flush depends not on the
position in the file but on the data being compressed. So when two
files contain a few K of identical data, they will both hit a flush at
the same point within it. When the file is altered at the front, gzip
will still flush the dictionary at the same places towards the end, so
the compressed files will match at the end.
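The mechanism can be sketched in a few lines of Python. This is a
simplified illustration, not the actual patch: a plain rolling sum
stands in for the rolling Adler-32, and WINDOW, MASK, and MAGIC are
hypothetical parameter choices (the mail only specifies a 4K block and
a magic value of 0).

```python
import random

WINDOW = 4096   # window size; the mail says a 4K block
MAGIC = 0       # the magic value the checksum must hit
MASK = 0x0FFF   # hypothetical: test only the low 12 bits, so a hit
                # occurs roughly once every 4K bytes on average

def flush_points(data):
    """Return the offsets at which a dictionary flush would be forced.

    A rolling sum over the last WINDOW bytes stands in for the rolling
    checksum described above; a flush is triggered whenever the masked
    sum equals MAGIC.  Sliding the window costs O(1) per byte.
    """
    s = 0
    points = []
    for i, b in enumerate(data):
        s += b                      # byte enters the window
        if i >= WINDOW:
            s -= data[i - WINDOW]   # byte leaves the window
        if i >= WINDOW - 1 and (s & MASK) == MAGIC:
            points.append(i + 1)    # flush after this byte
    return points

# Demo: prepend 6 bytes to a "file".  Flush points that lie past the
# first window of unchanged data reappear at the same spots, merely
# shifted by 6 -- so the compressed tails can match again.
random.seed(0)
original = bytes(random.randrange(256) for _ in range(200_000))
edited = b"patch!" + original
p_orig = set(flush_points(original))
p_edit = set(flush_points(edited))
```

Because each flush decision looks only at the last WINDOW bytes, every
flush point in the unchanged tail of `original` shows up in `edited`
shifted by exactly the length of the inserted prefix, which is what
lets rsync resynchronize on the compressed data.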

Does that explain how it works?
