Re: Solving the compression dilema when rsync-ing Debian versions

To: debian-devel@lists.debian.org
Subject: Re: Solving the compression dilema when rsync-ing Debian versions
From: Richard Atterer <deb-devel@list.atterer.net>
Date: Sun, 14 Jan 2001 13:53:45 +0100
Message-id: <20010114135345.A21954@atterer.net>
Mail-followup-to: debian-devel@lists.debian.org
In-reply-to: <87r9277y89.fsf@mose.informatik.uni-tuebingen.de>; from goswin.brederlow@student.uni-tuebingen.de on Sun, Jan 14, 2001 at 01:51:02AM +0100
References: <3A5CE5AB.56A42FEC@bluewin.ch> <14943.26390.729463.348169@kerla.mandrakesoft.com> <20010113203551.A19901@atterer.net> <87r9277y89.fsf@mose.informatik.uni-tuebingen.de>

On Sun, Jan 14, 2001 at 01:51:02AM +0100, Goswin Brederlow wrote:
> >>>>> " " == Richard Atterer <deb-devel@list.atterer.net> writes:
>      > Just how does it work, pray tell?  Is the patch and/or a more
>      > detailed description available somewhere?
> 
> From time to time gzip will flush the dictionary and start with a
> clean slate.
> 
> The trick now is to make this happen at special points in the file
> that don't change when the file is altered. To do this the rolling
> checksum algorithm (Adler-32) is computed over a 4K block and, when
> the result is equal to a magic value (0), a flush is forced.

Ah, the magic rolling checksum value is the "missing link"!

But I'm surprised that the value 0, one out of 2^32 possible Adler32
checksum values, appears often enough in typical data to make the
scheme work?! Seems like Adler32 isn't so strong a checksum after
all. :-/

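BTW, 0 is the rolling checksum of an all-zeroes area (at least for the
rsync variant; textbook Adler32 starts its low word at 1 and would not
give 0 there) - if the uncompressed data contains long runs of zero,
there will be *lots* of flushes unless special action is taken.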
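
To make the zero-run concern concrete, here is a minimal Python sketch.
It assumes the rsync-style rolling sum (two 16-bit halves, modulus 2^16,
no initial 1), which may differ from what the gzip patch really
computes. The names weak_sum, roll and zero_positions are made up for
illustration only.

```python
M = 1 << 16  # assumed rsync-style modulus; textbook Adler-32 uses 65521


def weak_sum(block):
    """Checksum of one whole window, computed from scratch."""
    a = b = 0
    n = len(block)
    for i, x in enumerate(block):
        a = (a + x) % M
        b = (b + (n - i) * x) % M  # earlier bytes carry a larger weight
    return (b << 16) | a


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def zero_positions(data, n):
    """End offsets (exclusive) of every n-byte window whose checksum is 0."""
    if len(data) < n:
        return []
    s = weak_sum(data[:n])
    a, b = s & 0xFFFF, s >> 16
    hits = [n] if s == 0 else []
    for end in range(n, len(data)):
        a, b = roll(a, b, data[end - n], data[end], n)
        if a == 0 and b == 0:
            hits.append(end + 1)
    return hits
```

With this variant every window that lies entirely inside a run of zero
bytes hits the magic value, so a zero run of length r yields about
r - n + 1 consecutive flush candidates for an n-byte window.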

> This forced flush happens at random places and not too often
> (increases linux.tar.gz by ~3%).

Am I guessing correctly that the value 0 was only chosen "randomly",
not for any particular reason, and that a zero rolling checksum only
occurs every MB or so?

By altering the size of the area from the default 4k, you can even
have a smooth trade-off between compression ratio and rsync transfer
volume - nice!
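
A rough sketch of why such flush points survive edits, and how a knob
changes their density. Note the assumptions: instead of the exact
"Adler32 == 0" test it uses a simplified trigger (plain rolling byte
sum modulo a divisor), and flush_points, window and divisor are
hypothetical names, not taken from the patch.

```python
import random


def flush_points(data, window=32, divisor=64):
    """Offsets where a flush would be forced (simplified trigger)."""
    points = []
    s = 0
    for i, x in enumerate(data):
        s += x
        if i >= window:
            s -= data[i - window]  # drop the byte leaving the window
        # Only test once a full window has been seen.
        if i >= window - 1 and s % divisor == 0:
            points.append(i + 1)
    return points


# Demo: insert 8 bytes in the middle of some pseudo-random data.
rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(20000))
new = old[:5000] + b"INSERTED" + old[5000:]
shift = len(new) - len(old)

old_pts = flush_points(old)
new_pts = set(flush_points(new))

# Flush points before the edit are untouched; those far enough after it
# reappear at the same place in the content, merely shifted by 8 bytes.
before_ok = all(p in new_pts for p in old_pts if p <= 5000)
after_ok = all(p + shift in new_pts for p in old_pts if p >= 5000 + 32)
```

Because each decision depends only on the last window bytes, only the
flush points within one window of the edit can change. A larger divisor
in this sketch means fewer flushes (better compression, coarser rsync
matching); a smaller one means the opposite.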

Thanks for the explanation!
Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer                      | CS student at the Technische
  | \/¯|  http://atterer.net                   | Universität München, Germany
  ¯ ´` ¯
