Re: Rsync on servers (was Re: RFC: Checking for updates)

To: debian-devel@lists.debian.org
Subject: Re: Rsync on servers (was Re: RFC: Checking for updates)
From: Martijn van Oosterhout <kleptog@svana.org>
Date: Sun, 4 Nov 2001 14:08:55 +1100
Message-id: <20011104140855.A14632@svana.org>
Mail-followup-to: debian-devel@lists.debian.org
Reply-to: Martijn van Oosterhout <kleptog@svana.org>
In-reply-to: <20011103161634.D31016@alcor.net>; from mdz@debian.org on Sat, Nov 03, 2001 at 04:16:34PM -0500
References: <1f25wja.13puct03g7r0cM%otto.wyss@bluewin.ch> <Pine.LNX.4.33.0110311501230.1102-100000@doogie2.private.brainfood.com> <20011103061543.B31016@alcor.net> <20011103231020.C6811@svana.org> <20011103161634.D31016@alcor.net>

On Sat, Nov 03, 2001 at 04:16:34PM -0500, Matt Zimmerman wrote:
> On Sat, Nov 03, 2001 at 11:10:20PM +1100, Martijn van Oosterhout wrote:
> > Last time I heard this idea, it was pointed out that the checksum data is 4
> > or 8 times larger than the file it is checksumming.
> > 
> > I don't think archive maintainers would like that...
> 
> If that were true, rsync wouldn't save bandwidth by transferring and
> verifying the checksums, no?  Even so, we're only talking about the
> Packages files, which are relatively small compared to the archive as a
> whole.

The algorithm works like this: the client divides its copy of the file into
blocks, calculates a checksum for each block and sends those checksums to the
server. The server then scans through its own copy, computing a rolling
checksum with the same block size at every byte offset, and sends back a list
of tokens, each representing either literal data or a reference to a block
the client already has.
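
A rough Python sketch of that exchange, for illustration only and not rsync's
actual code. The names (weak, roll, client_signatures, server_delta,
client_patch), the 4-byte block size, the Adler-style weak checksum and the
use of MD5 to confirm a match are all assumptions made for the example.

```python
import hashlib

BLOCK = 4        # tiny block size, for demonstration only (assumption)
MOD = 1 << 16

def weak(block):
    """Adler-style weak checksum of one whole block, as a pair (a, b)."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, size):
    """Slide the checksum window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b

def client_signatures(old):
    """Client: checksum every full block of the file it already has."""
    sigs = {}
    for idx in range(len(old) // BLOCK):
        block = old[idx * BLOCK:(idx + 1) * BLOCK]
        strong = hashlib.md5(block).digest()
        sigs.setdefault(weak(block), []).append((strong, idx))
    return sigs

def server_delta(new, sigs):
    """Server: scan its file at every offset and emit data/block tokens."""
    tokens, literal = [], bytearray()
    pos, ab = 0, None
    while pos + BLOCK <= len(new):
        if ab is None:                      # (re)start the rolling window
            ab = weak(new[pos:pos + BLOCK])
        match = None
        for strong, idx in sigs.get(ab, ()):
            # Weak checksum hit: confirm with the strong checksum.
            if hashlib.md5(new[pos:pos + BLOCK]).digest() == strong:
                match = idx
                break
        if match is not None:
            if literal:                     # flush pending literal bytes
                tokens.append(("data", bytes(literal)))
                literal.clear()
            tokens.append(("block", match))
            pos += BLOCK
            ab = None
        else:
            literal.append(new[pos])
            if pos + BLOCK < len(new):      # roll only if a next byte exists
                ab = roll(ab[0], ab[1], new[pos], new[pos + BLOCK], BLOCK)
            pos += 1
    literal.extend(new[pos:])               # tail shorter than one block
    if literal:
        tokens.append(("data", bytes(literal)))
    return tokens

def client_patch(old, tokens):
    """Client: rebuild the server's file from its own blocks plus the data."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "block":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)

old = b"the quick brown fox jumps"
new = b"the quick red fox jumps over"
tokens = server_delta(new, client_signatures(old))
assert client_patch(old, tokens) == new
```

Note that the server does all the per-offset work here, once per request,
which is why it is tempting to precalculate something on that side.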

So by precalculating the checksums on the server, you are asking it to
remember a 4 (or 8) byte checksum value for every possible block in the
file, i.e. one for each byte offset. That is where the "4 or 8 times larger"
figure comes from.
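
To put numbers on that, a back-of-the-envelope calculation. The function name
and the example file and block sizes below are made up for illustration; only
the 4 and 8 byte checksum sizes come from the discussion above.

```python
def per_offset_checksum_bytes(file_size, block_size, checksum_size):
    """Storage needed to keep one checksum for every possible block start."""
    positions = max(file_size - block_size + 1, 0)
    return positions * checksum_size

file_size = 1_000_000   # hypothetical file size in bytes
block_size = 700        # hypothetical block size in bytes

for checksum_size in (4, 8):
    stored = per_offset_checksum_bytes(file_size, block_size, checksum_size)
    print(checksum_size, stored, round(stored / file_size, 2))
```

Since there are nearly as many block positions as bytes in the file, the
ratio comes out just under the checksum size itself.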

You can probably see that this algorithm can be reversed: the client asks
the server for its list of block checksums, matches those checksums against
what it has, and then requests from the server only the data blocks it
couldn't match. That is fairly light on the server end, and precalculation is
a win here because the checksums would be less than 1% of the original file.
Not sure why it hasn't been done yet.
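
A minimal sketch of the reversed scheme, again only an illustration under
assumptions: the function names, the 4-byte block size and the truncated MD5
standing in for the block checksum are all invented for the example, and the
client hashes every window from scratch instead of using a rolling checksum,
to keep the code short. With, say, a 4-byte checksum per 1024-byte block the
list would be about 0.4% of the file.

```python
import hashlib

BLOCK = 4  # tiny block size, for demonstration only (assumption)

def digest(chunk):
    """Stand-in block checksum: first 8 bytes of MD5 (assumption)."""
    return hashlib.md5(chunk).digest()[:8]

def server_checksum_list(new):
    """Server: one checksum per fixed block; this can be precalculated."""
    return [digest(new[i:i + BLOCK]) for i in range(0, len(new), BLOCK)]

def client_plan(old, checksums):
    """Client: find which server blocks already exist in the local file."""
    local = {}
    for off in range(max(len(old) - BLOCK + 1, 0)):
        # Remember the first local offset at which each checksum occurs.
        local.setdefault(digest(old[off:off + BLOCK]), off)
    have = {i: local[c] for i, c in enumerate(checksums) if c in local}
    missing = [i for i in range(len(checksums)) if i not in have]
    return have, missing

def server_send_blocks(new, wanted):
    """Server: return only the blocks the client asked for."""
    return {i: new[i * BLOCK:(i + 1) * BLOCK] for i in wanted}

def client_rebuild(old, n_blocks, have, fetched):
    """Client: assemble the new file from local data and fetched blocks."""
    out = bytearray()
    for i in range(n_blocks):
        if i in have:
            out += old[have[i]:have[i] + BLOCK]
        else:
            out += fetched[i]
    return bytes(out)

old = b"the quick brown fox jumps"
new = b"the quick red fox jumps over"

checksums = server_checksum_list(new)          # request 1: checksum list
have, missing = client_plan(old, checksums)
fetched = server_send_blocks(new, missing)     # request 2: missing blocks
assert client_rebuild(old, len(checksums), have, fetched) == new
```

The server's only per-request work is handing out stored checksums and the
requested blocks, but the exchange now takes two round trips instead of one.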

The reason it was done the other way first is that it requires only a
single request/response exchange, which can be streamed for extra performance.

HTH,
-- 
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.
