
Re: Debian's problems, Debian's future

On Wed, Apr 10, 2002 at 08:29:49PM +1000, Martijn van Oosterhout wrote:
> On Wed, Apr 10, 2002 at 09:22:50AM +0200, Michael Bramer wrote:
> > On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
> > > With the standard rsync algorithm, the rsync checksum files would actually
> > > be 8 times larger than the original file (you need to store the checksum
> > > for each possible block in the file).
> > 
> > I don't see why the checksum file would be larger than the original
> > file. If the checksum file is larger, we will have more bytes to
> > download... This was not the goal.
> That's because the client does not download the checksums. Look below.
> > maybe I don't understand the rsync algorithm...
> > 
> > IMHO the rsync algorithm is:
> >  1.) Computer B splits file B into blocks.
> >  2.) It calculates two checksums for each block:
> >      a.) a weak ``rolling'' 32-bit checksum
> >      b.) an md5sum
> >  3.) Computer B sends these to computer A.
> >  4.) Computer A searches file A for parts with the same checksums as
> >      the blocks of file B.
> >  5.) Computer A requests the unmatched blocks from computer B and
> >      builds file B.
> > 
> > I get this from /usr/share/doc/rsync/tech_report.tex.gz
> Computer A wants to download a file F from computer B.
> 1. Computer A splits its version into blocks and calculates the checksum
> for each block.
> 2. Computer A sends this list to computer B. This should be <1% the size of
> the original file. Depends on the block size.
> 3. Computer B takes this list and does the rolling checksum over the file.
> Basically, it calculates the checksum for bytes 0-1023, checks for it in the
> list from the client. If it's a match send back a string indicating which
> block it is, else send byte 0. Calculate checksum of 1-1024 and do the same.
> The rolling checksum is just an optimisation.
> 4. Computer A receives list of "tokens" which are either bytes of data or
> indications of which block to copy from the original file.

All OK. I wrote the same above, except for point 4, and you switched A and B.
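
The exchange described in the quoted steps can be sketched in Python. This is only a rough illustration, not rsync's actual code: the tiny block size, the function names (`weak`, `signature`, `delta`, `patch`) and the token format (an int for "copy block n", bytes for literal data) are all assumptions made for the example. The weak checksum is recomputed at every offset here; rolling it is just an optimisation.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow (assumption)

def weak(data: bytes) -> int:
    """Weak 32-bit checksum built from two 16-bit sums (Adler-style)."""
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * x for i, x in enumerate(data)) & 0xFFFF
    return (b << 16) | a

def signature(old: bytes) -> dict:
    """Receiver: map (weak, strong) checksum of each full block to its index."""
    sig = {}
    for idx in range(len(old) // BLOCK):
        block = old[idx * BLOCK:(idx + 1) * BLOCK]
        sig.setdefault((weak(block), hashlib.md5(block).digest()), idx)
    return sig

def delta(new: bytes, sig: dict) -> list:
    """Sender: slide over the new file, emit block indices or literal bytes."""
    weak_seen = {w for w, _ in sig}
    tokens, i = [], 0
    while i < len(new):
        window = new[i:i + BLOCK]
        key = None
        # Only pay for the strong checksum when the weak one matches.
        if len(window) == BLOCK and weak(window) in weak_seen:
            key = (weak(window), hashlib.md5(window).digest())
        if key in sig:
            tokens.append(sig[key])      # "copy block n from your old file"
            i += BLOCK
        else:
            tokens.append(new[i:i + 1])  # literal byte
            i += 1
    return tokens

def patch(old: bytes, tokens: list) -> bytes:
    """Receiver: rebuild the new file from the old file and the tokens."""
    out = bytearray()
    for t in tokens:
        out += old[t * BLOCK:(t + 1) * BLOCK] if isinstance(t, int) else t
    return bytes(out)

old = b"abcdefghijkl"
new = b"abcdXefghijkl"
tokens = delta(new, signature(old))  # [0, b"X", 1, 2]
assert patch(old, tokens) == new
```

Only the single inserted byte travels as literal data; the three unchanged blocks are sent as references.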

> Notice that:
> a. The server (computer B) does *all* the work.

If you use A as the server, the client does all the work.

> c. Precalculating checksums on the client is useless
> d. Precalculating checksums on the server is also useless because the
> storage would be more (remember, checksum for bytes 0-1023, then for 1-1024,
> 2-1025, etc). It's faster to calculate them than to load them off disk.

Precalculating the _block_ checksums is _not_ useless. These checksums
are only <1% of the size of the original file (depending on the block size).
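
To put rough numbers on both storage claims, here is a small Python sketch. The function name `checksum_bytes` and the checksum sizes are assumptions: 20 bytes per block (4-byte weak plus 16-byte strong checksum), and 8 bytes per offset as a guess at what gives the "8 times" figure quoted earlier.

```python
def checksum_bytes(file_size: int, block_size: int,
                   bytes_per_checksum: int, per_offset: bool = False) -> int:
    """Bytes needed to store checksums for a file of `file_size` bytes."""
    if per_offset:
        # One checksum for every possible block start: 0-1023, 1-1024, ...
        count = max(file_size - block_size + 1, 0)
    else:
        # One checksum per block (ceiling division for the last partial block).
        count = -(-file_size // block_size)
    return count * bytes_per_checksum

size = 1_048_576  # a 1 MiB file
for block in (1024, 2048, 4096):
    ratio = checksum_bytes(size, block, 20) / size
    print(f"block checksums, {block}-byte blocks: {ratio:.2%} of the file")

ratio = checksum_bytes(size, 1024, 8, per_offset=True) / size
print(f"per-offset checksums, 8 bytes each: {ratio:.1f}x the file")
```

Under these assumptions the block checksums come to about 2% of the file for 1024-byte blocks and drop below 1% from 2048-byte blocks upward, while per-offset checksums really are about 8 times the file.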

> So, the main difference between what you are proposing is 1 versus 2
> requests per file. And rsync definitely only has one.

The main difference is: the client, and not the server, does all the work!

> Besides, look at the other posts on this thread. Diff requires less download
> than rsync.

I read it, but I don't understand it.

But this is not the problem. IMHO the diff is a kind of a hack and a
cached rsync is a nice framework. But this is only my taste...

Maybe I should read the rsync source code... Done.

  OK, with the normal rsync program the client makes the block checksums
  and the server searches the file...
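
The server's search over every offset stays cheap because the weak checksum can be rolled forward one byte at a time instead of being recomputed. A minimal Python sketch, assuming the two 16-bit sums described in the tech report; the names `weak_parts`, `roll` and `rolling_checksums` are made up for illustration:

```python
M = 1 << 16  # both sums are kept modulo 2^16

def weak_parts(data: bytes) -> tuple:
    """Compute the two sums (a, b) of a block from scratch."""
    a = sum(data) % M
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % M
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple:
    """Slide the window by one byte: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def rolling_checksums(data: bytes, block_len: int) -> list:
    """Checksums for every offset, with constant work per step."""
    a, b = weak_parts(data[:block_len])
    result = [(a, b)]
    for i in range(1, len(data) - block_len + 1):
        a, b = roll(a, b, data[i - 1], data[i + block_len - 1], block_len)
        result.append((a, b))
    return result

data = b"the quick brown fox"
rolled = rolling_checksums(data, 4)
direct = [weak_parts(data[i:i + 4]) for i in range(len(data) - 4 + 1)]
assert rolled == direct
```

Each step costs a few additions no matter how large the block is, which is why calculating on the fly beats storing a checksum for every offset.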

Thanks for your help.

Michael Bramer  -  a Debian Linux Developer      http://www.debsupport.de
PGP: finger grisu@db.debian.org  -- Linux Sysadmin   -- Use Debian Linux
"Bumblebees really can sting, but they do so only in extreme emergencies.
In such situations, NT no longer does anything at all." from d.a.s.r
