Re: Debian's problems, Debian's future

To: debian-devel@lists.debian.org
Subject: Re: Debian's problems, Debian's future
From: Michael Bramer <grisu@debian.org>
Date: Wed, 10 Apr 2002 15:21:17 +0200
Message-id: <20020410152117.A23259@home.debsupport.de>
Mail-followup-to: Michael Bramer <grisu@debian.org>, debian-devel@lists.debian.org
In-reply-to: <20020410202949.B12535@svana.org>; from kleptog@svana.org on Wed, Apr 10, 2002 at 08:29:49PM +1000
References: <20020328100328.GC18375@guests.deus.net> <1f9rjv5.ycgkqh1ff0vnkM%otto.wyss@bluewin.ch> <20020328191746.GC1514@celeron.dekkers> <20020409090939.F1240@home.debsupport.de> <20020409143443.GC1638@celeron.dekkers> <20020409170234.B28074@home.debsupport.de> <20020410102522.B11138@svana.org> <20020410092249.N1240@home.debsupport.de> <20020410202949.B12535@svana.org>

On Wed, Apr 10, 2002 at 08:29:49PM +1000, Martijn van Oosterhout wrote:
> On Wed, Apr 10, 2002 at 09:22:50AM +0200, Michael Bramer wrote:
> > On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
> > > With the standard rsync algorithm, the rsync checksum files would actually
> > > be 8 times larger than the original file (you need to store the checksum
> > > for each possible block in the file).
> > 
> > I don't see why the checksum file would be larger than the original
> > file. If the checksum file is larger, we will have more bytes to
> > download... That was not the goal.
> 
> That's because the client doesn't download the checksums. Look below.
> 
> > maybe I don't understand the rsync algorithm...
> > 
> > IMHO the rsync algorithm is:
> >  1.) Computer B splits file B into blocks.
> >  2.) It calculates two checksums for each block:
> >      a.) a weak ``rolling'' 32-bit checksum
> >      b.) an md5sum
> >  3.) Computer B sends these to computer A.
> >  4.) Computer A searches file A for parts with the same checksums as
> >      the blocks of file B.
> >  5.) Computer A requests the unmatched blocks from computer B and
> >      builds file B.
> > 
> > I get this from /usr/share/doc/rsync/tech_report.tex.gz
> 
> Computer A wants to download a file F from computer B.
> 
> 1. Computer A splits its version into blocks, calculates the checksum for
> each block.
> 2. Computer A sends this list to computer B. This should be <1% the size of
> the original file. Depends on the block size.
> 3. Computer B takes this list and does the rolling checksum over the file.
> Basically, it calculates the checksum for bytes 0-1023, checks for it in the
> list from the client. If it's a match send back a string indicating which
> block it is, else send byte 0. Calculate checksum of 1-1024 and do the same.
> The rolling checksum is just an optimisation.
> 4. Computer A receives list of "tokens" which are either bytes of data or
> indications of which block to copy from the original file.

All OK. I wrote the same above, except for point 4, and you swapped A
and B...
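
The steps above can be sketched in a few lines of Python. This is only a
minimal illustration, not the real rsync code: the function names
(signatures, delta, patch), the tiny block size, the 16-bit sums and the
token format are all assumptions made for the example. The client
(computer A) builds the block checksums of its old file, the server
(computer B) slides a window over the new file and emits tokens, and the
client rebuilds the new file from them.

```python
import hashlib

BLOCK = 4  # tiny block size for the demo (assumption; real tools use far larger blocks)

def weak(data):
    # Weak checksum made of two 16-bit sums; this form can be "rolled" cheaply.
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * x for i, x in enumerate(data)) & 0xFFFF
    return a, b

def signatures(old, block=BLOCK):
    # Client side: one (weak, md5) entry per non-overlapping full block.
    sigs = {}
    for idx in range(len(old) // block):
        chunk = old[idx * block:(idx + 1) * block]
        sigs.setdefault(weak(chunk), []).append((hashlib.md5(chunk).digest(), idx))
    return sigs

def delta(new, sigs, block=BLOCK):
    # Server side: slide a window over the new file and emit tokens.
    tokens, i, n = [], 0, len(new)
    a, b = weak(new[:block]) if n >= block else (0, 0)
    while i + block <= n:
        match = None
        for strong, idx in sigs.get((a, b), []):
            # The md5 check guards against weak-checksum collisions.
            if hashlib.md5(new[i:i + block]).digest() == strong:
                match = idx
                break
        if match is not None:
            tokens.append(("copy", match))      # reference to a block of the old file
            i += block
            if i + block <= n:
                a, b = weak(new[i:i + block])   # restart the window after the match
        else:
            tokens.append(("data", new[i:i + 1]))  # literal byte
            if i + block < n:
                # Roll the window one byte forward without re-summing it.
                out_byte, in_byte = new[i], new[i + block]
                a = (a - out_byte + in_byte) & 0xFFFF
                b = (b - block * out_byte + a) & 0xFFFF
            i += 1
    # Bytes that no longer fill a whole window are sent literally.
    tokens.extend(("data", new[j:j + 1]) for j in range(i, n))
    return tokens

def patch(old, tokens, block=BLOCK):
    # Client side: rebuild the new file from copy and data tokens.
    out = bytearray()
    for kind, value in tokens:
        out += old[value * block:(value + 1) * block] if kind == "copy" else value
    return bytes(out)

old = b"the quick brown fox jumps"
new = b"the quick red fox jumps over"
tokens = delta(new, signatures(old))
rebuilt = patch(old, tokens)
```

In this sketch only the "data" tokens carry file content over the wire;
the "copy" tokens are just block numbers. Note that signatures() runs on
the client and delta() on the server, which is the division of work
described in the quoted steps.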

> Notice that:
> a. The server (computer B) does *all* the work.

If you use A as the server, the client does all the work.

> c. Precalculating checksums on the client is useless
> d. Precalculating checksums on the server is also useless because the
> storage would be more (remember, checksum for bytes 0-1023, then for 1-1024,
> 2-1025, etc). It's faster to calculate them than to load them off disk.

Precalculating the _block_ checksums is _not_ useless. These checksums
are only <1% of the size of the original file (depending on the block size).
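
A rough size estimate for both cases, as a sketch only. It assumes 20
bytes per stored entry (a 4-byte weak checksum plus a 16-byte MD5
digest); that figure and the function names are assumptions for the
illustration, not numbers taken from the rsync source.

```python
def block_checksum_ratio(block_size, bytes_per_entry=20):
    # One entry per non-overlapping block: size relative to the file itself.
    return bytes_per_entry / block_size

def per_offset_ratio(file_size, block_size, bytes_per_entry=20):
    # One entry per byte offset (window 0-1023, 1-1024, 2-1025, ...).
    offsets = file_size - block_size + 1
    return offsets * bytes_per_entry / file_size

for size in (1024, 2048, 4096):
    print(size, f"{block_checksum_ratio(size):.2%}")

# Per-offset storage for a 1 MB file and 1024-byte windows, 8 bytes per entry.
print(f"{per_offset_ratio(1_000_000, 1024, bytes_per_entry=8):.2f}x")
```

Under this assumption the per-block checksums drop below 1% of the file
once the block size is larger than 2000 bytes, while an entry for every
byte offset is always several times the size of the file. That is the
difference between the block checksums meant here and the per-offset
checksums in point d.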

> So, the main difference between what you are proposing and rsync is 1
> versus 2 requests per file. And rsync definitely only has one.

The main difference is: the client, and not the server, does all the work!

> Besides, look at the other posts on this thread. Diff requires less download
> than rsync.

I read it, but I don't understand it.

But this is not the problem. IMHO the diff is kind of a hack, and a
cached rsync is a nice framework. But that is only my taste...

Maybe I should read the rsync source code... Done.

  OK, with the normal rsync program the client makes the block checksums
  and the server searches the file...

Thanks for your help.

Regards
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer      http://www.debsupport.de
PGP: finger grisu@db.debian.org  -- Linux Sysadmin   -- Use Debian Linux
"Bumblebees really can sting, but only do so in extremely exceptional
situations. NT no longer does anything in such situations." from d.a.s.r
