Re: Debian's problems, Debian's future

To: debian-devel@lists.debian.org
Subject: Re: Debian's problems, Debian's future
From: Robert Tiberius Johnson <rtjohnso@cs.berkeley.edu>
Date: 10 Apr 2002 01:26:17 -0700
Message-id: <1018427182.2636.67.camel@wonne>
In-reply-to: <20020410102522.B11138@svana.org>
References: <20020328100328.GC18375@guests.deus.net> <1f9rjv5.ycgkqh1ff0vnkM%otto.wyss@bluewin.ch> <20020328191746.GC1514@celeron.dekkers> <20020409090939.F1240@home.debsupport.de> <20020409143443.GC1638@celeron.dekkers> <20020409170234.B28074@home.debsupport.de> <20020410102522.B11138@svana.org>

On Tue, 2002-04-09 at 17:25, Martijn van Oosterhout wrote:
> What you are suggesting is that the server store checksums for precalculated
> blocks on the server. This would be 4 bytes per 1k in the original file or
> so. The transaction proceeds as follows:
> 
> 1. Client asks for checksum list off server
> 2. Client calculates checksums for local file
> 3. Client compares list of server with list of client
> 4. Client downloads changed regions.
> 
> Note, this is not the rsync algorithm, but the one that is possibly
> patented.
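
The four steps quoted above can be sketched roughly as follows. This is
only an illustration, not code from any existing tool: the function
names, the 1024-byte block size, and the use of CRC32 as the 4-byte
checksum are all assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 1024  # assumed; the quoted text suggests roughly 1k blocks


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one 4-byte CRC32 value per fixed-size block of data."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(server_sums: list, local_data: bytes,
                   block_size: int = BLOCK_SIZE) -> list:
    """Return the indices of the server blocks the client must download.

    server_sums is the precalculated list fetched from the server (step 1).
    The client checksums its own copy (step 2) and compares the two lists
    position by position (step 3); the result drives step 4.
    """
    local_sums = block_checksums(local_data, block_size)
    changed = []
    for index, server_sum in enumerate(server_sums):
        # A block is needed if the client has no block at this position
        # or if its checksum differs from the server's.
        if index >= len(local_sums) or local_sums[index] != server_sum:
            changed.append(index)
    return changed
```

Because blocks are compared at fixed offsets, inserting a single byte
near the start of the file shifts every later block and makes all of
them look changed. rsync's rolling checksum is designed to avoid that.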

This looks like an interesting algorithm, so I compared it to the diff
scheme analyzed in
http://lists.debian.org/debian-devel/2002/debian-devel-200204/msg00502.html

That message also describes my analysis methodology.

The results:
------------

- The following table summarizes the performance of the checksum-based
scheme and the diff-based scheme under the assumption that users tend to
perform apt-get update often.  I think disk space is cheap and bandwidth
is expensive, so 20 days of diffs is the best choice.

Scheme                            Disk space         Bandwidth
--------------------------------------------------------------
Checksums (bandwidth-optimal)            26K               81K
Diffs (4 days)                           32K              331K
Diffs (9 days)                           71K               66K
Diffs (20 days)                         159K               27K

- The analysis is unfairly favorable to the checksum scheme, because I
do not count the bandwidth required to request all the changed blocks,
only the bandwidth used to transmit the changed blocks.

- For the user model in the message above, the optimal block size for
the checksum-based scheme is around 245 bytes.

- In the diff-based scheme, each mirror can decide on a
diskspace/bandwidth tradeoff by simply keeping more old diffs or
deleting some old diffs.  The checksum-based scheme doesn't really
support tweaking at the mirror.

- I tend to update every day.  For people who update every day, the
diff-based scheme only needs to transfer about 8K, but the
checksum-based scheme needs to transfer 45K.  So for me, diffs are
better. :)
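
To illustrate why the checksum scheme has an optimal block size at all,
here is a toy cost model. It is not the user model from my analysis, and
the numbers in it are made up: it assumes a file of file_size bytes, a
4-byte checksum per block, and a number of isolated changes that each
dirty exactly one block. Small blocks make the checksum list long, while
large blocks make each change expensive.

```python
import math

CHECKSUM_BYTES = 4  # per-block checksum size, as in the quoted description


def transfer_cost(file_size: int, changes: int, block_size: int) -> int:
    """Bytes transferred in the toy model: checksum list plus changed blocks."""
    checksum_list = CHECKSUM_BYTES * math.ceil(file_size / block_size)
    changed_data = changes * block_size
    return checksum_list + changed_data


def best_block_size(file_size: int, changes: int) -> float:
    """Block size minimizing 4*S/B + c*B, i.e. B = sqrt(4*S/c)."""
    return math.sqrt(CHECKSUM_BYTES * file_size / changes)


# Hypothetical inputs: a 1,000,000-byte file with 100 isolated changes.
optimum = best_block_size(1_000_000, 100)
costs = {b: transfer_cost(1_000_000, 100, b) for b in (100, 200, 400)}
```

With these hypothetical inputs the optimum lands at 200 bytes, and both
halving and doubling the block size cost more. Like the results above,
this ignores the bandwidth needed to request the changed blocks.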

Best,
Rob


