
Re: Debian's problems, Debian's future



On Tue, 2002-04-09 at 17:25, Martijn van Oosterhout wrote:
> What you are suggesting is that the server store checksums for precalculated
> blocks on the server. This would be 4 bytes per 1k in the original file or
> so. The transaction proceeds as follows:
> 
> 1. Client asks for checksum list off server
> 2. Client calculates checksums for local file
> 3. Client compares list of server with list of client
> 4. Client downloads changed regions.
> 
> Note, this is not the rsync algorithm, but the one that is possibly
> patented.
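
As a rough illustration, the four quoted steps might look like the Python sketch below. It is not code from either message. CRC-32 stands in for the unspecified 4-byte checksum, the function names are made up for this example, and the "server" is simulated by passing the remote file in directly.

```python
import zlib

BLOCK_SIZE = 1024     # "1k" blocks, as in the quoted description
CHECKSUM_BYTES = 4    # "4 bytes per 1k"; CRC-32 is an assumed stand-in


def block_checksums(data, block_size=BLOCK_SIZE):
    """Checksum every fixed-size block of data (the server would precompute this)."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def changed_blocks(server_sums, client_sums):
    """Indices of blocks whose checksum differs or that the client lacks."""
    return [i for i, s in enumerate(server_sums)
            if i >= len(client_sums) or client_sums[i] != s]


def sync(local, remote, block_size=BLOCK_SIZE):
    """Simulate one update; return (new local file, bytes downloaded)."""
    server_sums = block_checksums(remote, block_size)   # step 1
    client_sums = block_checksums(local, block_size)    # step 2
    stale = changed_blocks(server_sums, client_sums)    # step 3

    blocks = [local[i * block_size:(i + 1) * block_size]
              for i in range(len(server_sums))]
    downloaded = CHECKSUM_BYTES * len(server_sums)      # the checksum list
    for i in stale:                                     # step 4
        blocks[i] = remote[i * block_size:(i + 1) * block_size]
        downloaded += len(blocks[i])
    return b"".join(blocks), downloaded


old = bytes(4096)
new = bytearray(old)
new[1500] = 1                      # change one byte inside block 1
result, cost = sync(old, bytes(new))
```

Because the blocks sit at fixed offsets, an in-place change touches only the block that contains it, but an insertion or deletion shifts every later block and makes all of them look changed. The byte count above also leaves out the cost of requesting the blocks.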

This looks like an interesting algorithm, so I decided to compare it to
the diff scheme analyzed in 
http://lists.debian.org/debian-devel/2002/debian-devel-200204/msg00502.html

That message also describes my analysis methodology.

The results:
------------

- The following table summarizes the performance of the checksum-based
scheme and the diff-based scheme under the assumption that users tend to
perform apt-get update often.  I think disk space is cheap and bandwidth
is expensive, so 20 days of diffs is the best choice.

Scheme                         Disk space         Bandwidth
-----------------------------------------------------------
Checksums (bandwidth-optimal)         26K               81K
diffs (4 days)                        32K              331K
diffs (9 days)                        71K               66K
diffs (20 days)                      159K               27K

- The analysis is biased in favor of the checksum scheme, because I
count only the bandwidth used to transmit the changed blocks, not the
bandwidth needed to request them.

- For the user model in the message above, the optimal block size for
this algorithm is around 245 bytes.

- In the diff-based scheme, each mirror can decide on a
diskspace/bandwidth tradeoff by simply keeping more old diffs or
deleting some old diffs.  The checksum-based scheme doesn't really
support tweaking at the mirror.

- I tend to update every day.  For people who update every day, the
diff-based scheme only needs to transfer about 8K, but the
checksum-based scheme needs to transfer 45K.  So for me, diffs are
better. :)
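
The per-mirror tradeoff in the diff-based scheme can be sketched as follows. This is only an illustration under stated assumptions: one diff per day, a client that knows how many days behind it is, and hypothetical function names (plan_update, prune) that do not come from the referenced message.

```python
def plan_update(days_behind, diffs_kept):
    """Decide what a client fetches from a mirror keeping `diffs_kept` daily diffs.

    Returns (kind, offsets). An offset n names the diff that takes the file
    from its state n days ago to its state n-1 days ago; offsets are listed
    in the order they would be applied (oldest first).
    """
    if days_behind <= 0:
        return ("none", [])
    if days_behind <= diffs_kept:
        return ("diffs", list(range(days_behind, 0, -1)))
    # The mirror no longer has old enough diffs: fall back to the full file.
    return ("full", [])


def prune(diff_days, diffs_kept):
    """Mirror-side knob: keep only the newest `diffs_kept` diffs."""
    if diffs_kept <= 0:
        return []
    return sorted(diff_days)[-diffs_kept:]


# A daily updater needs one diff; someone 5 days behind a 4-day mirror
# has to download the whole file.
daily = plan_update(1, 20)
stale = plan_update(5, 4)
```

Raising diffs_kept costs the mirror disk space but lets more clients avoid the full download, which is the knob each mirror can turn on its own.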

Best,
Rob


