Re: Debian's problems, Debian's future
On Tue, 2002-04-09 at 17:25, Martijn van Oosterhout wrote:
> What you are suggesting is that the server store checksums for precalculated
> blocks on the server. This would be 4 bytes per 1k in the original file or
> so. The transaction proceeds as follows:
> 1. Client asks for checksum list off server
> 2. Client calculates checksums for local file
> 3. Client compares list of server with list of client
> 4. Client downloads changed regions.
> Note, this is not the rsync algorithm, but the one that is possibly
This looks like an interesting algorithm, so I decided to compare it to
the diff scheme analyzed in
The above message also gives my analysis methodology.
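
For concreteness, the four quoted steps can be sketched roughly as below. This is only my reading of the proposal, not a reference implementation: the names block_checksums and sync are made up for illustration, CRC32 is an assumed stand-in for the unspecified 4-byte checksum, and the server is simulated by an in-memory byte string.

```python
import zlib


def block_checksums(data: bytes, block_size: int) -> list[int]:
    """One 32-bit (4-byte) checksum per fixed-size block of the file."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def sync(local: bytes, server: bytes, block_size: int = 1024) -> tuple[bytes, int]:
    """Simulate the exchange; return (updated file, bytes transferred).

    Request overhead is ignored and checksum collisions are assumed
    not to happen.
    """
    # Step 1: client fetches the precalculated checksum list (4 bytes each).
    server_sums = block_checksums(server, block_size)
    transferred = 4 * len(server_sums)

    # Step 2: client checksums its own copy with the same block size.
    local_sums = block_checksums(local, block_size)

    blocks = []
    for i, server_sum in enumerate(server_sums):
        start = i * block_size
        # Step 3: compare the two lists position by position.
        if i < len(local_sums) and local_sums[i] == server_sum:
            blocks.append(local[start:start + block_size])
        else:
            # Step 4: download only the regions that differ.
            block = server[start:start + block_size]
            transferred += len(block)
            blocks.append(block)
    return b"".join(blocks), transferred


if __name__ == "__main__":
    old = b"a" * 1024 + b"b" * 1024 + b"c" * 1024
    new = b"a" * 1024 + b"X" * 1024 + b"c" * 1024 + b"d" * 10
    updated, cost = sync(old, new)
    print(updated == new, cost)
```

Because blocks are compared at fixed offsets, an insertion near the start of the file shifts every later block and makes them all look changed. rsync avoids this with a rolling checksum, which is one reason the quoted scheme is not the rsync algorithm.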
- The following table summarizes the performance of the checksum-based
scheme and the diff-based scheme under the assumption that users tend to
perform apt-get update often. I think disk space is cheap and bandwidth
is expensive, so 20 days of diffs is the best choice.
Scheme                       Disk space   Bandwidth
Checksums (bwidth optimal)   26K          81K
diffs (4 days)               32K          331K
diffs (9 days)               71K          66K
diffs (20 days)              159K         27K
- The analysis is unfairly favorable to the checksum scheme, because I
do not count the bandwidth required to request all the changed blocks,
only the bandwidth used to transmit the changed blocks.
- For the user model in the message above, the optimal block size for
this algorithm is around 245 bytes.
- In the diff-based scheme, each mirror can decide on a
diskspace/bandwidth tradeoff by simply keeping more old diffs or
deleting some old diffs. The checksum-based scheme doesn't really
support tweaking at the mirror.
- I tend to update every day. For people who update every day, the
diff-based scheme only needs to transfer about 8K, but the
checksum-based scheme needs to transfer 45K. So for me, diffs are
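
The mirror-side knob for the diff scheme can be illustrated with a small sketch. Everything here is an assumption for illustration: the function names, the flat 8000-byte daily diff, the 1,000,000-byte full file, and the rule that a client falls back to the full file once the diffs it needs have been deleted.

```python
def mirror_disk_usage(diff_sizes: list[int], keep_days: int) -> int:
    """Bytes the mirror spends storing the newest keep_days daily diffs.

    diff_sizes[0] is the most recent diff, diff_sizes[1] the one before, etc.
    """
    return sum(diff_sizes[:keep_days])


def client_download(diff_sizes: list[int], keep_days: int,
                    days_since_update: int, full_size: int) -> int:
    """Bytes a client fetches if it last updated days_since_update days ago."""
    if days_since_update <= keep_days:
        # All needed diffs are still on the mirror.
        return sum(diff_sizes[:days_since_update])
    # Assumed fallback: the needed diffs are gone, so fetch the whole file.
    return full_size


if __name__ == "__main__":
    diffs = [8000] * 20          # hypothetical: a flat 8000 bytes per daily diff
    full = 1_000_000             # hypothetical full file size
    for keep in (4, 9, 20):
        print(keep, mirror_disk_usage(diffs, keep),
              client_download(diffs, keep, 1, full),
              client_download(diffs, keep, 10, full))
```

With a flat 8000 bytes per day the mirror stores 32000, 72000 and 160000 bytes for 4, 9 and 20 days, which is close to the disk space column in the table, though the real diffs presumably vary in size. The sketch also shows why short retention hurts bandwidth: a client that is 10 days behind pays for the full file unless the mirror keeps at least 10 days of diffs.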