Re: effectiveness of rsync and apt
Tyler MacDonald <firstname.lastname@example.org> writes:
> Goswin von Brederlow <email@example.com> wrote:
>> Bittorrent has a per chunk hash so it can validate each chunk when it
>> receives it instead of waiting for the full file. It won't see if a
>> chunk is present at some other position in the file, not even if that
>> position is also on chunk boundaries.
>> Rsync has a per chunk Adler-32 and md4 checksum. Those chunk checksums
>> are compared to a chunk at every byte position in the file. The
>> Adler-32 checksum is fairly weak but it can be updated from one
>> position to the next with minimal work. Only when it matches does
>> rsync compute the expensive md4 checksum for the block.
>> The only thing that is similar is the "per block" when generating the
>> checksum, which is basically nothing.
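To illustrate the "updated from one position to the next with minimal work"
point above, here is a minimal Python sketch of a rolling weak checksum. It is
a simplified Adler-32-style pair of sums modulo 2^16, not the exact checksum
rsync or zlib computes; the names weak_checksum and roll are made up for this
example.

```python
M = 1 << 16  # assumed modulus for this simplified sketch


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute both running sums from scratch over a whole block."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n, the last byte gets weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window by one byte in constant time.

    out_byte leaves at the front, in_byte enters at the back.
    """
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


if __name__ == "__main__":
    data = bytes(range(50)) * 3
    n = 16
    a, b = weak_checksum(data[:n])
    for i in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[i - 1], data[i + n - 1], n)
        assert (a, b) == weak_checksum(data[i:i + n])
    print("rolled checksum matches full recomputation at every offset")
```

Each step costs a few additions regardless of block size, which is why the
weak checksum can be tried at every byte position while the expensive strong
checksum is only computed on a weak match.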
> Actually it's quite a bit of similarity... but you're right, they
> still are very different. From the article, it sounds like the author is
> suggesting storing these checksum values for quick retrieval, which gets
> closer to what BitTorrent is doing. If an rsync daemon were to spit out IPs
> of clients that were mirroring the exact same thing (which is technically
> feasible, given that an rsync client could easily send its relevant
> command-line arguments upstream), then rsync clients could talk to
> each other, which would lower the bandwidth requirements of top-level Debian
> mirrors significantly.
The biggest difference is that the checksums aren't retrieved. They
are sent. The client sends the checksums of the existing local file
and the server sends back blocks of raw data and matches to existing
chunks. Quite the reverse of BT.
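A rough Python sketch of that direction of flow, under loud assumptions: the
block size, the function names and the ("match", n) / ("data", bytes) message
format are all invented for illustration, md5 stands in for md4 (which many
hashlib builds no longer provide), and the weak checksum is recomputed with
zlib.adler32 at each offset instead of being rolled, to keep it short.

```python
import hashlib
import zlib

BLOCK = 8  # hypothetical block size, tiny for illustration


def client_signatures(old: bytes) -> dict:
    """Client: checksum every full block of the existing local file."""
    sigs = {}
    for idx in range(len(old) // BLOCK):
        blk = old[idx * BLOCK:(idx + 1) * BLOCK]
        entry = (hashlib.md5(blk).digest(), idx)
        sigs.setdefault(zlib.adler32(blk), []).append(entry)
    return sigs


def server_delta(new: bytes, sigs: dict) -> list:
    """Server: scan its file at every byte offset, emit matches or raw data."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        blk = new[pos:pos + BLOCK]
        hit = None
        if len(blk) == BLOCK:
            # Only compute the strong checksum when the weak one matches.
            for strong, idx in sigs.get(zlib.adler32(blk), []):
                if hashlib.md5(blk).digest() == strong:
                    hit = idx
                    break
        if hit is None:
            literal.append(new[pos])
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("match", hit))
            pos += BLOCK
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def client_apply(old: bytes, ops: list) -> bytes:
    """Client: rebuild the new file from local blocks plus received raw data."""
    out = bytearray()
    for kind, val in ops:
        out += old[val * BLOCK:(val + 1) * BLOCK] if kind == "match" else val
    return bytes(out)
```

For example, with old = b"AAAAAAAABBBBBBBBCCCCCCCC" on the client and
new = b"xyBBBBBBBBAAAAAAAAzz" on the server, the server only has to send the
four bytes "xy" and "zz" as raw data plus two block references, even though
the matching blocks sit at different, unaligned offsets.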
For precalculated (and retrievable) checksums look at zsync.
Kind regards