Re: Rsync on servers
rsync has two cases where it really hogs the disk and CPU of the
server, and the disk hogging is by far the bigger problem:
1. rsync -r
A recursive listing walks every directory on the server. On the
complete Debian mirror this takes, depending on load and bandwidth,
a minute or more, and with multiple clients it simply seeks the
drive to death.
2. rsync of near identical files
This reads the complete file at full drive speed and calculates the
checksums. I've seen 30-40 MB/s of throughput without much CPU usage,
provided the drive can sustain that rate and does not seek. Given a
big enough block size, the resulting network traffic can be just a
few KB even for really big files (CD images and up).
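To make the "few KB even for big files" claim concrete, here is a
back-of-the-envelope sketch. The numbers are illustrative assumptions,
not from any real mirror: a 700 MB CD image, 1 MB blocks, and 20 bytes
of checksum data per block (a 4-byte rolling checksum plus a 16-byte
strong checksum, as rsync uses).

```python
# Illustrative arithmetic: how much checksum metadata a large file needs
# when the block size is big. All figures are assumptions for the example.
file_size = 700 * 1024 * 1024   # a 700 MB CD image
block_size = 1024 * 1024        # 1 MB blocks
per_block = 4 + 16              # rolling checksum + strong checksum

blocks = (file_size + block_size - 1) // block_size
checksum_bytes = blocks * per_block
print(blocks, checksum_bytes)   # 700 blocks, 14000 bytes (~14 KB)
```

So even for a CD image the checksum data an unchanged client transfers
is on the order of tens of KB, and it shrinks further as the block size
grows.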
But, as mentioned before by me, for normal mirrors both cases can be
avoided:
1. After each mirror update, generate ls-lR files for each directory
that has changed.
2. The rsync algorithm can be used with client and server
reversed. The mirror server then only needs to send one checksum per
block (instead of calculating and comparing checksums at every byte
offset), so it can precalculate those per file and just send them.
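A minimal sketch of the precalculation step in point 2, assuming a
hypothetical format I made up for illustration (one MD5 digest per
fixed-size block, stored in a ".sums" file next to the original). A
mirror would run this once per changed file after an update, so no
per-client checksum work is ever done:

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # illustrative block size

def write_checksum_file(path):
    # Store one strong checksum per fixed-size block of the file.
    # Hypothetical ".sums" sidecar format, chosen just for this sketch.
    with open(path, "rb") as src, open(path + ".sums", "wb") as out:
        while True:
            block = src.read(BLOCK_SIZE)
            if not block:
                break
            out.write(hashlib.md5(block).digest())
```

A real generator would also want the rolling checksums so clients can
match blocks at arbitrary offsets, but the idea is the same: the
expensive full read of the file happens once, at update time.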
Another goody of this method is that you can fetch files from any
HTTP/1.1 server, provided the precalculated checksums are available,
either shipped alongside the files or generated by a server module or
CGI.
I wrote a client and a checksum file generator that use the rsync
algorithm in reverse and the HTTP/1.1 protocol to fetch files.
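This is not my actual client, but the core of such a client can be
sketched like this: compare local blocks against the server's
precalculated digests, then request only the mismatching byte ranges
with an HTTP/1.1 Range header. The sketch is simplified to fixed block
boundaries (real rsync also matches blocks at arbitrary offsets via
the rolling checksum):

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # must match the block size used by the generator

def blocks_to_fetch(local_path, remote_sums):
    """Compare each local block against the server's precalculated MD5
    digests; return the byte ranges (start, end inclusive) that differ
    and therefore must be fetched from the HTTP server."""
    ranges = []
    with open(local_path, "rb") as f:
        for i, digest in enumerate(remote_sums):
            f.seek(i * BLOCK_SIZE)
            block = f.read(BLOCK_SIZE)
            if hashlib.md5(block).digest() != digest:
                ranges.append((i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE - 1))
    return ranges

def range_header(ranges):
    # Build an HTTP/1.1 multi-range request value,
    # e.g. "bytes=65536-131071,262144-327679".
    return "bytes=" + ",".join(f"{a}-{b}" for a, b in ranges)
```

Any stock HTTP/1.1 server can answer such a request; the server never
computes a checksum at request time.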
The bandwidth usage of this method is only marginally greater than
plain HTTP even in the worst case, where the checksum file is fetched
but no matches are found: roughly a 1% increase.
The CPU usage to serve a file via HTTP should not matter on modern
CPUs, even if the file is fetched using multiple Range requests.
The overall efficiency is identical to that of rsync.
May the Source be with you.
PS: No one ever reacted to my proposal to add precalculated checksum
files to the Debian mirrors to allow rsyncing from any HTTP mirror.