
Re: Apt & rsync



On Sat, 16 Oct 1999, Dylan Thurston wrote:

> Ah, that's a pity.  Any chance of improving this?  I suppose it's
> inherently going to work harder than an ftp daemon, but, for instance, it
> shouldn't be hard to modify rsync to cache the checksums.  (The strong 

Part of the problem is the wide variation in load that different rsync
users can cause. Someone fetching just a single file causes little server
load, but someone transferring a whole archive, generating file lists, or
computing MD5s generates a huge amount of load.

Since all of these users are lumped together, it is really hard to balance
the load fairly. Keep in mind that debian.org FTP sites do a huge amount
of traffic, so small load issues are quite important. Right now most sites
limit rsync to around 10 simultaneous connections.

> You need access to the previous _compressed_ version, yes.  So people
> tracking unstable would probably want to do this, while people sticking
> with stable distributions probably wouldn't.  It's a disk space/bandwidth
> tradeoff.

The person generating the .deb would need to use this, not the person
downloading. IMHO it is more feasible to just make rsync aware of how to
decompress a .deb file and then operate on the uncompressed output as
normal - but again the server-side load starts to get high.
 
> Does it really make sense to use rsync here?  According to this message
> http://samba.anu.edu.au/listproc/rsync/1214.html from the rsync mailing
> list, rsync is optimized for low-bandwidth connections, which these sites
> obviously are not.  Apparently the md5 checksums start to dominate the
> transmission time.

rsync doesn't checksum a file if its date and size match. We use rsync
primarily because it supports hardlinks, keeps times in UTC and doesn't
scale as badly as the perl ftp mirror scripts.
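
A minimal sketch of that size-and-date check, to show why unchanged files
cost almost nothing. This is not rsync's code; the function name
needs_transfer and the time_window parameter are made up for illustration.

```python
import os


def needs_transfer(src_path, dst_path, time_window=0):
    """Return True if dst is missing or differs from src in size or mtime.

    Hypothetical helper: only stat() is called, so no file data is read
    and no checksum is computed when size and mtime already match.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True  # nothing on the receiving side yet
    src = os.stat(src_path)
    if src.st_size != dst.st_size:
        return True
    # Compare whole seconds; time_window allows for coarse timestamps.
    return abs(int(src.st_mtime) - int(dst.st_mtime)) > time_window
```

Only files for which this returns True would go on to the block checksum
stage, which is where the real load comes from.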
 
> Can you say what the advantages of using HTTP rather than the rsync
> protocol would be? 

HTTP is nice because people who are firewalled can still make use of it. 
It is also much easier to convince people to install a new apache module
than it is to install rsyncd :>

Beyond that, what I would like to see is quite a bit different from what
rsync provides, with a larger focus on being low impact on the server.
 
> What's the pseudo-image kit?

Basically what it does is concatenate all the .debs in a Debian mirror
together into one gigantic file and then rsync a CD image over it. The
rsync algorithm then re-sorts the 600 MB file to match the CD image and
transfers over the ISO metadata. The net result is that you can
transfer a whole 600 MB Debian image by downloading about 20 MB of
stuff. However, rsync and the PIK nail the drives of both the server and
the client for about 20 minutes. A server can support at most maybe two
people doing a PIK rsync at once before the drives melt down :>.
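
The concatenation step could look roughly like the sketch below. It is an
illustration only: the function name build_pseudo_image and the sorted
directory walk are assumptions, and the real kit may order or lay out the
files differently.

```python
import os
import shutil


def build_pseudo_image(mirror_root, image_path):
    """Concatenate every .deb under mirror_root into one file.

    Hypothetical helper. Returns the number of bytes written. The
    resulting file is only a seed for a later rsync of the CD image.
    """
    with open(image_path, "wb") as out:
        for dirpath, dirnames, filenames in os.walk(mirror_root):
            dirnames.sort()  # make the walk order deterministic
            for name in sorted(filenames):
                if not name.endswith(".deb"):
                    continue
                with open(os.path.join(dirpath, name), "rb") as src:
                    shutil.copyfileobj(src, out)
        return out.tell()
```

The order of the .debs in the seed file should not matter much, since the
rsync algorithm matches blocks wherever they sit in the old file.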

The whole procedure could be optimized substantially by having detached 
block checksums and a smarter PIK.
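
One way to read "detached block checksums" is: compute the per-block sums
once, store them next to the file, and reuse them until the file changes.
A minimal sketch under that assumption follows. The ".sums" sidecar name,
the JSON layout and the 4096-byte block size are invented here;
zlib.adler32 and MD5 merely stand in for rsync's own weak and strong
checksums.

```python
import hashlib
import json
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_checksums(path, block_size=BLOCK_SIZE):
    """Read the whole file and return [weak, strong] sums per block."""
    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            sums.append([zlib.adler32(block), hashlib.md5(block).hexdigest()])
    return sums


def cached_block_checksums(path, block_size=BLOCK_SIZE):
    """Return block sums, reusing a sidecar file while size/mtime match."""
    st = os.stat(path)
    key = {"size": st.st_size, "mtime": int(st.st_mtime),
           "block_size": block_size}
    sidecar = path + ".sums"  # hypothetical naming scheme
    try:
        with open(sidecar) as f:
            cached = json.load(f)
        if cached["key"] == key:
            return cached["sums"]  # cache hit: the data file is not read
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        pass  # missing or unreadable cache: fall through and recompute
    sums = block_checksums(path, block_size)
    with open(sidecar, "w") as f:
        json.dump({"key": key, "sums": sums}, f)
    return sums
```

With something like this, the disk reads for a big file are paid once per
file version instead of once per client.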

Jason

