
Intelligent mirroring



Greetings,

I've had a very hard time getting my mirror to update lately because of maxed-out connections at the servers. It succeeds only about one time in five, even though the update is run by a cron script at 3:17 AM EDT, and that's against ftp.eecs.umich.edu, which seems a lot better than ftp.debian.org.

Rsync is so slow, of course, because it must touch every byte of every file on both the client and the server. That matters for archives where a lot of files are changing. But a .deb with a given filename never changes, so most of the md4summing disk/CPU activity is wasted.

Proposed solution: have the update script pass --exclude '*.deb' to rsync (and, while we're at it, '*.orig.tar.gz', '*.diff.gz' and '*.dsc'), and then just use ls-lR to generate a list of such files to delete and a list to fetch with wget.
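
A minimal sketch of the list-generating step, in Python. It assumes the listing has the usual `ls -lR` layout (a "dir:" header line followed by `ls -l` entries) and that the caller already has the set of matching files present on the local mirror. The names parse_ls_lr and plan_update are made up for illustration, and the actual deleting and wget-ing are left out. Following the reasoning above, it compares filenames only and never looks at file contents.

```python
import posixpath
import re

# File types that never change once published under a given name.
SUFFIXES = (".deb", ".orig.tar.gz", ".diff.gz", ".dsc")

# An `ls -l` entry starts with a type character and nine permission characters.
ENTRY = re.compile(r"^[-dlbcps][-rwxsStT]{9}")


def parse_ls_lr(text):
    """Return the set of relative paths of SUFFIXES files in `ls -lR` output."""
    paths = set()
    current_dir = "."
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("total "):
            continue
        if ENTRY.match(line):
            if not line.startswith("-"):
                continue  # skip directories, symlinks, devices
            # perms, links, owner, group, size, month, day, time/year, name
            fields = line.split(None, 8)
            if len(fields) == 9 and fields[8].endswith(SUFFIXES):
                full = posixpath.join(current_dir, fields[8])
                paths.add(posixpath.normpath(full))
        elif line.endswith(":"):
            current_dir = line[:-1]  # header line such as "./pool/main:"
    return paths


def plan_update(remote_listing, local_paths):
    """Return (to_fetch, to_delete) as sorted lists of relative paths."""
    remote = parse_ls_lr(remote_listing)
    local = set(local_paths)
    return sorted(remote - local), sorted(local - remote)
```

The to_fetch list could then be turned into URLs and handed to wget, and the to_delete list removed locally, after the rsync run (with the excludes) has brought everything else up to date.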

It seems this would make the resulting rsyncs a ton faster (better than an order of magnitude?) as soon as mirror sites started using the new update script. A whole lot more mirrors could then rsync in a given amount of time, and wget would be a whole lot more efficient than rsync at getting the .debs etc.

Any potential problems with this approach? If not, scripting it will go on my todo list, and it might happen by the end of the week after next... (I'm kinda pressed for time these days.)

Please CC me in replies as I'm not subscribed to -devel.

Zeen,
--

-Adam P.

GPG fingerprint: D54D 1AEE B11C CE9B A02B  C5DD 526F 01E8 564E E4B6

Welcome to the best software in the world today cafe! <http://lyre.mit.edu/%7Epowell/The_Best_Stuff_In_The_World_Today_Cafe.ogg>



