Intelligent mirroring
Greetings,
I've had a very hard time getting my mirror to update lately because the
servers keep hitting their connection limits. The update succeeds only
about one time in five, even though it runs from a cron script at
3:17 AM EDT, and that's against ftp.eecs.umich.edu, which seems a lot
better than ftp.debian.org.
Rsync is of course so slow because it must touch every byte of every
file on the client and server. This is important for archives where a
lot of files are changing. But .debs with a given filename never
change, so most of the MD4 checksumming disk/CPU activity is wasted.
Proposed solution: have the update script --exclude *.deb, and while
we're at it, *.orig.tar.gz and *.diff.gz and *.dsc, and then just use
the archive's ls-lR listing to generate a list of such files to delete
locally and a list of new ones to fetch with wget.
It seems this would make the resulting rsyncs a ton faster (better than
an order of magnitude?), as soon as mirror sites used this new update
script. So a whole lot more mirrors could rsync in a given amount of
time, and wget would be a whole lot more efficient at getting the .debs
etc. than rsync.
Any potential problems with this approach? If not, scripting it will go
on my todo list, and it might happen by the end of the week after
next... (I'm kinda pressed for time these days.)
Please CC me in replies as I'm not subscribed to -devel.
Zeen,
--
-Adam P.
GPG fingerprint: D54D 1AEE B11C CE9B A02B C5DD 526F 01E8 564E E4B6
Welcome to the best software in the world today cafe!
<http://lyre.mit.edu/%7Epowell/The_Best_Stuff_In_The_World_Today_Cafe.ogg>