Re: Anyone want a utility to find the best mirror?
> On Thu, 28 Sep 2000, Joe Emenaker wrote:
> > o It only tests ping times, not actual transfer rates of data
> Right, because ping times are a good first step. If the box pings really
> slowly, the odds are the transfer rate will be greatly affected.
> I don't care if the lines you and the mirror are on are T1s, if a critical
> link in the middle is causing slow pings, the odds are the transfer will
> be slow.
True. But the odds aren't always right. I once saw an article in SysAdmin
where the author noted that pings are ICMP echo requests, which are handled
at a very low level, and asserted that the TCP layer of the pinged host isn't
even involved. Ultimately, the author argued that a machine can be pretty
much hung while its network card is still answering pings.
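One way to probe the part of the stack that ping skips is to time a TCP
handshake to the service port itself (21 for FTP, say). A minimal Python
sketch; the function name is made up, and the measured time includes the DNS
lookup unless you pass an IP address:

```python
import socket
import time


def tcp_connect_time(host, port, timeout=5.0):
    """Return seconds taken to complete a TCP handshake, or None on failure.

    Unlike an ICMP echo, this only succeeds if the remote host's TCP stack
    actually accepts a connection on the given port.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the 3-way handshake
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return None  # refused, unreachable, or timed out
    return time.monotonic() - start
```

Something like `tcp_connect_time("ftp.example.org", 21)` (hypothetical host)
tells you the FTP daemon's host is at least accepting connections, which a
ping alone can't.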
And, more apropos to this particular discussion, I've seen several cases
where the transfer rates from quick-pinging hosts were pretty bad, and vice
versa. One way to get results like this is to compare two machines separated
by a single ISDN connection with two hosts separated by 50 T1s (in series, of
course). The ISDN link is going to have pretty good ping times because a ping
packet is so darned small that it gets over the ISDN link in no time.
However, even though the T1s are roughly 15 times faster, there are 50
routers that each need to figure out what the hell to do with that packet.
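A back-of-the-envelope model makes the crossover concrete. This Python sketch
treats transfer time as one round trip plus size divided by bandwidth,
ignoring slow start, windows and protocol overhead. The RTT figures (30 ms
for the single ISDN hop, 400 ms for the 50-hop T1 chain) are made-up
assumptions, not measurements:

```python
def naive_transfer_time(size_bytes, bandwidth_bps, rtt_s):
    """Crude estimate: one round trip to get going, then serialization time."""
    return rtt_s + size_bytes * 8 / bandwidth_bps


# Assumed link characteristics, for illustration only.
isdn = dict(bandwidth_bps=128_000, rtt_s=0.03)        # one 128 kbps ISDN hop
t1_chain = dict(bandwidth_bps=1_544_000, rtt_s=0.4)   # 50 T1 hops, slow RTT

for label, size in (("64-byte ping", 64), ("1 MB file", 1_000_000)):
    print(label,
          round(naive_transfer_time(size, **isdn), 3),
          round(naive_transfer_time(size, **t1_chain), 3))
```

Under these assumptions the ISDN host wins the ping by a wide margin and
loses the 1 MB download by more than a factor of ten.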
Now, the latency of those 50 routers would cripple your transfer of larger
files if TCP didn't have a receive window. But it does, and the window lets
the sender keep many packets in flight instead of waiting for each one to be
acknowledged, so there's a good chance the latency only affects the transfer
at the very beginning. Kinda like the sun. When the sun "turned on" at the
dawn of time, it took about 8 minutes for that first ray of sunshine to reach
us... but we've been receiving a steady stream of light from it ever since.
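A caveat worth making precise: the receive window only hides the latency if
it is at least as large as the bandwidth-delay product (link speed times
round-trip time). If it is smaller, the sender runs out of window every round
trip and waits for ACKs, so steady-state throughput tops out near
window / RTT. A small Python sketch of that bound; the window sizes and the
200 ms RTT used below are illustrative assumptions:

```python
def window_limited_throughput_bps(window_bytes, rtt_ms, link_bps):
    """Steady-state TCP ceiling: at most one full window per round trip,
    and never more than the link itself can carry."""
    return min(link_bps, window_bytes * 8 * 1000 / rtt_ms)


T1_BPS = 1_544_000
small = window_limited_throughput_bps(32 * 1024, 200, T1_BPS)
large = window_limited_throughput_bps(64 * 1024, 200, T1_BPS)
```

With a 200 ms round trip, a 32 KB window caps the transfer at about
1.31 Mbps, below a T1's 1.544 Mbps, while a 64 KB window is enough to fill
the link.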
> > These last two are a real pain... especially the last one. Many of the
> > sites I tried first in the mirrors list didn't have the debian tree anymore.
> > Hrmph!
> So the list is out of date. That isn't netselect's fault.
I never said it was. My point was that netselect only gets us about 25% of
the way to where most of us would like to go. That's not to say that
netselect sucks. It's not even to say that netselect isn't one of the right
tools for the job. I guess my point is that there needs to be something
more... whether or not netselect is part of it.
> Can I suggest you focus on that side of the problem....
By cleaning up the mirrors list? Well, I kinda thought of that... the script
would maintain a history of the sites and just write off the ones that had
been inaccessible for more than, say, two weeks. What would be really neat is
to run this on the central Debian server and have it publish "today's list of
working mirrors" instead of maintaining such a static list.
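The history idea could be as simple as recording, for each mirror, the last
time a check succeeded and dropping anything older than the cutoff. A Python
sketch; the in-memory dict stands in for whatever on-disk history file the
script would actually keep, and `prune_mirrors` is a made-up name:

```python
from datetime import datetime, timedelta


def prune_mirrors(last_success, now, max_outage=timedelta(weeks=2)):
    """Return the mirrors whose last successful check is within max_outage.

    last_success maps hostname -> datetime of the last time the mirror
    answered correctly, or None if it never has.
    """
    return sorted(
        host for host, seen in last_success.items()
        if seen is not None and now - seen <= max_outage
    )
```

Publishing "today's list" would then just mean writing out the result of
`prune_mirrors(history, datetime.now())` after each round of checks.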
Still, that doesn't really address the problem of finding the fastest mirror
for your place on the net. What I'd ultimately like to do is have it
(optionally, of course) automatically update /var/lib/dpkg/methods/ftp/vars
to point to the best server(s) that it finds.
> > So, anyway.... I broke down and wrote a little perl script....
> This is a nice idea. You are doing both http _and_ ftp, right?
Not yet. Do you have any numbers on how many people actually use http for
dselect? Strangely, I've almost always used FTP for transferring files and
HTTP for transferring hypertext. Call me silly....
> > and then it actually tries to download a
> > file from the site to see what kind of transfer rate it gets.
> I suspect the mirrors admins would prefer this:
> Use netselect to narrow down to 5 or so sites from a list (or even 2-3?) and
> _then_ do a transfer rate test. I guarantee you: the results will be the
> same, and what's more, you will have skipped transferring data from dozens
> or more other sites.
There's a thought. It would save me from having to parallelize my utility,
lest it take an hour to run.
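The suggested split might look like this in Python. `pick_mirrors` and
`measure_rate` are hypothetical names; where the ping times come from
(netselect or anything else) and how the rate is measured are both left
open:

```python
def pick_mirrors(ping_ms, measure_rate, shortlist=5):
    """Two-stage selection: cheap latency filter, then a real transfer test.

    ping_ms maps host -> round-trip time in milliseconds.
    measure_rate(host) returns bytes/sec, or None if the download failed.
    Only the `shortlist` fastest-pinging hosts are ever downloaded from.
    """
    candidates = sorted(ping_ms, key=ping_ms.get)[:shortlist]
    rates = {}
    for host in candidates:
        rate = measure_rate(host)
        if rate is not None:
            rates[host] = rate
    # Best measured throughput first
    return sorted(rates, key=rates.get, reverse=True)
```

Since only a handful of hosts get the download test, running them one after
another stays cheap, which is the point about not needing to parallelize.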
Keep in mind, though, that the file I'm grabbing from each one is the
mirrors list itself... about 20k or so... so it's not like I'm killing the
server. The catch is that small files have much the same problem as ping
packets: the transfer is over before the receive window has a chance to pay
off, so you never measure the steady-state throughput between the two hosts.
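To get closer to steady-state throughput, one option is to fetch something
larger and only start the stopwatch after an initial chunk has arrived. A
Python sketch that works on any file-like object with `.read(n)` (for
example what `urllib.request.urlopen` returns); the function name and the
warm-up threshold are arbitrary assumptions, and the file still has to be
big enough to leave something to time after the warm-up:

```python
import time


def steady_state_rate(stream, warmup_bytes, chunk_size=8192, clock=time.monotonic):
    """Read `stream` to EOF and return bytes/sec measured after the warm-up.

    The first `warmup_bytes` are read but excluded from the timing, so that
    connection setup and the initial ramp-up don't dominate the result.
    Returns None if the stream is too short to measure anything.
    """
    total = 0
    start_time = None
    start_bytes = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if start_time is None and total >= warmup_bytes:
            # Start timing once the warm-up has been consumed
            start_time = clock()
            start_bytes = total
    if start_time is None or total == start_bytes:
        return None
    elapsed = clock() - start_time
    if elapsed <= 0:
        return None
    return (total - start_bytes) / elapsed
```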
> > It does *not* check for *currency* of the mirror. In other words, if the
> > files in the mirror are all 6 months old, the script will not pick that
> > up.... yet. I do plan, however, to make it sense that and take that into
> > account.
> Actually, I think the answer to this should be done on a server level: the
> mirror _should_ have a timestamp of last update. This would be all that
> is needed. If a site stopped mirroring, the timestamp would be way off.
Is there a file in the tree that's always being regenerated?
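If the mirror does publish such a timestamp, the client-side check is short.
A Python sketch, assuming the timestamp is an RFC 2822 date string (modern
Debian `Release` files carry one in their `Date:` field, but which file to
read is left open here) and using an arbitrary three-day threshold:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_stale(timestamp, now=None, max_age=timedelta(days=3)):
    """True if an RFC 2822 date string is older than max_age."""
    updated = parsedate_to_datetime(timestamp)
    if updated.tzinfo is None:  # a "-0000" zone yields a naive datetime
        updated = updated.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - updated > max_age
```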
> > Anyway, if anybody wants to give it a whirl, I'm open to some
> > beta-testing.....
> I'd be interested in seeing it.... but perl scripts that use netselect
> to do the pinging make more sense.
I'll take a look at netselect and see if I can get it to give me the top X
hosts in the list. Last time I ran it, it just kinda sat there for a minute
and then blurted out a single host name.
Another issue: if you take, say, the top 5 hosts from netselect and they all
turn out to be a) no longer running FTP, b) no longer accepting anonymous
FTP, or c) no longer mirroring Debian, then you have to get the next 5...
etc. I think the best way might be to have netselect report on all of the
machines and then have my script go down the list as far as it has to until
it has found X suitable Debian mirrors.
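That loop is straightforward to express. A Python sketch; `is_suitable` is a
hypothetical callback standing in for the FTP login and directory checks,
and the ranked list is assumed to be the full latency ranking described
above:

```python
def first_suitable(ranked_hosts, is_suitable, wanted):
    """Walk a best-first host list until `wanted` usable mirrors are found.

    is_suitable(host) should return False for hosts that no longer run FTP,
    refuse anonymous logins, or no longer carry the debian tree.
    Hosts past the point where enough mirrors were found are never checked.
    """
    found = []
    for host in ranked_hosts:
        if len(found) >= wanted:
            break
        if is_suitable(host):
            found.append(host)
    return found
```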