
[Debconf-video] Mirrorbrain for debian video archive



Hello debconf-video team,

I'm currently running the experimental debian-cd download redirector
based on mirrorbrain at http://debian-cd.debian.net/, and pabs has asked
me to look into using mirrorbrain for the Debian meetings video archive
too. I did, and this is what I found so far.

For those who do not know what mirrorbrain is: Mirrorbrain is a
download-redirector, whose job is to redirect clients to a suitable
mirror whenever they request a file. To do that, it uses GeoIP on the
client's IP to determine where that client is, and with that information
it locates nearby mirrors. It also scans all the mirrors at regular
intervals, so that it knows exactly which mirror has which files, and
which mirrors are currently down. That way it (almost) never sends a
client to a mirror that doesn't have the particular file or is currently
down.
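To make the redirect decision concrete, here is a minimal Python sketch of the logic described above (and in point 3 below). It is not MirrorBrain's actual code, and the real redirector applies further criteria that this sketch ignores. It assumes the scan results are already available as a set of paths per mirror, plus an up/down flag. The client's country and continent are passed in directly and stand in for the GeoIP lookup. The names `Mirror`, `pick_mirror` and the example host names are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Mirror:
    """What the redirector remembers about one mirror (hypothetical model)."""
    name: str
    country: str                               # e.g. "de", "tw"
    continent: str                             # e.g. "EU", "AS", "NA"
    up: bool = True                            # result of the last up/down check
    files: set = field(default_factory=set)    # paths seen by the last scan


def pick_mirror(path, client_country, client_continent, mirrors, fallback,
                rng=random):
    """Return the name of the host a request for `path` is redirected to.

    client_country and client_continent stand in for the result of a GeoIP
    lookup on the client's IP address.
    """
    # Only mirrors that are up and were seen to have this file qualify.
    candidates = [m for m in mirrors if m.up and path in m.files]
    if not candidates:
        # Nobody is known to have the file: send the client to the master.
        return fallback
    # Prefer a mirror in the client's own country, then on its continent.
    for nearby in (
        [m for m in candidates if m.country == client_country],
        [m for m in candidates if m.continent == client_continent],
    ):
        if nearby:
            return rng.choice(nearby).name
    # Nothing nearby: pick any mirror that has the file.
    return rng.choice(candidates).name


if __name__ == "__main__":
    talk = "/2006/debconf6/talk.ogg"  # hypothetical file name
    mirrors = [
        Mirror("mirror-de.example.org", "de", "EU", files={talk}),
        Mirror("mirror-tw.example.org", "tw", "AS", files={talk}),
        Mirror("mirror-nl.example.org", "nl", "EU", up=False, files={talk}),
    ]
    # A client in Japan ends up on the only Asian mirror.
    print(pick_mirror(talk, "jp", "AS", mirrors, "master.example.org"))
```

Run as a script, this prints `mirror-tw.example.org`. A client in the Netherlands would get the German mirror, because the Dutch one is marked down. A request for a file that no scanned mirror has goes to the fallback. A client in the US, with no North American mirror in the list, falls through to the last branch.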

Now this could obviously be useful for the video archive which delivers
rather large files, where it does make a huge difference whether you
fetch a file from a local mirror or from overseas at 1/10th the speed.
It might even be an idea to route meetings-archive.d.n (which AFAIK
is where all the official links to the recordings point) through the
redirector instead of pointing it at the master site in Sweden. However,
there are a few problems with the meetings archive as it currently is
that would reduce the effectiveness of the redirector:

1. The redirector needs to know which files are on which mirror, and for
that it needs to scan/index the mirrors. It can do that through rsync,
FTP, or as a last resort, through HTTP. For HTTP scanning to work, the
directory listings must be in a format parseable by Mirrorbrain. Of
the current mirrors, the amazonaws one does not have parseable directory
listings, so it cannot be scanned and thus cannot be used by the redirector.
2. Even for the mirrors that can in principle be scanned, there are
directories that cannot be scanned, e.g. /2006/debconf6 [1]. The reason
is that those dirs contain an index.html that displays a nice
explanation of what each file is, and Apache delivers that instead of
the usual directory listing. As nice as that is for human users, it
means that mirrorbrain doesn't see a directory listing that it can parse
in that directory, so it will record "this mirror doesn't have any files
in that dir". Requests for files in those dirs will thus always be
redirected to the "fallback", which is the master in Sweden. There are
two ways to fix that: either the mirrors need to provide a way to scan
them other than HTTP, e.g. rsync or FTP; or (suggested by pabs) the
index.html could be renamed to README.html. Apache would still display
it, as a footer below the directory listing.
3. Too few mirrors. There are currently only 3 working mirrors plus the
main site in Sweden that could be fed into mirrorbrain. 3 of those 4 are
in Europe, 1 is in Taiwan. Mirrorbrain unfortunately cannot do miracles.
It would send all requests from Asia to the .tw mirror (if it has the
requested file and is up, of course). It would also send all requests
from Europe to a European mirror, preferring one in the same country if
there is one. For all other parts of the world, however, including the
US, it would just pick a random mirror in Europe because it doesn't have
anything really suitable.
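Points 1 and 2 come down to how HTTP scanning sees a directory. The Python sketch below is a simplified, hypothetical scanner step, not MirrorBrain's real scanner, which has its own parsing rules. It assumes a page counts as a directory listing only if its title starts with "Index of", as Apache's autoindex pages do. On such a page it collects the relative links as file names. A hand-written index.html like the one in /2006/debconf6 fails that test, so the directory looks empty. The class and function names and the file name are made up for illustration.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the <title> text and all href targets of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.in_title = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def files_in_listing(html):
    """Return the entries of an Apache-style directory listing.

    Simplified heuristic (an assumption, not MirrorBrain's rule): a page that
    is not titled "Index of ..." is treated as having no listing, so the
    mirror appears to have no files in this directory.
    """
    parser = ListingParser()
    parser.feed(html)
    if not parser.title.strip().startswith("Index of"):
        return set()
    # Skip column-sorting links ("?C=N;O=D"), absolute links such as the
    # "Parent Directory" entry, and links to other hosts.
    return {h for h in parser.hrefs
            if not h.startswith(("?", "/")) and "://" not in h}
```

Fed an autoindex page whose title is "Index of /pub/debian-meetings/2006/debconf6" and that links to `talk.ogg`, this returns `{'talk.ogg'}`. Fed a custom page titled "DebConf6 videos" with the same link, it returns an empty set. Renaming index.html to README.html lets Apache serve the autoindex page again, so the scan finds the files.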

The mirrorbrain instance for the meetings-archive is currently up at
http://debian-meetings.poempelfox.de/debian-meetings/, but that will not
be a permanent URL. If you decide that this would be useful, I would be
willing to run it for the foreseeable future under a debian.net domain
(perhaps even meetings-archive.d.n?), together with the
debian-cd.debian.net. Otherwise it will disappear soon.

[1] http://meetings-archive.debian.net/pub/debian-meetings/2006/debconf6/
