
Mirror split stuff



Hey all,

First, the executive summary for mirror operators reading this: we'll be
switching Debian's primary mirrors to carry a small number of
architectures rather than all of them; initially this will just be
i386, but will probably expand to include amd64. Single architecture
mirrors appear to need about 60GB of disk space, and dual architecture
mirrors about 80GB. A full mirror at the moment
requires slightly over 170GB. We'll be trying to get in touch with you
all further over the next week or two to make sure the changes go as
smoothly as possible.

Second, there's a call for help down below. Particularly for people with
some web programming skillz.

Okay, so!

You've probably heard about this mirror split thing, also known as "scc",
"second class citizens", and "the evil plot by the cabal to make poor
blighted amd64 developers and users suffer unduly".

At present, most mirrors mirror all the archive, and at present almost all
Debian users that use those mirrors use i386 [0]. But with the archive
growing almost daily (recently breaking the 170GB mark, up from 130GB
in July iirc), and mirrors often finding it hard to keep up, that's a
bad tradeoff. That tradeoff becomes even worse when we're unwilling to
add interesting new architectures because of the immediate increase in
archive size they cause, in addition to the ongoing accretion.

So obviously that tradeoff's changing in favour of partial mirrors,
particularly by architecture.

People have been able to do partial mirrors for a while now, and the
anonftpsync [1] tool we offer includes explicit support for that. In
further support of partial mirrors, we'll be doing these things:

	(a) allowing and in some cases encouraging official mirrors to
	    mirror a limited number of architectures

	(b) limiting ftp.debian.org to i386 only (probably to also
	    include amd64, depending on how popular that architecture
	    actually is)

	(c) providing simple recommendations on running arch-specific
	    mirrors

	(d) providing <cc>.<arch>.mirror.debian.org aliases to make it
	    easy for users to find a local mirror that supports their
	    preferred architecture

As well as allowing additional architectures, these changes should
make it plausible to think about a few other things, notably including
additional suites in the archive (such as "volatile" or "backports"),
and having the archive pulse occur more frequently than daily.

But one of the things all this stuff requires is some good communication
with our mirrors. That can use some work at the moment, and one thing
that would be particularly helpful is some better tracking of who's
mirroring what. At present we have the "Mirrors masterlist" [2], the
mirrors pseudopackage on the BTS [3] (and its corresponding mailing list,
which has public archives on master [4]) and the Debian mirror checker
[5].

What would be helpful would be to improve our tracking stuff so that:

	(a) the mirror checker can provide more detailed information on
	    mirror stability such as that offered by the apache tracker [6]

	(b) the mirror checker script can verify which architectures a
	    mirror actually carries

	(c) there's some easy way for mirror admins to add, remove and
	    update their details in the masterlist, rather than waiting
	    for a developer to review the changes (with additions
	    confirmed automatically by the checker, ideally)

	(d) there are status fields to indicate whether the mirror supports
	    pushing downstream mirrors by ssh trigger, in the same manner as
	    the top level Debian mirrors

	(e) it records what mirror each mirror mirrors from (ideally checked
	    automatically by looking in project/trace; and ideally with
	    a pretty graph generated from the data, highlighting mirrors
	    that are out of date)

	(f) it records whether the mirror is able to accept
	    *.mirror.debian.org requests, so that if, eg,
	    "at.alpha.mirror.debian.org" goes down, it can automatically
	    be pointed at another site in Austria, or if none are
	    available, another nearby alpha mirror.
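
To make (f) a bit more concrete, here's a minimal sketch of how the
fallback choice could work. Everything in it is an assumption made up for
illustration: the Mirror record, its field names, the example hostnames
and the "nearby" country table are hypothetical and not part of the
masterlist format or of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    # Hypothetical record; the real masterlist fields may differ.
    host: str
    country: str            # two-letter country code, e.g. "at"
    archs: frozenset        # architectures the checker confirmed
    up: bool                # last check succeeded
    accepts_alias: bool     # willing to serve *.mirror.debian.org requests


def alias_name(cc, arch):
    """Build a <cc>.<arch>.mirror.debian.org alias name."""
    return f"{cc}.{arch}.mirror.debian.org"


def pick_target(cc, arch, mirrors, nearby):
    """Pick a host for the alias: same country first, then nearby ones.

    `nearby` maps a country code to an ordered list of fallback countries
    (an assumed, hand-maintained table). Returns None if nothing fits.
    """
    candidates = [
        m for m in mirrors
        if m.up and m.accepts_alias and arch in m.archs
    ]
    for country in [cc, *nearby.get(cc, [])]:
        for m in candidates:
            if m.country == country:
                return m.host
    return None


if __name__ == "__main__":
    mirrors = [
        Mirror("a.example.at", "at", frozenset({"i386", "alpha"}), False, True),
        Mirror("c.example.de", "de", frozenset({"alpha"}), True, True),
    ]
    target = pick_target("at", "alpha", mirrors, {"at": ["de", "ch"]})
    print(alias_name("at", "alpha"), "->", target)
```

The sketch only decides which host an alias should point at; actually
updating the DNS is a separate problem it doesn't touch.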

But that's not really something I'm good at, and the existing folks working
on organising the mirrors haven't had time to magic it up either, so if other
folks would like to try their hand at whipping something up that can be
made official that'd be great.

Ideally it would run with minimal privileges, not require PHP or a large
framework, support the existing Mirrors masterlist format for input and
output, and be written to run efficiently. Naturally it needs to be
free software. Followups on #debian-mirrors on OFTC, or to -devel I guess.

This isn't a showstopper, but I think it's fair to say everyone will
benefit from us doing a better job here than we currently are, so
additional help would be _really_ useful.

What else?

As of today, there are some file lists in /debian/indices/files that
you can use to help decide what stuff to download. In particular, you can
download an i386 only mirror by simply invoking rsync as:

    rsync -av --progress --delay-updates \
        --files-from :indices/files/arch-i386.files \
        --delete --delete-after --max-delete=1000 \
        rsync://MIRROR/debian/ ./

Likewise for other architectures. Note that the hurd-i386 files list
currently includes source and arch:all packages from oldstable, stable
and testing as well as unstable, which is a bug.
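
If you'd rather script this than type it, a rough sketch of a helper that
assembles the same command line for a given architecture might look like
the following. The function name and the example hostname are made up,
and it assumes the other file lists follow the same arch-<arch>.files
naming as the i386 one; check indices/files on your upstream first.

```python
import shlex


def rsync_command(mirror, arch, dest="./"):
    """Return the rsync argv for a single-architecture mirror run.

    Flags are the ones from the command above; the arch-<arch>.files
    naming for non-i386 lists is an assumption.
    """
    return [
        "rsync", "-av", "--progress", "--delay-updates",
        "--files-from", f":indices/files/arch-{arch}.files",
        "--delete", "--delete-after", "--max-delete=1000",
        f"rsync://{mirror}/debian/", dest,
    ]


if __name__ == "__main__":
    # Print the command for inspection rather than running it.
    print(shlex.join(rsync_command("mirror.example.org", "i386")))
```

It only prints the command, so you can eyeball it before pointing it at
a real mirror.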

Note that this will mean that if you're running a non-i386 architecture,
you may find your preferred mirror will stop working when all this
actually happens *including for stable users*. There'll be another
announcement before this actually happens.

On a related note, we may be moving oldstable onto archive.debian.org
before ceasing security support, either for woody or shortly after
future releases (possibly on the order of a month after). Note that
point releases of oldstable are not undertaken, and security updates
are thus not included directly into oldstable -- at present it's mostly
maintained to ensure security buildds can download build-depends, which
can happen equally well via archive.debian.org. [7]

Note that the above isn't final, but it's hopefully pretty close to
being so.

I hope that's all for now. :)

Cheers,
aj

[0] The stats for ftp.debian.org over http show about 96.4% of users
    using i386 exclusively, 1.4% wanting powerpc, 0.5% wanting sparc,
    and the remaining architectures getting about 0.2% each -- stats are
    by IP looking at queries for Packages files, and calculated over a
    week with around 144,000 unique IPs seen in that time. Beyond the
    96.4% wanting i386 exclusively, an additional 0.8% wanted i386 and
    some other architecture/s.
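
    For anyone wanting to run a similar count on their own mirror, here's
    a rough sketch of the per-IP approach. It is not the script used for
    the numbers above; the log format (client IP first, request line in
    double quotes) and the function names are assumptions.

```python
import re
from collections import defaultdict

# Matches requests such as
#   "GET /debian/dists/sid/main/binary-i386/Packages.gz HTTP/1.0"
PACKAGES_RE = re.compile(r'"GET \S*/binary-([a-z0-9-]+)/Packages\S* ')


def archs_by_ip(lines):
    """Map each client IP to the set of architectures it asked for."""
    seen = defaultdict(set)
    for line in lines:
        match = PACKAGES_RE.search(line)
        if match:
            ip = line.split(" ", 1)[0]
            seen[ip].add(match.group(1))
    return seen


def share_exclusive(seen, arch):
    """Percentage of IPs that requested Packages files for `arch` only."""
    if not seen:
        return 0.0
    only = sum(1 for archs in seen.values() if archs == {arch})
    return 100.0 * only / len(seen)


def share_with_others(seen, arch):
    """Percentage of IPs that requested `arch` plus at least one other."""
    if not seen:
        return 0.0
    mixed = sum(1 for a in seen.values() if arch in a and len(a) > 1)
    return 100.0 * mixed / len(seen)
```

    IPs that never fetch a Packages file are ignored entirely, so the
    percentages are relative to clients that looked like apt-style users.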

[1] See http://www.debian.org/mirror/ftpmirror

[2] http://cvs.debian.org/*checkout*/webwml/english/mirror/Mirrors.masterlist?root=webwml

[3] http://bugs.debian.org/

[4] master:~debian/archive/debian-mirrors/

[5] http://mirror.debian.org/status.html (official)
    http://www.de.debian.org/dmc/today/ (later version of the same codebase)

[6] http://www.apache.org/mirrors/

[7] On the same basis as [0], currently about 6.9% of ftp.debian.org users
    look for oldstable. Of those, all but 6.85% appear to do so with apt.
