Archive size explosion.
Hello All
Marcelo E. Magallon (mmagallo@debian.org) pointed out, in a
recent posting, the rapidly increasing size of the debian archive.
This is something that I have been concerned about for some time.
The current implementation of the "pool" system has led to a huge
growth in the size of the pool directory every time a new
architecture is released.
The trouble is that there is no easy way for a downstream mirror to
select just portions of debian such as "stable i386".
I know that you can use utilities like "debmirror" or mirror scripts
with regular expressions to exclude the parts that you do not want,
but it is not a trivial exercise. (I have first-hand experience of this: I
have kept a mirror of the "stable i386" and the "soon to be stable
i386" portions, such as "frozen" and "testing", of debian for over 4
years. Keeping it valid and up to date has been like chasing a
moving target, year after year.)
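To illustrate the kind of filtering involved, here is a minimal sketch in
Python. It assumes only the usual convention that binary package files are
named <package>_<version>_<architecture>.deb. The function name and the
sample paths are made up for the example, and this is not how debmirror
itself works.

```python
import re

# Binary packages are conventionally named <package>_<version>_<arch>.deb
DEB_NAME = re.compile(r"^[^_]+_[^_]+_(?P<arch>[^_]+)\.deb$")

def wanted(path, archs=("i386", "all")):
    """Return True if a pool file belongs in a mirror limited to `archs`.

    Hypothetical helper: files that are not .deb packages (sources and
    so on) are kept here; a real mirror script needs its own policy.
    """
    name = path.rsplit("/", 1)[-1]
    if not name.endswith(".deb"):
        return True
    match = DEB_NAME.match(name)
    # A .deb that does not follow the naming convention is skipped.
    return bool(match) and match.group("arch") in archs

# Sample paths, invented for illustration.
paths = [
    "pool/main/f/foo/foo_1.0-1_i386.deb",
    "pool/main/f/foo/foo_1.0-1_sparc.deb",
    "pool/main/f/foo/foo-doc_1.0-1_all.deb",
    "pool/main/f/foo/foo_1.0-1.dsc",
]
selected = [p for p in paths if wanted(p)]
```

Even in this toy form the filter has to look at every file name in the
pool, which is part of why keeping such a mirror valid takes effort.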
When mirror sites which are not debian-specific, like large ISPs and
educational institutions, are faced with a choice of:
- mirror everything in debian,
- set up a special utility to filter out the lesser used parts of the
mirror, or
- drop the debian mirror altogether,
some admins are simply choosing the last option and debian is
losing out.
This is particularly a problem in places like South Africa, where
bandwidth is 10 times the price of that in the USA or Europe. As an
example, the debian mirror on "ftp.linux.org.za" has not been
updated since 1 April 2001. This site is hosted on "ftp.is.co.za",
probably the biggest publicly accessible mirror in this country.
The bigger the archive, the fewer the sites that will mirror
debian. This in turn means fewer people will get to use the
distribution. (Bear in mind that other distributions, like Mandrake,
can be installed after downloading two ISO images of approx 1.2 GB
and there are lots of sites carrying those two image files.)
I would like to suggest some changes in the structure of the
archive. The simplest way would be to separate the architectures
into different hierarchies. Quite frankly, no matter which hardware
you run, there is absolutely no reason to mirror packages for the
hardware that you do not support!
This could be done at domain level so that ftp.debian.org is broken
down into separate (virtual) sites of the format
ftp.<architecture>.debian.org.
It would be much easier to get ftp.i386.debian.org onto a mirror,
than the whole of ftp.debian.org and we would be in a good position
to recover some ground in the mirror stakes. However that would
mess up the architecture independent "all" hierarchy. Files in those
sections would end up being duplicated in each architecture
section.
A slightly less radical approach would be to split the archive into
ftp.debian.org:/debian/<architecture>/pool and
ftp.debian.org:/debian/<architecture>/dists
sections. This would be combined with a
ftp.debian.org:/debian/dists
hierarchy with symlinks into the new structures, to prevent
everybody's "sources.list" files from breaking. One could then
mirror the main "dists" section, the "all" sections and the required
architecture specific ones to get a valid working mirror.
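As a rough sketch of what a mirror admin would then fetch, the Python below
lists the directories for a chosen set of architectures. It assumes that the
"all" sections sit beside the real architectures as /debian/all/pool and
/debian/all/dists, which is only one reading of the layout above. The
function name is hypothetical.

```python
def mirror_paths(archs, root="debian"):
    """Directories to fetch for the chosen architectures (illustrative)."""
    # Shared hierarchy holding the symlinks into the new structures.
    paths = [f"{root}/dists"]
    # Assumption: "all" (architecture independent) is laid out like an
    # architecture and is always needed for a working mirror.
    for arch in ["all"] + [a for a in archs if a != "all"]:
        paths.append(f"{root}/{arch}/pool")
        paths.append(f"{root}/{arch}/dists")
    return paths

# What an i386-only mirror would carry under this layout.
i386_only = mirror_paths(["i386"])
```

The point of the layout is that the selection becomes a short list of whole
directories instead of a pattern applied to every file.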
As a bare minimum we should consider splitting the pool directory
into the format:
pool/<architecture>/l/lib/<package>
so that admins can choose which sections to mirror.
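A small Python sketch of that split, for illustration only. It takes the
sub-directory below the architecture as an input, since how that part is
derived is not spelled out here, and it reads the architecture from the
usual <package>_<version>_<architecture>.deb file name. The function names
and the sample package are hypothetical.

```python
def arch_of(deb_name):
    """Architecture field of a <package>_<version>_<arch>.deb file name."""
    stem = deb_name[:-len(".deb")]
    return stem.rsplit("_", 1)[-1]

def split_pool_path(subdir, deb_name):
    """Location of a package file in a pool/<architecture>/... layout."""
    return "/".join(["pool", arch_of(deb_name), subdir, deb_name])

# Invented sample files, all built from one source package "foo".
files = ["foo_1.0-1_i386.deb", "foo_1.0-1_sparc.deb", "foo-doc_1.0-1_all.deb"]
placed = [split_pool_path("f/foo", name) for name in files]
```

With files placed this way, choosing "pool/i386" and "pool/all" is enough to
leave out every other architecture.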
I do know that we need to do something, as the strengths of debian -
its package system, its large number of packages and its large
number of architectures - are preventing people from using it!
Ian Forbes
---------------------------------------------------------------------
Ian Forbes ZSD
http://www.zsd.co.za
Office: +27 +21 683-1388 Fax: +27 +21 64-1106
Snail Mail: P.O. Box 46827, Glosderry, 7702, South Africa
---------------------------------------------------------------------