
Archive size explosion.



Hello All

In a recent posting, Marcelo E. Magallon (mmagallo@debian.org) 
pointed out the rapidly increasing size of the debian archive.  
This is something that I have been concerned about for some time.

The current implementation of the "pool" system has led to a huge 
growth in the size of the pool directory every time a new 
architecture is released.

The trouble is that there is no easy way for a downstream mirror to 
select just portions of debian such as "stable i386". 

I know that you can use utilities like "debmirror" or mirror scripts 
with regular expressions to exclude the parts that you do not want, 
but it is not a trivial exercise.  (I have first-hand experience of 
this: I have kept a mirror of the "stable i386" and the "soon to be 
stable i386" portions, such as "frozen" and "testing", of debian for 
over 4 years. Keeping it valid and up to date has been like chasing 
a moving target, year after year.)
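
To give an idea of what such a filter involves, here is a minimal 
sketch in Python. It is not how debmirror works internally; it only 
assumes that binary packages are named <name>_<version>_<arch>.deb, 
and the function name "wanted" is made up for illustration. Note that 
it only handles the architecture. Selecting "stable" as well means 
reading the Packages index files, which is where most of the real 
work lies.

```python
import re

# Assumption: binary packages are named <name>_<version>_<arch>.deb
DEB_RE = re.compile(r"_(?P<arch>[a-z0-9-]+)\.deb$")


def wanted(path, arches=("i386", "all")):
    """Return True if a pool path belongs on an architecture-limited mirror."""
    match = DEB_RE.search(path)
    if match is None:
        # Not a binary package (sources, index files, ...).
        # This sketch simply keeps such files.
        return True
    return match.group("arch") in arches


paths = [
    "pool/main/b/bash/bash_2.05-8_i386.deb",
    "pool/main/b/bash/bash_2.05-8_sparc.deb",
    "pool/main/d/debian-policy/debian-policy_3.5.6.0_all.deb",
]
# Keep only the i386 and architecture-independent packages.
kept = [p for p in paths if wanted(p)]
```

The version numbers above are only examples.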

When mirror sites, like large ISPs and educational institutions, that 
are not debian specific are faced with a choice of:

- mirror everything in debian, 

- set up a special utility to filter out the lesser used parts of the 
mirror, or 

- drop the debian mirror altogether, 

some admins are simply choosing the last option and debian is 
losing out.  

This is particularly a problem in places like South Africa, where 
bandwidth costs 10 times what it does in the USA or Europe. As an 
example, the debian mirror on "ftp.linux.org.za" has not been 
updated since 1 April 2001.  This site is hosted on "ftp.is.co.za", 
probably the biggest publicly accessible mirror in this country.

The bigger the archive, the fewer the number of sites that will mirror 
debian.  This in turn means fewer people will get to use the 
distribution.  (Bear in mind that other distributions, like Mandrake, 
can be installed after downloading two ISO images of approx 1.2 GB 
and there are lots of sites carrying those two image files.)

I would like to suggest  some changes in the structure of the 
archive.  The simplest way would be to separate the architectures 
into different hierarchies.  Quite frankly, no matter which hardware 
you run, there is absolutely no reason to mirror packages for the 
hardware that you do not support!  

This could be done at domain level so that ftp.debian.org is broken 
down into separate (virtual) sites of the format 

ftp.<architecture>.debian.org.  

It would be much easier to get ftp.i386.debian.org onto a mirror 
than the whole of ftp.debian.org, and we would be in a good position 
to recover some ground in the mirror stakes.  However, that would 
mess up the architecture independent "all" hierarchy.  Files in those 
sections would end up being duplicated in each architecture 
section.

A slightly less radical approach would be to split the archive into 

ftp.debian.org:/debian/<architecture>/pool and
ftp.debian.org:/debian/<architecture>/dists

sections, together with a 

ftp.debian.org:/debian/dists 

hierarchy with symlinks into the new structures to prevent 
everybody's "sources.list" files from breaking. One could then 
mirror the main "dists" section, the "all" sections and the required 
architecture specific ones to get a valid working mirror.

As a bare minimum we should consider splitting the pool directory 
into the format:

pool/<architecture>/l/lib/<package>

so that admins can choose which sections to mirror.
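
As a rough illustration of that layout, the sketch below works out 
where a package file would land. The function names are made up, and 
two things are assumptions on my part: that the directory prefix 
follows the same rule as the existing pool (first letter, or "lib" 
plus one letter for library packages), and that the architecture is 
the last underscore-separated field of the .deb file name.

```python
def pool_prefix(source):
    """Directory prefix for a source package name.

    Assumption: same rule as the existing pool, i.e. "libx" style
    prefixes for lib* packages and the first letter otherwise.
    """
    return source[:4] if source.startswith("lib") else source[0]


def proposed_path(source, deb):
    """Map a binary package file to pool/<architecture>/<prefix>/<source>/."""
    # Assumption: the file is named <name>_<version>_<arch>.deb
    stem = deb[: -len(".deb")]
    arch = stem.rsplit("_", 1)[1]
    return f"pool/{arch}/{pool_prefix(source)}/{source}/{deb}"


i386_path = proposed_path("bash", "bash_2.05-8_i386.deb")
all_path = proposed_path("debian-policy", "debian-policy_3.5.6.0_all.deb")
```

With something like this in place, an i386-only mirror would need 
nothing more than pool/i386 and pool/all.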

I do know that we need to do something, as the strengths of debian - 
its package system, its large number of packages and its large 
number of architectures - are preventing people from using it!

Ian Forbes

---------------------------------------------------------------------
Ian Forbes ZSD
http://www.zsd.co.za
Office: +27 +21 683-1388  Fax: +27 +21 64-1106
Snail Mail: P.O. Box 46827, Glosderry, 7702, South Africa
---------------------------------------------------------------------
