
Re: ftpsync README rewrite



On Sun, Sep 6, 2009 at 4:39 AM, Simon Paillard
<simon.paillard@resel.enst-bretagne.fr> wrote:
> On Fri, Sep 04, 2009 at 09:09:00PM -0400, Lee Winter wrote:
>> The README file distributed with the ftpsync package needs a rewrite.
>
> Can you share with us the reasons you see for a rewrite?

I can, but not concisely.

Background

I'm new to the Debian project.  I think it is far more important than
is generally recognized because it represents an interesting new
system for delivering software to users.  No other distribution is
close.

One of the keys to the success of that delivery system is the
widespread availability of easy-to-access repositories -- mirrors.
IMO every installation with more than a few machines should have a
local one.  Caches for apt are useful, but for small and medium groups
of users such tools have the wrong bandwidth consumption behavior:
if a package has not already been fetched, you end up fetching it at
an inopportune time of day and with much higher latency (the delay
between the request and the availability of the requested package).
While 100 Mbps networks are trivial these days, and disk space in the
100 GB range is essentially free, external bandwidth is still precious.

I've experimented with mirror-creation software and found weaknesses
in either the initial creation or the ongoing maintenance.  apt-mirror
came closest, but it does not appear to be under active development,
the developer has not responded to email, and some of the weaknesses
are deal-breakers.

Until recently debmirror was plagued with many of the same weaknesses
as apt-mirror plus a few of its own.  So I looked at ftpsync.  The
Debian web pages and some of the delivered files all point to the
README.  But it is essentially useless.  Just what is a typical
mirror, and what makes it typical?  Etc., ad nauseam.

A reader of the README who is already familiar with the scripts
probably has no trouble verifying the proper interpretation of its
contents.  But for a reader who is not already familiar with the
scripts, this is the kind of documentation/comments that experienced
software maintainers have learned to ignore as distracting or
misleading.

However, shortly after our initial exchange over ftpsync, debmirror
saw some encouraging activity.  It now appears to be usable (I am
still exercising it to confirm that).

So in the meantime I have accomplished two things.  I analyzed the
Debian subsections with an eye to writing up a tuning article on
mirror administration.  E.g., if your uplink is a 56 Kbps link with a
30% duty cycle and you live under a triple-canopy forest teaching
elementary school, you may not have much use for the devel or
electronics subsections.  But there is not enough information
available about what including or excluding a particular subsection
costs in space and bandwidth.  For Debian 5.0.3 the space numbers are
as follows (as of 2009-09-12):

   K bytes    Subsection
------------------------------
    37,336    <empty>
   356,864    admin
         0    cli-mono
    61,500    comm
         0    database
    39,188    debian-installer
 2,220,512    devel
 3,059,488    doc
   388,384    editors
    79,840    electronics
     2,296    electronics
         0    fonts
 4,394,412    games
   418,588    gnome
         0    gnu-r
         0    gnustep
   597,676    graphics
         0    haskell
    18,760    hamradio
         0    httpd
   286,396    interpreters
         0    java
   851,340    kde
         0    kernel
 1,905,504    libdevel
 1,323,708    libs
         0    lisp
         0    localization
   198,136    mail
   462,664    math
   332,924    misc
   504,872    net
    16,100    news
         0    ocaml
     9,612    oldlibs
    60,892    otherosfs
   122,440    perl
         0    php
   450,752    python
         0    ruby
   378,792    science
    15,704    shells
   447,684    sound
   512,224    tex
   540,220    text
   292,048    utils
         0    vcs
         0    video
         0    virtual
   393,776    web
   691,140    x11
         0    xfce
         0    zope
------------------------------
21,471,772    Total
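To put the table in terms of the 56 Kbps, 30% duty cycle example above, here is a rough Python sketch of the transfer-time arithmetic.  It is only an illustration: it assumes "K bytes" means 1024 bytes, that the link delivers its full nominal rate whenever it is up, and it ignores protocol overhead and retransmissions, so real numbers will be worse.

```python
def transfer_days(kbytes: int, link_bps: float, duty_cycle: float) -> float:
    """Days needed to pull `kbytes` K bytes over a link that is up `duty_cycle` of the time.

    Assumes 1 K byte = 1024 bytes and no protocol overhead (an optimistic estimate).
    """
    bits = kbytes * 1024 * 8
    effective_bps = link_bps * duty_cycle  # average throughput over a whole day
    seconds = bits / effective_bps
    return seconds / 86400                 # 86400 seconds per day


if __name__ == "__main__":
    # Figures taken from the table: the whole archive and the devel subsection.
    for name, kb in [("total", 21_471_772), ("devel", 2_220_512)]:
        print(f"{name}: {transfer_days(kb, 56_000, 0.30):.1f} days")
```

On those assumptions this prints 121.2 days for the full initial download and 12.5 days for devel alone, which is the kind of trade-off a mirror admin on such a link would want to see.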

Now I am certain that there is an easier way to compute these values
using package tools rather than incremental mirror updates, but I
didn't find any such tools.  IMO this information should be available
to guide mirror admins.  But before it is assembled and presented,
perhaps it would be useful to update the Debian mirrors web page,
which lists only the old subsections.  AFAICT the only source of
information on the new subsections is an email from January or
February of this year.
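One way to get per-subsection sizes without incremental mirroring is to read the Packages indices instead: each stanza in a Debian Packages file carries Section, Size (in bytes) and Filename fields.  The sketch below is an illustration under assumptions, not an existing package tool; the function names are made up, stanzas without a Section field are counted under `<empty>`, and files are de-duplicated by Filename so that packages listed in several per-architecture Packages files are counted once.  It sums .deb sizes only, not index or installer files, so it will not match du output exactly.

```python
import gzip
import itertools
import sys
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional


def parse_stanzas(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per blank-line-separated stanza of a Packages file."""
    stanza: Dict[str, str] = {}
    last_key: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
            stanza, last_key = {}, None
        elif line[0] in " \t":
            # Continuation line (e.g. a long Description); attach to the previous field.
            if last_key is not None:
                stanza[last_key] += "\n" + line
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            stanza[last_key] = value.strip()
    if stanza:
        yield stanza


def section_sizes_kb(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the Size field per Section, de-duplicated by Filename, in K bytes (rounded up)."""
    totals: Dict[str, int] = defaultdict(int)
    seen = set()
    for pkg in parse_stanzas(lines):
        filename = pkg.get("Filename")
        if filename is not None:
            if filename in seen:
                continue  # same pool file already counted for another architecture
            seen.add(filename)
        totals[pkg.get("Section", "<empty>")] += int(pkg.get("Size", "0"))
    return {section: (size + 1023) // 1024 for section, size in totals.items()}


if __name__ == "__main__":
    # Hypothetical usage: pass every Packages.gz of interest so de-duplication spans them all.
    files = [gzip.open(path, "rt", encoding="utf-8") for path in sys.argv[1:]]
    for section, kb in sorted(section_sizes_kb(itertools.chain(*files)).items()):
        print(f"{kb:>12,}  {section}")
```

For example, running it over dists/lenny/main/binary-i386/Packages.gz and dists/lenny/main/binary-amd64/Packages.gz in one call would give combined per-subsection figures for those two architectures.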

In addition to the size of each section, mirror admins need to know
its expected volatility (MB/day/subsection), because in the dial-up
world update/maintenance bandwidth completely trumps creation
bandwidth.  But I see no simple way to compute that over a reasonable
sample period, like a month, without a serious investment in package
tools.  Do such tools already exist?
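Given two snapshots of the package indices taken some days apart, one rough volatility figure is the total size of files that appear in the newer snapshot but not in the older one, since those are what an update has to download.  A minimal sketch, assuming each snapshot has already been reduced to a hypothetical mapping from Filename to (Section, Size in bytes), for instance by parsing the Packages files; it ignores index files, removed packages (which cost no download) and any rsync delta savings.

```python
from collections import defaultdict
from typing import Dict, Mapping, Tuple

# Hypothetical snapshot shape: Filename -> (Section, Size in bytes).
Snapshot = Mapping[str, Tuple[str, int]]


def churn_mb_per_day(old: Snapshot, new: Snapshot, days: float) -> Dict[str, float]:
    """Average MB/day of new files per section between two snapshots."""
    if days <= 0:
        raise ValueError("days must be positive")
    added: Dict[str, int] = defaultdict(int)
    for filename, (section, size) in new.items():
        if filename not in old:  # only files the mirror does not have yet must be fetched
            added[section] += size
    return {section: size / (1024 * 1024) / days for section, size in added.items()}
```

Taking snapshots a month apart and dividing by the elapsed days gives the per-subsection averages asked about above; sampling more often would also show how bursty the updates are.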



  However, recent activity has been very encouraging.



>
>> I intend to produce a draft.  With whom should I correspond to obtain
>> the answers to the questions that the rewrite needs to address?
>
> You can find the git repo at
> https://ftp-master.debian.org/git/archvsync.git
>
> debian-mirrors is an appropriate place for questions about ftpsync
> (after reading the ftpsync script, of course :-).
>
> Regards.
>
> --
> Simon Paillard
>

