
Re: ftpsync README rewrite



On Sun, Sep 6, 2009 at 4:39 AM, Simon Paillard
<simon.paillard@resel.enst-bretagne.fr> wrote:
> On Fri, Sep 04, 2009 at 09:09:00PM -0400, Lee Winter wrote:
>> The README file distributed with the ftpsync package needs a rewrite.
>
> Can you share with us the reasons that you consider for a rewrite ?

I can, but not concisely.

Background

I'm new to the Debian project.  I think it is far more important than
is generally recognized because it represents an interesting new
system for delivering software to users. No other distribution is
close.

One of the keys to the success of that delivery system is the
widespread availability of easy-to-access repositories -- mirrors.
IMO every installation with more than a few machines should have a
local one.  Caches for apt are useful, but for small and medium groups
of users such tools have the wrong bandwidth consumption behavior.
That is, if a package has not already been fetched, you end up getting
it at an inopportune time of day and with much higher latency (between
the request and the availability of the package requested).  Whereas
100 Mb/s networks are trivial these days, and disk space into the 100 GB
range is essentially free, external bandwidth is still precious.

Foreground

I've experimented with the available mirror-creation software and
found weaknesses in either the creation or the necessary maintenance.
Apt-mirror came closest, but it does not appear to be under
development, the developer has not responded to email, and some of
the weaknesses are deal-breakers.

Until recently debmirror was plagued with many of the same weaknesses
as apt-mirror plus a few of its own.  So I looked at ftpsync.  The
Debian web pages and some of the delivered files all point to the
README.  But it is essentially useless.  Just what is a typical mirror
and what makes it typical?  Etc. ad nauseam.

A reader of the README who is already familiar with the scripts
probably has no trouble verifying the proper interpretation of the
contents of the README.  But for a reader who is not already familiar
with the scripts this is the kind of documentation/comments that
experienced software maintainers have learned to ignore as distracting
and/or misleading.

However, shortly after our initial exchange over ftpsync, debmirror
saw some encouraging activity.  It now appears to be usable (I
am still exercising it to confirm that appearance).

So in the meantime I have accomplished two things.  First, I analyzed
the Debian subsections with an eye to writing up a tuning article on
mirror admin.  E.g., if your uplink is a 56 Kbps link with a 30% duty
cycle and you live under a triple-canopy forest teaching elementary
school, you may not have much use for the devel or electronics
subsections.  But there is not enough information available about
what including or excluding a particular subsection costs in space and
bandwidth.  For 5.0.3 the space numbers are as follows (as of
2009-09-12):

K bytes      Subsection
------------------------------
0,037,336    <empty>
0,356,864    admin
0,000,000    cli-mono
0,061,500    comm
0,000,000    database
0,039,188    debian-installer
2,220,512    devel
3,059,488    doc
0,388,384    editors
0,079,840    electronics
0,002,296    electronics
0,000,000    fonts
4,394,412    games
0,418,588    gnome
0,000,000    gnu-r
0,000,000    gnustep
0,597,676    graphics
0,018,760    hamradio
0,000,000    haskell
0,000,000    httpd
0,286,396    interpreters
0,000,000    java
0,851,340    kde
0,000,000    kernel
1,905,504    libdevel
1,323,708    libs
0,000,000    lisp
0,000,000    localization
0,198,136    mail
0,462,664    math
0,332,924    misc
0,504,872    net
0,016,100    news
0,000,000    ocaml
0,009,612    oldlibs
0,060,892    otherosfs
0,122,440    perl
0,000,000    php
0,450,752    python
0,000,000    ruby
0,378,792    science
0,015,704    shells
0,447,684    sound
0,512,224    tex
0,540,220    text
0,292,048    utils
0,000,000    vcs
0,000,000    video
0,000,000    virtual
0,393,776    web
0,691,140    x11
0,000,000    xfce
0,000,000    zope

21,471,772    all
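
To put those numbers in terms of the 56 Kbps example, here is a rough
back-of-the-envelope sketch.  It assumes (my assumption, not something
measured) that "K bytes" means units of 1024 bytes, that the link
carries 56,000 bits per second, and that protocol overhead is ignored.
The function name is made up for illustration.

```python
def transfer_days(kbytes, link_bps=56_000, duty_cycle=0.30, bytes_per_k=1024):
    """Estimate the days needed to pull `kbytes` over a part-time link.

    Assumptions: `kbytes` is in units of `bytes_per_k` bytes, the link
    moves `link_bps` bits per second while it is up, and it is up for
    the fraction `duty_cycle` of the time.  Protocol overhead is ignored.
    """
    total_bytes = kbytes * bytes_per_k
    effective_bytes_per_second = link_bps * duty_cycle / 8
    seconds = total_bytes / effective_bytes_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Figures taken from the table above (K bytes).
    for name, kbytes in [("all", 21_471_772), ("devel", 2_220_512)]:
        print(f"{name}: about {transfer_days(kbytes):.1f} days")
```

Under those assumptions the full set works out to roughly four months
of link time for the initial pull, which is why dropping even one
large subsection matters on such a link.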

Now I am certain that there is an easier way to compute these values
using package tools rather than incremental mirror updates.  But I
didn't find any such tools.  IMO this information should be available
to guide mirror admins.  But before this info is assembled and
presented, perhaps it would be useful to update the Debian mirrors web
page, which lists only the old subsections.  AFAICT the only source of
info on the new subsections is an email from January or February of
this year.
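
One possible shortcut, sketched here only as an illustration: the
Packages index files carry a Section: and a Size: field for each
package, so summing Size: per section gives a per-subsection total
without mirroring anything.  This is not how the table above was
produced.  It counts only the .deb sizes listed in one index (one
architecture, no sources), so the totals will not match disk usage on
a mirror.  The function name, the sample data, and the use of "<empty>"
for packages with no Section: field are my own assumptions.

```python
from collections import defaultdict


def section_sizes(packages_text):
    """Sum the Size: field (bytes) of each stanza, grouped by Section:."""
    totals = defaultdict(int)
    # Stanzas in a Packages-style index are separated by blank lines.
    for stanza in packages_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines, which start with whitespace.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
        if "size" not in fields:
            continue
        # Values such as "non-free/games" are reduced to "games".
        section = fields.get("section", "<empty>").split("/")[-1]
        totals[section] += int(fields["size"])
    return dict(totals)


# Made-up sample data, not taken from a real index.
SAMPLE = """\
Package: foo
Section: games
Size: 2048
Description: a game
 continuation: line

Package: bar
Section: non-free/games
Size: 1024

Package: baz
Size: 512
"""

if __name__ == "__main__":
    for section, size in sorted(section_sizes(SAMPLE).items()):
        print(f"{size:>12,}  {section}")
```

For a real run the text would come from an uncompressed Packages file
for the suite and architecture of interest.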

In addition to the size of sections, mirror admins need to know about
expected volatility (MB/day/subsection) because in the dial-up world
update/maintenance bandwidth trumps creation bandwidth completely
(e.g., buy the CD/DVDs but then you have to download updates).
However, I see no simple way to compute that over a reasonable sample
period, like a month, without a serious investment in package tools.
Do they already exist?
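
As a rough idea of what such a tool might look like, the sketch below
compares two snapshots of a Packages-style index and adds up the Size:
of every package that is new or whose Version: changed, grouped by
section.  Dividing by the number of days between the snapshots gives a
bytes/day/subsection estimate.  The function names and sample data are
hypothetical, and the approach assumes one stanza per package name and
that every changed package is downloaded in full.

```python
from collections import defaultdict


def parse_index(text):
    """Map package name -> (version, section, size in bytes)."""
    index = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines, which start with whitespace.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
        if "package" in fields and "size" in fields:
            index[fields["package"]] = (
                fields.get("version", ""),
                fields.get("section", "<empty>"),
                int(fields["size"]),
            )
    return index


def churn_by_section(old_text, new_text):
    """Bytes to fetch per section when moving from the old snapshot to the new."""
    old, new = parse_index(old_text), parse_index(new_text)
    churn = defaultdict(int)
    for name, (version, section, size) in new.items():
        # A package counts as churn if it is new or its version changed.
        if name not in old or old[name][0] != version:
            churn[section] += size
    return dict(churn)


# Made-up snapshots for illustration only.
OLD = """\
Package: foo
Version: 1.0
Section: games
Size: 2048

Package: bar
Version: 1.0
Section: net
Size: 1024
"""

NEW = """\
Package: foo
Version: 1.1
Section: games
Size: 3000

Package: bar
Version: 1.0
Section: net
Size: 1024

Package: baz
Version: 0.1
Section: net
Size: 500
"""

if __name__ == "__main__":
    days_between_snapshots = 30  # assumed sample period
    for section, size in sorted(churn_by_section(OLD, NEW).items()):
        print(f"{section}: {size / days_between_snapshots:.1f} bytes/day")
```

Two snapshots a month apart will miss packages that changed more than
once in between, so daily snapshots summed over the month would give a
better figure.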

The other thing I accomplished is to find a way to eliminate the load
that rsync imposes on upstream mirrors.  IMO it would not take much
work to tweak rsync and adjust the calling scripts so that the
upstream mirrors could be completely passive.

Given the recent activity in debmirror and the possibility of an
improvement in rsync the doc for ftpsync is going to have to wait.

FWIW,

Lee Winter
NP Engineering
Nashua, New Hampshire

