Re: ftpsync README rewrite
On Sun, Sep 6, 2009 at 4:39 AM, Simon Paillard
<simon.paillard@resel.enst-bretagne.fr> wrote:
> On Fri, Sep 04, 2009 at 09:09:00PM -0400, Lee Winter wrote:
>> The README file distributed with the ftpsync package needs a rewrite.
>
> Can you share with us the reasons that you consider for a rewrite ?
I can, but not concisely.
Background
I'm new to the debian project. I think it is far more important than
is generally recognized because it represents an interesting new
system for delivering software to users. No other distribution is
close.
One of the keys to the success of that delivery system is the
widespread availability of easy-to-access repositories -- mirrors.
IMO every installation with more than a few machines should have a
local one. Caches for apt are useful, but for small and medium groups
of users such tools have the wrong bandwidth consumption behavior.
I.e., if you have not already gotten a package then you end up getting
it at an inopportune time of day and with much higher latency (between
the request and the availability of the package requested). Whereas
100 Mb/s networks are trivial these days, and disk space into the 100 GB
range is essentially free, external bandwidth is still precious.
Foreground
I've experimented with the available mirror-creation software and
found weaknesses in either the creation or the necessary maintenance.
Apt-mirror came closest, but it does not appear to be under
development, the developer has not responded to email, and some of the
weaknesses are deal-breakers.
Until recently debmirror was plagued with many of the same weaknesses
as apt-mirror plus a few of its own. So I looked at ftpsync. The
debian web pages and some of the delivered files all point to the
README. But it is essentially useless. Just what is a typical mirror
and what makes it typical? Etc. ad nauseam.
A reader of the README who is already familiar with the scripts
probably has no trouble verifying the proper interpretation of the
contents of the README. But for a reader who is not already familiar
with the scripts this is the kind of documentation/comments that
experienced software maintainers have learned to ignore as distracting
and/or misleading.
However, shortly after our initial exchange over ftpsync, debmirror
saw some encouraging activity. It now appears to be usable (I
am still exercising it to confirm that appearance).
So in the meantime I have accomplished two things. I analyzed the
debian subsections with an eye to writing up a tuning article on
mirror admin. E.g., if your uplink is a 56Kbps link with a 30% duty
cycle and you live under a triple canopy forest teaching elementary
school you may not have much use for the devel or electronics
subsections. But there is not enough information available about
what including or excluding a particular subsection costs in space and
bandwidth. For 5.0.3 the space numbers are as follows (as of
2009-09-12):
K bytes Subsection
------------------------------
0,037,336 <empty>
0,356,864 admin
0,000,000 cli-mono
0,061,500 comm
0,000,000 database
0,039,188 debian-installer
2,220,512 devel
3,059,488 doc
0,388,384 editors
0,079,840 electronics
0,002,296 electronics
0,000,000 fonts
4,394,412 games
0,418,588 gnome
0,000,000 gnu-r
0,000,000 gnustep
0,597,676 graphics
0,018,760 hamradio
0,000,000 haskell
0,000,000 httpd
0,286,396 interpreters
0,000,000 java
0,851,340 kde
0,000,000 kernel
1,905,504 libdevel
1,323,708 libs
0,000,000 lisp
0,000,000 localization
0,198,136 mail
0,462,664 math
0,332,924 misc
0,504,872 net
0,016,100 news
0,000,000 ocaml
0,009,612 oldlibs
0,060,892 otherosfs
0,122,440 perl
0,000,000 php
0,450,752 python
0,000,000 ruby
0,378,792 science
0,015,704 shells
0,447,684 sound
0,512,224 tex
0,540,220 text
0,292,048 utils
0,000,000 vcs
0,000,000 video
0,000,000 virtual
0,393,776 web
0,691,140 x11
0,000,000 xfce
0,000,000 zope
21,471,772 all
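
To illustrate how these numbers could guide a mirror configuration, here
is a minimal Python sketch that subtracts excluded subsections from the
total and estimates a transfer time. The sizes are copied from the table
above. The function names and the choice of subsections to exclude are
purely illustrative, and the time estimate assumes that 1 K byte means
1024 bytes.

```python
# Sizes in K bytes, copied from the table above (only the rows used here).
SIZES_KB = {
    "devel": 2_220_512,
    "doc": 3_059_488,
    "games": 4_394_412,
}
TOTAL_KB = 21_471_772  # the "all" row


def remaining_kb(excluded, sizes=SIZES_KB, total=TOTAL_KB):
    """Return the mirror size in K bytes after dropping the excluded subsections."""
    return total - sum(sizes[name] for name in excluded)


def transfer_days(size_kb, link_bps, duty_cycle):
    """Rough download time in days.

    Assumes 1 K byte = 1024 bytes and ignores protocol overhead.
    """
    bits = size_kb * 1024 * 8
    return bits / (link_bps * duty_cycle) / 86_400


if __name__ == "__main__":
    kept = remaining_kb(["devel", "games"])
    print("K bytes kept:", kept)
    # The 56 Kbps link with a 30% duty cycle mentioned earlier.
    print("days to fetch:", round(transfer_days(kept, 56_000, 0.30)))
```

On these assumptions, fetching the whole archive over that 56 Kbps link
would take on the order of four months, which is why the per-subsection
costs matter.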
Now I am certain that there is an easier way to compute these values
using package tools rather than incremental mirror updates. But I
didn't find any such tools. IMO this information should be available
to guide mirror admins. But before this info is assembled and
presented perhaps it would be useful to update the debian mirrors web
page that lists only the old subsections. AFAICT the only source of
info on the new subsections is an email from January or February of
this year.
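
One possible approach, sketched below in Python, is to sum the Size field
per Section from a Packages index instead of measuring incremental mirror
updates. This is not an existing tool: the function name is hypothetical,
the input is assumed to be the text of an uncompressed Packages file, and
the sample stanzas are made up. It covers only the binary packages of one
component and architecture, so its totals would not match disk-usage
figures like those in the table.

```python
from collections import defaultdict


def section_sizes(packages_text):
    """Sum the Size field (bytes) per Section from Packages-index text.

    Stanzas are separated by blank lines and fields look like "Name: value".
    Stanzas without a Section field are counted under "<empty>".
    """
    totals = defaultdict(int)
    for stanza in packages_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            name, value = line.split(":", 1)
            fields[name.strip()] = value.strip()
        if "Size" in fields:
            totals[fields.get("Section", "<empty>")] += int(fields["Size"])
    return dict(totals)


# Made-up stanzas, only to show the input shape.
SAMPLE = """\
Package: foo
Section: devel
Size: 1000

Package: bar
Section: games
Size: 2500
Description: example
 continuation line

Package: baz
Section: devel
Size: 500
"""

if __name__ == "__main__":
    for section, size in sorted(section_sizes(SAMPLE).items()):
        print(section, size)
```

In practice the text would come from a decompressed index such as
dists/<suite>/main/binary-<arch>/Packages on a mirror, repeated for each
component and architecture of interest.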
In addition to the size of sections, mirror admins need to know about
expected volatility (Mb/day/subsection) because in the dial-up world
update/maintenance bandwidth trumps creation bandwidth completely
(e.g., buy the CD/DVDs but then you have to download updates).
However, I see no simple way to compute that over a reasonable sample
period, like a month, without a serious investment in package tools.
Do they already exist?
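
As a rough illustration of what such a volatility measurement might look
like, the Python sketch below compares two snapshots of package metadata
taken some days apart and totals the size of every package that is new or
has a changed version. The function name, the snapshot layout, and the
sample data are all hypothetical. It also undercounts, because a package
updated several times between snapshots is counted once.

```python
from collections import defaultdict


def churn_per_day(old, new, days):
    """Estimate bytes/day per section between two index snapshots.

    old, new: dict mapping package name -> (version, section, size_bytes).
    A package counts as churn if it is new or its version changed.
    """
    changed = defaultdict(int)
    for name, (version, section, size) in new.items():
        previous = old.get(name)
        if previous is None or previous[0] != version:
            changed[section] += size
    return {section: total / days for section, total in changed.items()}


if __name__ == "__main__":
    # Made-up snapshots taken 30 days apart.
    old = {
        "foo": ("1.0", "devel", 1000),
        "bar": ("2.0", "games", 3000),
    }
    new = {
        "foo": ("1.1", "devel", 1200),  # version changed
        "bar": ("2.0", "games", 3000),  # unchanged
        "baz": ("0.1", "devel", 300),   # new package
    }
    print(churn_per_day(old, new, days=30))
```

Sampling the indices daily rather than monthly would reduce the
undercount, at the cost of storing more snapshots.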
The other thing I accomplished is to find a way to eliminate the load
that rsync imposes on upstream mirrors. IMO it would not take much
work to tweak rsync and adjust the calling scripts so that the
upstream mirrors could be completely passive.
Given the recent activity in debmirror and the possibility of an
improvement in rsync the doc for ftpsync is going to have to wait.
FWIW,
Lee Winter
NP Engineering
Nashua, New Hampshire