
Re: [Emdebian] Mirrors



On Thu, 17 Sep 2009 22:49:07 +0200
Hector Oron <hector.oron@gmail.com> wrote:

Sending this to the list as the issues are now quite general.

The background is a set of issues about how Grip builds packages, how
long it all takes and how to mirror Grip. Most of them come down to the
space available on the machines running emdebian-grip-server and the
workload that emdebian-grip-server now appears to demand. ('ant'
refers to the www.emdebian.org server.)

If particular issues "hit a nerve", please file bugs against the
emdebian-grip-server package in Debian.

> >> rsync was already running, so it was locked.
> >
> > rsyncd is running, rsync is not.
> 
> Sure. I think you misunderstood me: ./signal is a <1sec task which
> just tells ftp.tw.debian.org to update the repository using rsync
> (above, ftp.tw.debian.org was already running rsync, which is why it
> says it is locked, but this was the test we did to prove that
> push_mirror was working).

OK, but the signal should probably not happen every 2 seconds for 7
hours per day. The Grip process is not staged (staging would take up
even more temporary space): each .deb is put into reprepro as soon as
it is built, the temporary files are all deleted, and then we go on to
build the next .deb of the same package for the next architecture, etc.

i.e. we don't have incoming.emdebian.org because we don't have that
much temporary space available. 'Emdebian Grip unstable' changes many
hundreds of times per day. This is probably not ideal but we'd need
another 30GB of temporary space to have a dedicated
incoming.emdebian.org area. 'Emdebian Grip testing' changes a few dozen
times a day, depending on how many packages are migrating and how many
new packages need to be gripped. This is particularly noticeable during
the evenings because grip-cron can only do the migrations after
unstable has finished updating, so testing is being updated just
when most people want to do 'apt-get update'. apt then reports that
certain combinations cannot be installed, and the missing packages
become available a little later on.

We are working reprepro quite hard right now, causing a lot of churn
in the Packages files and reprepro databases and that churn takes
longer the more packages get involved. With a lot more temporary space,
we could make things a lot more like standard Debian with twice daily
runs to move packages from incoming into the real pool. (Our incoming
area would be subdivided according to each component and each suite we
support so that reprepro could be given a path to all .debs to put into
the same suite and the same component in one operation. This means that
the Packages file changes only once per push as each component and each
suite have their own Packages files.)
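
A minimal sketch of that per-suite, per-component batching, in Python.
The incoming/<suite>/<component>/ layout, the function name and the
base directory are assumptions for illustration only;
emdebian-grip-server does not currently work this way. The sketch only
builds the reprepro command lines (it does not run them) and assumes
the installed reprepro accepts several .deb files in a single
includedeb call.

```python
from pathlib import Path


def batch_commands(incoming, basedir):
    """Build one reprepro command per incoming/<suite>/<component>/ directory.

    Hypothetical layout: incoming/unstable/main/*.deb and so on.
    """
    commands = []
    for suite_dir in sorted(p for p in Path(incoming).iterdir() if p.is_dir()):
        for comp_dir in sorted(p for p in suite_dir.iterdir() if p.is_dir()):
            debs = sorted(str(d) for d in comp_dir.glob("*.deb"))
            if not debs:
                continue  # nothing waiting for this suite/component
            # One call per suite/component, so the indices for that
            # suite/component are exported once per push.
            commands.append(
                ["reprepro", "-b", str(basedir), "-C", comp_dir.name,
                 "includedeb", suite_dir.name] + debs
            )
    return commands
```

Each returned list could be handed to subprocess.call() by a twice-daily
cron job, once the temporary space for an incoming area exists.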

IIRC this requires someone providing a new hard drive to go into ant -
check with Simon. If we do that, I'd suggest adding a lot more than
just 30GB as there seems to be little point searching for a new drive
that small - may as well go for 120GB or more.

Having said that, using an incoming area and supporting that in
emdebian-grip-server also means that any other machine running
emdebian-grip-server either needs a similar amount of temporary space
or needs a configuration option (a debconf question) that chooses
between the current behaviour and an incoming path. (So even once the
new space is available, the use of that space requires a new release of
emdebian-grip-server.)

If this is what users want, please file a bug report to ensure it
happens.

> >> Would you mind running (as user emdebian)
> >> the /home/emdebian/bin/signal script every time you add new
> >> packages to the repository?

This would only work with Crush which builds one entire set of packages
for one source at a time, puts them all into the repo in one lump and
then waits for the autobuilder to finish the next entire batch of
builds before uploading the next lot. Grip doesn't work that way; it is
incremental and continuous. Each actual grip process can take such a
short space of time that there would (usually) not be time for the
rsync process to start before the repo changed again.

Unfortunately, this even happens with testing which is quite unusual
for how Debian normally works. The majority of migrations into testing
happen instantaneously with reprepro but the repo is still growing and
each time a new package is added (or each time testing-proposed-updates
is used once Debian gets into the Squeeze release freeze), if there is
a different version in testing, that version has to be gripped afresh.

We're still adding packages quite often - mostly missing dependencies
of existing packages where a new version adds another dependency.

One issue right now is that we aren't handling removals *AT ALL*. I
haven't got code to handle what ftp.debian.org does manually (and
because ftpmasters do do this manually, there probably isn't code to do
it automatically). I don't think this is a particular problem -
removing a package from unstable and/or testing doesn't affect the size
of the archive pool/ (because we have a stable release that retains it
and we'll have an oldstable after that) and as the package has been
removed from Debian unstable and testing, we aren't going to need to
Grip the package again so it doesn't cost us runtime either. However,
I'm sure this will bite us eventually and I'm not sure how to fix it
other than to trust most of this to reprepro so that when oldstable is
removed prior to receiving packages from stable during a release, the
old versions will be removed from pool/. Keeping a package listed in
unstable after Debian has dropped it does bloat our Packages file (which
isn't good) and possibly complicates dependency resolution on systems
running Grip (because Debian does not expect libfoo0 to still be around).

If we had an incoming area, we could use triggers as you initially
requested but removals would still not be handled explicitly.

> > Not practical. With Grip, the packages are added by the
> > emdebian-grip-server package. The only real way is to do the call at
> > the end. Right now, grip-cron.sh appears to need more than several
> > hours to run - about 7-10hrs at the moment. I think it had
> > something to do with the outage because it usually takes quite a
> > bit less and the scripts are still trying to catch up with
> > unstable. It could also be simply dealing with much, much larger
> > packages which take a lot of time to Grip. I'm hoping to take a
> > look at what is happening tomorrow but the time required means that
> > I can't actually do anything with the repository in the evenings as
> > reprepro is almost constantly locked (for additions/updates).
> 
> About your lags: I have been building toolchains on ant, which could
> explain your timings.
> According to Simon, the machine was down because KVM did not attach
> to the right network interface, but something else is possible too.

I'll check on that later. I'm certainly hoping that the combination of
other loads and the extra package workload due to the outage is the
reason why the times have grown so far. It does show that Grip has
technical limits to the number of packages per archive (i.e. per
machine). It becomes necessary to have multiple archives (on multiple
machines) - one for the base packages and others (which do not have to
use mirroring from base) to add alternative optional sets of packages.
The reality is that any one machine can only cope with so many Grip
packages (or so many Grip architectures). Debian has at least one
buildd per architecture; we have one buildd for seven architectures,
so we're going to need several partial buildds to complete the package
set. We don't need mirrors for Grip; we need partial builders that
augment the packages available from the base machine.

One option is to have a "behind-the-scenes" machine with the same
internet connection as the "frontend" but which does the grunt work of
processing the packages and then all the frontend (ant) needs to do is
sync the mirror twice a day. This would prevent things like toolchains,
apache and other tasks prolonging the build process.

Another option would be to separate out the architectures but this has
less direct support in the scripts - where to put the Arch:all
packages for one thing.

> I am not aware (yet) how emdebian-grip-server and grip-cron.sh works,

1. Use normal reprepro methods to update the local filter repo, which
saves time in the later stages but does take up as much space again.
2. Identify the relevant packages from the Packages file of the filter
repo. (These two stages take a very short time, but stage 1 is very
network intensive and stage 2 is very CPU intensive.)
3. Iterate through the list of packages, passing each .deb and .dsc
through the grip processing. Each run doesn't take that much time but
the .deb has to be unpacked and repacked - with very large packages
(java and gcc), this can take a noticeable period of time. The problem
is that as we are not compiling the package from source, (where you
only unpack the source once), we unpack each compiled .deb for each
architecture, process it and then repack it. If the source builds 30
architecture-dependent binaries and Grip includes 12 of those binary
packages, we unpack and repack 84 .debs (12 packages x 7 architectures).
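
The arithmetic behind that figure, as a small Python sketch. The
function name is made up for illustration, and it ignores the Arch:all
and translation corner cases, which do not multiply per architecture in
the same way.

```python
def repack_count(gripped_binaries, architectures):
    """Each gripped binary package is unpacked and repacked once per architecture."""
    return gripped_binaries * architectures


# 12 gripped binary packages from one source, 7 architectures.
# The other 18 binaries the source builds are not gripped, so cost nothing.
print(repack_count(12, 7))  # 84
```

Compiling from source would unpack that source once; the per-.deb
approach scales with packages times architectures instead.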

grip-cron wraps this process to first handle unstable, then migrate
packages into testing. A separate run (which hasn't even been running
recently) then handles updates to stable-proposed-updates. Where the
package to be migrated already exists, it is copied using reprepro
which is trivial. When a specific version has been uploaded for that
suite (typically stable-proposed-updates but testing can have dedicated
queues too), the grip process needs to be run against that version of
the package.

Each version of each architecture of each package is only "gripped"
once but there are a few corner cases to do with translations and Arch:
all packages. These do result in some duplication of effort but I
haven't found a satisfactory way to handle those yet. It is possible
that Emdebian decides *not* to bother about the endianness of .mo files
- we need some data on whether the current setup does give any
performance gain (especially when loading the GUI in a non-English
locale that is well supported in the translations) and then debate
whether that gain is sufficient reason to keep the Emdebian TDebs
architecture-specific. Changing to Arch:all TDebs is a lot of work but
would save a very large amount of space and cut out a significant chunk
of time from the Grip processing. The implications for Crush also need
to be considered - indeed the measurement of the performance gain
should be done using Crush because Grip only uses this method because
it was deemed appropriate for Crush to have architecture-specific TDebs.

The original idea was that by the time we had to decide this, Debian
would have Arch:all TDebs that could simply be put directly into Grip.

There are issues here that need wider discussion on the mailing list.

> but the ./signal it is just a trigger.
> Anyone from emdebian server can talk to ftp.tw.debian.org and trigger
> the update by running such script in not much time, then the work is
> done by tw machine. :-)

I'm just expecting the trigger to be run fewer times per day. The
current setup means that before rsync has worked out which files have
been updated, the next package has been built and reprepro is busy
"Exporting Indices...".

> If you still think this model does not fit, i'll just set a cron task
> which triggers the update.

I think a once-daily cron task is going to be necessary, synchronised
to run after the grip-cron task on ant (which currently starts at 3pm
UTC).

One way is to wrap grip-cron itself with a new script that calls
grip-cron. When grip-cron finally exits, the wrapper can call a
trigger. The trigger cannot be executed before grip-cron exits, as the
mirroring that follows will add load to ant just when grip-cron itself
wants to use a lot of CPU and a lot of network connections.
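
A rough sketch of such a wrapper, in Python. The function name and the
grip-cron.sh command line are assumptions (no wrapper exists in
emdebian-grip-server); the signal script path is the one mentioned
earlier in this thread. Running the trigger even when grip-cron exits
with an error is a choice made here, not something already decided.

```python
import subprocess

# Assumed command lines: adjust to wherever the scripts actually live.
GRIP_CRON = ["grip-cron.sh"]
SIGNAL = ["/home/emdebian/bin/signal"]


def run_then_trigger(job, trigger):
    """Run job to completion, then run trigger; return the job's exit status."""
    status = subprocess.call(job)  # blocks until the job has exited
    # Only now is the mirror told to sync, so the rsync load never
    # overlaps with the job's own CPU and network use.
    subprocess.call(trigger)
    return status
```

The cron entry would then run a script that calls
run_then_trigger(GRIP_CRON, SIGNAL) instead of calling grip-cron
directly.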

Alternatively, set the cron task to run long enough before 3pm that
there is no chance of the mirroring still running when 3pm comes
around, but long enough after the previous 3pm start that grip-cron
itself has finished.
(That time will, inevitably, land right in the zone where the normal
maintenance cron tasks run on ant (6am UTC), so liaise with Simon for
that timing.)
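
To put rough numbers on that window, a small Python sketch. The 3pm UTC
start and the 10 hour worst case for grip-cron come from this thread;
the 2 hours allowed for the mirror run is purely a guess, and the
function name is made up.

```python
from datetime import datetime, timedelta


def mirror_window(grip_start, grip_max_hours, mirror_max_hours):
    """Return the earliest and latest safe start times for the mirror job."""
    earliest = grip_start + timedelta(hours=grip_max_hours)
    latest = grip_start + timedelta(days=1) - timedelta(hours=mirror_max_hours)
    if earliest > latest:
        raise ValueError("no safe window: the runs would overlap")
    return earliest, latest


start = datetime(2009, 9, 17, 15, 0)  # grip-cron starts at 3pm UTC
earliest, latest = mirror_window(start, 10, 2)  # 2h for mirroring is a guess
print(earliest.strftime("%H:%M"), latest.strftime("%H:%M"))  # 01:00 13:00
```

With those assumptions the mirror job can start anywhere between 01:00
and 13:00 UTC, and the 6am maintenance slot sits inside that range.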

I don't think grip-cron itself should know about triggers until we know
how other machines using emdebian-grip-server want to handle such
issues. 

-- 


Neil Williams
=============
http://www.data-freedom.org/
http://www.linux.codehelp.co.uk/
http://e-mail.is-not-s.ms/


