
Re: [Emdebian] Mirrors

+++ Neil Williams [2009-09-18 10:39 +0100]:
> On Thu, 17 Sep 2009 22:49:07 +0200
> Hector Oron <hector.oron@gmail.com> wrote:

> OK but the signal should probably not happen every 2 seconds for 7
> hours per day. 

No, it should happen once at the end of the 7 hours.   

> The Grip process is not staged (it would take up even
> more temporary space), the .deb is put into reprepro as soon as it is
> built, the temporary files are all deleted, then we go on to build
> the next .deb for the same package of the next architecture etc.

But we still only do the whole job once per day and could send a signal
after it is done.
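
A minimal sketch of that ordering, in Python with hypothetical names
(`grip_package` and `signal_mirrors` stand in for whatever grip-cron.sh
and the ./signal script really do): each package is processed as soon as
it is ready, and the trigger fires exactly once when the whole run is done.

```python
from typing import Callable, Iterable


def run_daily_grip(
    packages: Iterable[str],
    grip_package: Callable[[str], None],
    signal_mirrors: Callable[[], None],
) -> int:
    """Process every package, then signal the mirrors a single time.

    grip_package and signal_mirrors are hypothetical callables, not the
    real emdebian-grip-server interface.
    """
    processed = 0
    for name in packages:
        grip_package(name)  # unpack, process, repack, add to the repository
        processed += 1
    # One trigger per run, after the last package, not one per package.
    signal_mirrors()
    return processed
```

If a package fails with an exception, this sketch never signals; whether
a partial run should still trigger the mirrors is a separate decision.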

> One issue right now is that we aren't handling removals *AT ALL*. I
> haven't got code to handle what ftp.debian.org does manually (and
> because ftpmasters do do this manually, there probably isn't code to do
> it automatically). I don't think this is a particular problem -
> removing a package from unstable and/or testing doesn't affect the size
> of the archive pool/ (because we have a stable release that retains it
> and we'll have an oldstable after that) and as the package has been
> removed from Debian unstable and testing, we aren't going to need to
> Grip the package again so it doesn't cost us runtime either. However,
> I'm sure this will bite us eventually and I'm not sure how to fix it
> other than to trust most of this to reprepro so that when oldstable is
> removed prior to receiving packages from stable during a release, the
> old versions will be removed from pool/. 

Yes. Offhand I don't see any reason why a clearvanished won't tidy up
for us. Perhaps there are corner cases?

> I'm certainly hoping that the combination of
> other loads and the extra package workload due to the outage is the
> reason why the times have grown so far. It does show that Grip has
> technical limits to the number of packages per archive (i.e. per
> machine). 

Well, clearly there is a limit, but I'm not sure we should be anywhere
near it. Repacking _should_ be a lot more efficient than actual
building. We could make the process more efficient (or do it less

> We don't need mirrors for Grip, we need partial builders that
> augment the packages available from the base machine.

You've said that several times but I'm not sure I agree. It really
comes down to bandwidth use. Is Simon happy with current usage rates
and trends? Would load-sharing of the downloads be a good idea? I
suspect it would. If a lot of people are doing what I'm doing (running
multistrap several times a day), that soon adds up, and either heavy
users having local mirrors or us setting up proper DNS-sharing for
downloads makes a lot of sense IMHO.

It seems to me that mirroring never does any harm.

There are advantages and disadvantages to having a large subset of
Debian in the base grip repo. Convenience is the main advantage. A fat
package file is the main disadvantage.

I think we'd need to work out what splits we want in different repos,
and whether the extra complexity of sources is balanced by having a
smaller base. Some numbers on the current state would be helpful in
order to discuss this further meaningfully (perhaps in a new thread).

> One option is to have a "behind-the-scenes" machine with the same
> internet connection as the "frontend" but which does the grunt work of
> processing the packages and then all the frontend (ant) needs to do is
> sync the mirror twice a day. This would prevent things like toolchains,
> apache and other tasks prolonging the build process.

There was always the plan that we would have separate
buildd.emdebian.org and www.emdebian.org in due course. Hopefully
we've been using the names properly in config so such a split would
more-or-less 'just work' :-) I guess we are approaching that point.
Perhaps those offering mirror space might like to offer more involved

> > I am not aware (yet) how emdebian-grip-server and grip-cron.sh works,
> 3. Iterate through the list of packages, passing each .deb and .dsc
> through the grip processing. Each run doesn't take that much time but
> the .deb has to be unpacked and repacked - with very large packages
> (java and gcc), this can take a noticeable period of time. The problem
> is that as we are not compiling the package from source, (where you
> only unpack the source once), we unpack each compiled .deb for each
> architecture, process it and then repack it. If the source builds 30
> architecture-dependent binaries and Grip includes 12 of those binary
> packages, we unpack and repack 84 .debs. (7 architectures).

Using a ramfs for this process could have dramatic speed gains (and is
probably very easy to do). 
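
The arithmetic in the quoted example, plus a rough sketch of the ramfs
idea, in Python. The /dev/shm path and both function names are
assumptions for illustration, not anything taken from grip-cron.sh.

```python
import os
import tempfile


def repack_count(gripped_binaries: int, architectures: int) -> int:
    """Each gripped binary package is unpacked and repacked once per architecture."""
    return gripped_binaries * architectures


def pick_workdir(ram_backed: str = "/dev/shm") -> str:
    """Prefer a RAM-backed directory for the temporary unpack tree.

    /dev/shm is an assumption (a tmpfs on most Linux systems); fall back
    to the normal temporary directory when it is missing or not writable.
    """
    if os.path.isdir(ram_backed) and os.access(ram_backed, os.W_OK):
        return ram_backed
    return tempfile.gettempdir()


if __name__ == "__main__":
    # The quoted example: 12 gripped binaries across 7 architectures.
    print(repack_count(12, 7))
    # Scratch space for one unpack/repack cycle, removed automatically afterwards.
    with tempfile.TemporaryDirectory(dir=pick_workdir()) as scratch:
        print("unpacking under", scratch)
```

Whether RAM-backed scratch space helps in practice depends on how much
memory the very large packages (java, gcc) need once unpacked.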

> there are a few corner cases to do with translations and Arch:
> all packages.... It is possible
> that Emdebian decides *not* to bother about the endianness of .mo files
> - we need some data on whether the current setup does give any
> performance gain 

I don't think it'll make any detectable difference, and making them
arch:all is a very good idea. But yes, someone needs to check. 
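
For anyone collecting that data: a GNU .mo file begins with the magic
number 0x950412de, written in the byte order of the machine that
generated it, so the byte order can be read from the first four bytes.
A minimal Python sketch (the function name is made up for illustration):

```python
import struct

MO_MAGIC = 0x950412DE


def mo_byte_order(header: bytes) -> str:
    """Return 'little' or 'big' for a GNU .mo header, based on its magic number."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    if struct.unpack("<I", header[:4])[0] == MO_MAGIC:
        return "little"
    if struct.unpack(">I", header[:4])[0] == MO_MAGIC:
        return "big"
    raise ValueError("not a GNU .mo file")
```

Run over the first four bytes of each .mo file in a package, this would
show which byte order is actually being shipped for each architecture.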

> > but the ./signal it is just a trigger.
> > Anyone from emdebian server can talk to ftp.tw.debian.org and trigger
> > the update by running such script in not much time, then the work is
> > done by tw machine. :-)
> I'm just expecting the trigger to be run less times per day. 

Yes. Once. ("fewer times per day" :-)

> I think a once-daily cron task is going to be necessary, synchronised
> to run after the grip-cron task on ant (which currently starts at 3pm
> UTC).


Principal hats:  iEndian - Balloonboard - Toby Churchill - Emdebian
