On Fri, 18 Sep 2009 13:08:35 +0100
Wookey <wookey@wookware.org> wrote:
> > The Grip process is not staged (it would take up even
> > more temporary space): the .deb is put into reprepro as soon as it
> > is built, the temporary files are all deleted, then we go on to
> > build the next .deb for the same package on the next architecture,
> > and so on.
> 
> But we still only do the whole job once per day and could send a
> signal after it is done.
I need to work out how to reduce the lag time first - right now, every
process is handling a lot more packages than I expected, so each run is
taking correspondingly longer. The current run is processing
stable-proposed-updates and has already taken several hours.
However, yes: a single call to the trigger at the end, rather than
trying to do things whilst gripping packages.
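The unstaged ordering quoted above (grip one .deb, put it into reprepro straight away, delete the temporary files, move on to the next architecture) can be sketched roughly as follows. This is an illustration, not the emdebian-grip-server code: `grip_unstaged` and the three callbacks are hypothetical stand-ins for the emgrip run, the reprepro include and the cleanup.

```perl
use strict;
use warnings;

# Process each package one architecture at a time, without staging:
# grip the .deb, hand it to the repository immediately, then delete
# the temporary files before moving on. All three callbacks are
# hypothetical placeholders supplied by the caller.
sub grip_unstaged {
    my ($packages, $arches, $grip_one, $include_deb, $cleanup) = @_;
    my @done;
    for my $pkg (@$packages) {
        for my $arch (@$arches) {
            my $deb = $grip_one->($pkg, $arch);  # e.g. run emgrip into a temp dir
            $include_deb->($deb);                # e.g. include into reprepro
            $cleanup->($deb);                    # free temp space straight away
            push @done, "$pkg/$arch";
        }
    }
    return @done;
}

# Usage with stub callbacks that only record the order of operations.
my @events;
my @done = grip_unstaged(
    ['hello', 'zlib'], ['arm', 'i386'],
    sub { my ($p, $arch) = @_; push @events, "grip $p $arch"; return "${p}_$arch.deb" },
    sub { push @events, "include $_[0]" },
    sub { push @events, "clean $_[0]" },
);
print "$_\n" for @events;
```

Because nothing is staged, the peak temporary space is one unpacked .deb at a time; the cost is that the mirror trigger only makes sense once the whole loop has finished.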
 
> > One issue right now is that we aren't handling removals *AT ALL*. I
> > haven't got code to handle what ftp.debian.org does manually (and
> > because ftpmasters do do this manually, there probably isn't code
> > to do it automatically). I don't think this is a particular problem
> > - removing a package from unstable and/or testing doesn't affect
> > the size of the archive pool/ (because we have a stable release
> > that retains it and we'll have an oldstable after that) and as the
> > package has been removed from Debian unstable and testing, we
> > aren't going to need to Grip the package again so it doesn't cost
> > us runtime either. However, I'm sure this will bite us eventually
> > and I'm not sure how to fix it other than to trust most of this to
> > reprepro so that when oldstable is removed prior to receiving
> > packages from stable during a release, the old versions will be
> > removed from pool/. 
> 
> Yes. Offhand I don't see any reason why a clearvanished won't tidy up
> for us. Perhaps there are corner cases?
clearvanished will do what it can - my concern is whether out-of-date
listings in the Packages file would cause issues. For example, once a
package has been removed, other packages no longer need to carry a
Conflicts: or Replaces: entry against it, yet a stale listing would
still offer it for installation. It's probably minor. Old packages won't
show up as "obsolete" in aptitude and synaptic until they are removed
from oldstable, which could cause some irritation.
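Until there is proper removal handling, one way to spot stale entries is a simple set difference between what Debian still carries and what Grip still carries for the same suite. A minimal sketch, assuming both lists of package names have already been obtained (how they are fetched is left out, and `removal_candidates` is a hypothetical helper); anything it reports would still need a manual check before being removed via reprepro.

```perl
use strict;
use warnings;

# Return the packages present in the Grip suite but absent from the
# corresponding Debian suite, sorted for stable output. Both arguments
# are array refs of package names; how they are obtained is assumed.
sub removal_candidates {
    my ($debian, $grip) = @_;
    my %in_debian = map { $_ => 1 } @$debian;
    my @missing = grep { !$in_debian{$_} } @$grip;
    return sort @missing;
}

my @gone = removal_candidates(
    [qw(bash coreutils dpkg)],          # still in Debian unstable
    [qw(bash coreutils dpkg oldpkg)],   # still in Grip unstable
);
print "@gone\n";    # prints "oldpkg"
```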
> > I'm certainly hoping that the combination of
> > other loads and the extra package workload due to the outage is the
> > reason why the times have grown so far. It does show that Grip has
> > technical limits to the number of packages per archive (i.e. per
> > machine). 
> 
> Well, clearly there is a limit, but I'm not sure we should be anywhere
> near it. repacking _should_ be a lot more efficient than actual
> building. We could make the process more efficient (or do it less
> often).
That's the question about Arch:any TDebs - it does take an appreciable
amount of time to build all the TDebs for each architecture. Each
translation produces 7 packages, one per architecture.
Repacking *is* a lot more efficient than actual building (native or
cross), but we are repacking for seven architectures, so we repack 7
times for Grip where we used to crossbuild once for Crush.
Packages like gcc are likely to be the main beneficiaries of faster
unpacking and repacking.
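The arithmetic behind "7 times for Grip where we used to crossbuild once for Crush" is simple but adds up quickly: every binary Grip keeps from a source is unpacked and repacked once per architecture. A small sketch with hypothetical source names; the 12-binary case is the example used elsewhere in this thread.

```perl
use strict;
use warnings;

# Each binary package that Grip includes from a source is unpacked and
# repacked once per architecture, so the cost scales with both numbers.
sub total_repacks {
    my ($included, $arch_count) = @_;  # hashref: source => binaries kept
    my $total = 0;
    $total += $_ * $arch_count for values %$included;
    return $total;
}

# Hypothetical sources: one keeping 12 binaries, another keeping 3.
print total_repacks({ 'example-a' => 12 }, 7), "\n";                   # 84
print total_repacks({ 'example-a' => 12, 'example-b' => 3 }, 7), "\n"; # 105
```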
> > We don't need mirrors for Grip, we need partial builders that
> > augment the packages available from the base machine.
> 
> You've said that several times but I'm not sure I agree. It really
> comes down to bandwidth use. Is Simon happy with current usage rates
> and trends? Would load-sharing of the downloads be a good idea? I
> suspect it would. If a lot of people are doing what I'm doing (running
> multistrap several times a day), that soon adds up, and either heavy
> users having local mirrors or us setting up proper DNS-sharing for
> downloads makes a lot of sense IMHO.
OK, what I'm driving at is that Grip could do with some new and extra
packages, but the current processes are taking so long that it's hard
to actually debug things.
If additional resources can be brought online both to build some new
packages and to mirror the archive, that will be a lot more useful,
IMHO, than just a mirror.
> There are advantages and disadvantages to having a large subset of
> Debian in the base grip repo. Convenience is the main advantage; a fat
> Packages file is the main disadvantage.
> 
> I think we'd need to work out what splits we want in different repos,
> and whether the extra complexity of sources is balanced by having a
> smaller base. Some numbers on the current state would be helpful in
> order to discuss this further meaningfully (perhaps in a new thread).
OK, I'll collate some numbers - possibly this weekend.
Some information is available via the current logs:
http://www.emdebian.org/grip/logs/
(I need to purge some of those too. Probably drop all the ones from
the previous month.)
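For the numbers on the size of the base repo, a first cut could simply count the entries in each Packages file and total their Size: fields. A minimal sketch, assuming an uncompressed Packages file read into a string; `packages_stats` is a hypothetical helper, not part of the existing Grip scripts.

```perl
use strict;
use warnings;

# Count the stanzas in a Debian Packages file and total their Size:
# fields (bytes of the .deb files). Stanzas are separated by blank
# lines; only stanzas with a "Package:" field are counted.
sub packages_stats {
    my ($text) = @_;
    my ($count, $bytes) = (0, 0);
    for my $stanza (split /\n\s*\n/, $text) {
        next unless $stanza =~ /^Package:/m;
        $count++;
        $bytes += $1 if $stanza =~ /^Size:\s*(\d+)/m;
    }
    return ($count, $bytes);
}

my $sample = <<'END';
Package: hello
Architecture: arm
Size: 1200

Package: zlib1g
Architecture: arm
Size: 800
END

my ($n, $size) = packages_stats($sample);
print "$n packages, $size bytes\n";
```

Running it over each suite and architecture would give a comparable per-repo table for the new thread.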
> > One option is to have a "behind-the-scenes" machine with the same
> > internet connection as the "frontend" but which does the grunt work
> > of processing the packages and then all the frontend (ant) needs to
> > do is sync the mirror twice a day. This would prevent things like
> > toolchains, apache and other tasks prolonging the build process.
> 
> There was always the plan that we would have separate
> buildd.emdebian.org and www.emdebian.org in due course. Hopefully
> we've een using the names properly in config so such a split would
> more-or-less 'just work' :-) I guess we are approaching that point.
> Perhaps those offering mirror space might like to offer more involved
> resources?
That would be my hope too.
The name split should be correctly handled by the current versions of
the scripts.
> > > I am not aware (yet) of how emdebian-grip-server and grip-cron.sh
> > > work,
> > 
> > 3. Iterate through the list of packages, passing each .deb and .dsc
> > through the grip processing. Each run doesn't take that much time
> > but the .deb has to be unpacked and repacked - with very large
> > packages (java and gcc), this can take a noticeable period of time.
> > The problem is that as we are not compiling the package from
> > source, (where you only unpack the source once), we unpack each
> > compiled .deb for each architecture, process it and then repack it.
> > If the source builds 30 architecture-dependent binaries and Grip
> > includes 12 of those binary packages, we unpack and repack
> > 84 .debs (12 packages x 7 architectures).
> 
> Using a ramfs for this process could have dramatic speed gains (and is
> probably very easy to do). 
OK, that is achievable - the Perl scripts simply honour the TMPDIR
environment variable, falling back to /tmp:
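use File::Temp qw(tempdir);

# Create a uniquely named scratch directory under $TMPDIR (if set and
# the directory exists) or /tmp; return its path, or undef on failure.
sub create_tmpdir {
	my $name = shift;
	my $pd = ($ENV{'TMPDIR'} && -d $ENV{'TMPDIR'})
		? $ENV{'TMPDIR'}
		: '/tmp';
	return undef unless -d $pd;
	my $dir;
	# tempdir() dies on failure, so trap the error and report it.
	eval { $dir = tempdir("$name.XXXXXXXX", DIR => $pd) };
	if ($@) {
		print $@;
		return undef;
	}
	return $dir;
}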
The rest of the work goes into the incoming directory:
"/usr/bin/emgrip -o ${base}${grip_name}/incoming ".
I'll test this as soon as there's time.
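Since create_tmpdir() silently falls back to /tmp, a wrapper could confirm that the intended directory really is RAM-backed before exporting TMPDIR. A minimal sketch, assuming a Linux host where the mount table can be read from /proc/mounts; the mount point /srv/grip-tmp and the helper `fs_type_for` are hypothetical, and the mount table is a literal string here only so the example is self-contained.

```perl
use strict;
use warnings;

# Given a path and the contents of /proc/mounts, return the filesystem
# type of the mount point that contains the path (longest prefix match).
# Note: /proc/mounts escapes spaces in paths as \040; not handled here.
sub fs_type_for {
    my ($path, $mounts) = @_;
    my ($best, $type) = ('', undef);
    for my $line (split /\n/, $mounts) {
        my (undef, $mnt, $fstype) = split ' ', $line;
        next unless defined $fstype;
        my $prefix = $mnt eq '/' ? '/' : "$mnt/";
        next unless $mnt eq $path || index("$path/", $prefix) == 0;
        ($best, $type) = ($mnt, $fstype) if length($mnt) > length($best);
    }
    return $type;
}

# Hypothetical mount table; in practice read it from /proc/mounts.
my $mounts = "/dev/sda1 / ext3 rw 0 0\n"
           . "tmpfs /srv/grip-tmp tmpfs rw,size=2g 0 0\n";
my $candidate = '/srv/grip-tmp';
my $type = fs_type_for($candidate, $mounts) || '';
# Only point the Grip scripts at the directory if it is RAM-backed.
$ENV{TMPDIR} = $candidate if $type eq 'tmpfs' || $type eq 'ramfs';
```

This only moves the unpack/repack scratch space into RAM; the gripped output still goes to the incoming directory on disk.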
  
> >  there are a few corner cases to do with translations and Arch:
> > all packages.... It is possible
> > that Emdebian decides *not* to bother about the endianness of .mo
> > files
> > - we need some data on whether the current setup does give any
> > performance gain 
> 
> I don't think it'll make any detectable difference, and making them
> arch:all is a very good idea. But yes, someone needs to check. 
http://lists.debian.org/debian-i18n/2009/01/msg00069.html
"Unless there is something wrong in this protocol (I did not checked
where the time is lost when the file cannot be mmaped directly, or if
the modeled usage is realistic), I would not try to save 0.02s every
time my browser gettextize 10000 strings. (The msgunfmt / msgfmt
conversion took 40 times more)."
Nicolas François
That was on fairly powerful hardware - if someone could have a look at
the protocol from the above page and test on hardware more suitable for
Emdebian, it would be very useful.
Loading X on low power / low memory hardware takes long enough - if the
gettext wrapper adds to that, it would be worth retaining
architecture-dependent TDebs for Grip.
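Anyone running that test on Emdebian-class hardware will want to know which catalogues are in the host's byte order and which are not, since a single arch:all .mo would be non-native on half the targets. GNU .mo files begin with the magic number 0x950412de written in the byte order of the machine that generated them, so a sketch like this can classify a catalogue. In real use the header would be the first four bytes of a file opened in `:raw` mode; the helper names are hypothetical.

```perl
use strict;
use warnings;

use constant MO_MAGIC => 0x950412de;

# Report the byte order of a GNU .mo catalogue from its 4-byte magic
# number: 'little', 'big', or undef if the data is not a .mo file.
sub mo_byte_order {
    my ($header) = @_;
    return undef unless length($header) >= 4;
    my $first4 = substr($header, 0, 4);
    return 'little' if unpack('V', $first4) == MO_MAGIC;
    return 'big'    if unpack('N', $first4) == MO_MAGIC;
    return undef;
}

# Byte order of the machine running this script.
sub host_byte_order {
    return pack('L', 1) eq pack('V', 1) ? 'little' : 'big';
}

# Example: a little-endian header, compared with the host.
my $order = mo_byte_order(pack('V', MO_MAGIC));
printf "catalogue: %s-endian, host: %s-endian, non-native: %s\n",
    $order, host_byte_order(), ($order eq host_byte_order() ? 'no' : 'yes');
```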
> > > but ./signal is just a trigger.
> > > Anyone on the emdebian server can talk to ftp.tw.debian.org and
> > > trigger the update by running such a script, which takes very
> > > little time; the work is then done by the tw machine. :-)
> > 
> > I'm just expecting the trigger to be run less times per day. 
> 
> Yes. Once. ("fewer times per day" :-)
> 
> > I think a once-daily cron task is going to be necessary,
> > synchronised to run after the grip-cron task on ant (which
> > currently starts at 3pm UTC).
> 
> Exactly.
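Rather than giving the trigger its own cron slot and hoping grip-cron.sh has finished by then, the two steps could be chained so the trigger fires exactly once, after a successful run. A minimal sketch, assuming the trigger is a standalone command like the ./signal script mentioned above; `run_then_trigger` is hypothetical and the paths in the usage comment are placeholders, not the real locations on ant.

```perl
use strict;
use warnings;

# Run the nightly Grip job and, only if it succeeds, fire the mirror
# trigger exactly once. Both commands are passed as argument lists.
# Returns 1 if both commands succeeded, 0 otherwise.
sub run_then_trigger {
    my ($grip_cmd, $trigger_cmd) = @_;
    if (system(@$grip_cmd) != 0) {
        warn "grip job failed (status $?); not triggering mirrors\n";
        return 0;
    }
    return system(@$trigger_cmd) == 0 ? 1 : 0;
}

# Hypothetical invocation from a single daily cron entry:
# run_then_trigger(['/path/to/grip-cron.sh'], ['/path/to/signal']);
```

A single crontab entry in the existing 15:00 UTC slot (`0 15 * * *`, assuming the host's cron runs in UTC) would then cover both jobs, and a failed run never pushes a half-updated archive to the mirrors.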
-- 
Neil Williams
=============
http://www.data-freedom.org/
http://www.linux.codehelp.co.uk/
http://e-mail.is-not-s.ms/