
Re: unexpected NMUs || buildd queue



Wouter Verhelst <wouter@grep.be> writes:

> On Sat, Jul 17, 2004 at 06:51:19PM +0200, Thiemo Seufer wrote:
>> Wouter Verhelst wrote:
>> > On Sat, Jul 17, 2004 at 10:58:57AM +0200, Goswin von Brederlow wrote:
>> > > Wouter Verhelst <wouter@grep.be> writes:
>> > > > In any case, buildd doesn't write to disk what it's doing (the
>> > > > build-progress file is written by sbuild), so if it's aborted
>> > > > incorrectly (i.e., it doesn't have time to write a REDO file), that
>> > > > information is lost.
>> > > >
>> > > > That's probably a bug, but once you know about it, it's easy to work
>> > > > around (it just means you have to clean up after a crash, but you have
>> > > > to do that anyway, so...)
>> > > 
>> > > Which is one of the things really screwed up on the buildd/sbuild
>> > > combination.
>> > 
>> > What's your alternative?
>> 
>> Obviously to clean the chroot automatically unless a clean-buildd-shutdown
>> flag was written. Shouldn't be hard to implement.
>
> I'd like to see that implemented properly. There are a number of issues
> I can think of offhand that would make it hard:
> * the system crash might be the result of the build itself. Runaway
>   memory-eating loop, causing the swapper to thrash so horribly that a
>   power cycle is the fastest way to get it up and running again, for
>   example. Yes, those happen. In such a case, you don't want to wipe out
>   the chroot; you want to check out what went wrong, and you might need
>   whatever's in the chroot to find out.

With the LVM snapshot option or the option of untarring a tar.gz, the
build chroot for the package can be preserved, and I plan to do
that. Basically it is the same option you have with sbuild (keep always,
failed, never).
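
As a rough sketch of that keep policy (the names KeepPolicy and
should_keep_chroot are made up for illustration and are not taken from
sbuild or buildd), the decision could look like this:

```python
from enum import Enum


class KeepPolicy(Enum):
    """Hypothetical names for the 'always, failed, never' choices."""
    ALWAYS = "always"
    FAILED = "failed"
    NEVER = "never"


def should_keep_chroot(policy, build_succeeded):
    """Decide whether the per-package build chroot is preserved."""
    if policy is KeepPolicy.ALWAYS:
        return True
    if policy is KeepPolicy.FAILED:
        # Only keep the chroot when there is something to investigate.
        return not build_succeeded
    return False
```

With KeepPolicy.FAILED, a crashed or failed build leaves its snapshot or
unpacked tree in place for inspection, while successful builds are
cleaned up.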

> * unstable is a moving (and breaking) target. Doing a debootstrap --
>   especially when tried noninteractively -- doesn't always work. And
>   yes, we need the chroot to be unstable. Think about it.

You could cdebootstrap (c* because it adapts to changes) stable or
testing and then upgrade.

But you are right. That is the reason for the two levels of creating a
chroot (as described in my other mail). Running cdebootstrap should not
be the usual way, and failures in it should result in the buildd
shutting down.
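
A minimal sketch of the two levels, assuming the usual path is cloning
from a stored image (tarball or snapshot) and the exceptional path is
regenerating that image with cdebootstrap. The function and exception
names are hypothetical, and the two callables stand in for the real
work:

```python
class BuilddShutdown(Exception):
    """Raised when no usable chroot can be produced; the buildd should stop."""


def prepare_chroot(clone_from_image, rebuild_image):
    """Return how the chroot was obtained, or raise BuilddShutdown.

    clone_from_image: callable, True if unpacking/snapshotting worked.
    rebuild_image:    callable, True if bootstrapping a new image worked.
    Both are placeholders (assumptions) for the real operations.
    """
    # Level 1, the usual way: clone a build chroot from the stored image.
    if clone_from_image():
        return "cloned"
    # Level 2, the exception: regenerate the image, then clone again.
    if not rebuild_image():
        raise BuilddShutdown("bootstrapping a fresh chroot image failed")
    if not clone_from_image():
        raise BuilddShutdown("cloning failed even from a fresh image")
    return "rebuilt"
```

The point of the split is that a bootstrap failure never leads to a
build attempt in a half-made chroot; it stops the buildd instead.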

The design ideas I have collected also include having some sample
packages built to test correct buildd operation. After a chroot
failure and recreation those tests could be run to ensure correct
operation before the buildd resumes taking packages.
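
That check could be sketched as follows. The name ready_to_resume and
the build callable are assumptions for illustration; build(pkg) stands
in for a full test build and returns True on success:

```python
def ready_to_resume(sample_packages, build):
    """Build every sample package; report whether all of them succeeded.

    Returns (ok, failed) where failed lists the packages that did not
    build. The buildd would only take new packages when ok is True.
    """
    failed = [pkg for pkg in sample_packages if not build(pkg)]
    return len(failed) == 0, failed
```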

> There are probably more things I could come up with, but I didn't try
> hard. Wiping out and recreating the buildd chroot isn't an option.
> Neither is creating a new one alongside the original, unless the disk
> space requirements are a non-issue (which isn't true for some of our
> archs).

Even for space-starved archs a 50MB chroot.tar.gz to bootstrap/clone a
new chroot should not be a problem. Keeping past build failures, on the
other hand, will be. But the worst case would be configuring those archs
to always halt the build on a catastrophic failure or crash. That's about
as useful as current operations.

> The only other option I could think of is to implement an AI that would
> investigate the chroot and remove any anomalies before restarting the
> next build, obviously all the while creating a perfectly detailed log
> (as in, exactly the amount of details you'll need to learn about what
> went wrong; nothing more and nothing less). That'd be nice to have, I'd
> say... ;-)

Or tarring up the chroot and sending it to another machine on the local
network, say the big fileserver in the corner with GBs of free space.

> Really, such cleanups can't be properly automated IMO. I agree that
> there are cases where buildd could be improved, but that doesn't mean
> manual cleanups can be avoided; and after a system crash, if a cleanup
> is required, it must be done manually.

The main improvement is detecting and handling various kinds of build
failures. The buildd can properly distinguish between failures to install
packages, compile failures and failures to purge packages. Some AI
routines to, for example, analyse and test install failures will be
added (like running 'apt-get -f install' to see if there is dirt left).
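
To illustrate the separation of failure kinds, here is a small sketch.
The phase names, the Failure enum and classify_build are hypothetical;
each phase is a callable returning an exit status, and the follow-up
table only records the command mentioned above without running it:

```python
from enum import Enum


class Failure(Enum):
    INSTALL = "install"   # build dependencies could not be installed
    COMPILE = "compile"   # the package itself failed to build
    PURGE = "purge"       # cleanup of installed packages failed


# Phases run in this order; the first non-zero status names the failure.
PHASES = [
    ("install", Failure.INSTALL),
    ("compile", Failure.COMPILE),
    ("purge", Failure.PURGE),
]

# Follow-up check per failure kind (only the one named in the text).
FOLLOW_UP = {
    Failure.INSTALL: ["apt-get", "-f", "install"],
}


def classify_build(steps):
    """Run the phases in order; return the Failure kind, or None on success.

    steps maps a phase name to a callable returning an exit status.
    """
    for name, kind in PHASES:
        if steps[name]() != 0:
            return kind
    return None
```

Knowing which phase failed is what lets the buildd pick a suitable
reaction, for example running the follow-up check after an install
failure rather than treating every failure the same way.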

MfG
        Goswin


