Re: wanna-build / how to sort packages on buildds?
Ingo Jürgensmann <email@example.com> writes:
> On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:
>> Sometimes we have a few packages we don't want to build on a certain
>> buildds. Sometimes this is because this package needs lots of ram. Or
>> it takes quite long and would waste the parallel building a machine
>> supports. Or whatever else. Of course a package could be in more than
>> one category.
> Yes, you're facing basically the same problem I tried to address in
> 2000/2001 when doing my renderserver and later for what Multibuild was
> intended to do as well. ;-)
>> Now, what I would like to do is to write that down in a central file
>> with categories.
> I would recommend to use a database, really.
>> That is, to mark packages as "builds only with more than one gigabyte
>> of ram". And to mark buildds as "has 6 cores", "only ... ram" - so
>> that I don't need to copy entries from buildd to buildd, but just say
>> "that new machine is the same class as ...", and that's it.
> Another category would be "fast disk/raid". There are some packages
> with lots of disk accesses. When you can schedule those packages to a
> buildd that has faster disk access like in having multiple spindles
> for faster seeks, you can minimize build times as well. We faced that
> problem on m68k particularly on IDE vs SCSI disks on Amigas, as IDE
> was dog slow. Another example there was the faster disks on Amigas vs
> slower SCSI disks in Apple machines.
>> Now my question is just: How to do that efficient? I.e. how would
>> a configuration file look like, and how the code to distribute the
>> package on the most fitting buildd(s)? (I.e. it's better to waste 5
>> out of 6 cores than to not build a package at all, but a package
>> needing at least 1g ram can't build on a buildd with only 512mb - but
>> no package should starve in the end.)
>> Ideas? Suggestions? Code?
> Look at my update-buildd.net from Buildd.net, which I used to collect
> data from the buildds such as RAM, kernel, uptime, used swap and such
> (http://buildd.net/cgi/hostpackages.cgi?unstable_arch=m68k&searchtype=arrakis). I
> store this information into the database and also the build times of
> the packages. With this dataset it should be possible to have the
> wanna-build schedule packages in such a way to minimize the build
> times because you can decide which buildd is the most suitable buildd
> for the next package.
I think different groups of factors have to be considered:
1) absolute requirements
I think there are only 2 absolute requirements:
- ram size
- disk size
And all buildds currently have enough disk space I think.
In the past we also had some sources that would crash one buildd but not
the other. No way to track that ahead of time though. But it should be
possible to report this to wanna-build.
Absolute requirements are absolute. If a buildd doesn't meet a
requirement then wanna-build must never schedule the package to build
there. (Note: The buildd will just give it back with the current setup
so no biggy if wanna-build gets it wrong.)
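To make 1) concrete, here is a minimal sketch of such a filter. The
class names, field names and units (Buildd, Source, ram_mb, disk_gb,
min_ram_mb, min_disk_gb) are made up for illustration and are not
anything wanna-build has today:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buildd:
    # Hypothetical description of a build machine.
    name: str
    ram_mb: int
    disk_gb: int


@dataclass(frozen=True)
class Source:
    # Hypothetical absolute requirements of a source package.
    name: str
    min_ram_mb: int = 0
    min_disk_gb: int = 0


def eligible_buildds(source, buildds):
    """Return the buildds that meet every absolute requirement of source."""
    return [
        b for b in buildds
        if b.ram_mb >= source.min_ram_mb and b.disk_gb >= source.min_disk_gb
    ]
```

A source without any recorded requirement is eligible everywhere, so
only the few problematic packages need an entry at all.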
2) important features
The most relevant feature I think is multiple cores and support for
DEB_BUILD_OPTIONS=parallel=x. This would be an attribute of both the
buildd and the source and one should try to match them: build sources
which support parallel building preferably on systems with multiple
cores.
The I/O speed and the source's need for it could be another such
feature. But I'm not sure (other than the m68k special case) this is
relevant to such a degree that it makes sense tracking this.
Important features would be anything we can figure out and point to as
having a major influence on the build speed. And imho this should be
like "N times faster" to warrant the effort to track this for sources.
3) general performance
Buildds are different and build times will differ accordingly. I don't
think this can be properly quantified ahead of time and there are many
hidden factors interacting that would be impossible to quantify with
reasonable effort. But I think this can be measured and extrapolated
just fine. Keep a database of build times and do some statistical
analysis to rate the buildd speed in general and for specific
sources. With that you have a good approximation of the time a source
will need to build on each buildd. Use that as weight when deciding
where to build a source.
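As a rough sketch of what that analysis could look like (the function
name, the (source, buildd, seconds) record layout and the choice of
medians are all my assumptions, any other robust statistic would do):

```python
from collections import defaultdict
from statistics import median


def estimate_build_time(history, source, buildd):
    """Estimate the seconds `source` needs on `buildd`.

    history is a list of (source, buildd, seconds) records of past builds.
    Returns None if the source has never been built anywhere.
    """
    per_pair = defaultdict(list)
    per_source = defaultdict(list)
    for src, bd, secs in history:
        per_pair[(src, bd)].append(secs)
        per_source[src].append(secs)

    # Best case: this source was already built on this buildd.
    if (source, buildd) in per_pair:
        return median(per_pair[(source, buildd)])
    if source not in per_source:
        return None

    typical = median(per_source[source])
    # General speed of the buildd: how its builds compare to the
    # typical time of the same source across all buildds.
    ratios = [
        secs / median(per_source[src])
        for src, bd, secs in history
        if bd == buildd
    ]
    if not ratios:
        return typical  # new buildd, nothing known yet
    return typical * median(ratios)
```

The specific measurement wins when it exists, and the general speed
rating of the buildd only fills in for sources it has not built yet.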
Unlike items in 2), which would have to be manually tracked, this would
encompass any and all factors including unknown ones in
approximation. Some care would have to be taken that factors aren't
weighted twice, once from 2) and once here.
The build times for a parallel building source will differ greatly for
single and multi core systems. The difference in weight this produces
might already be sufficient so that those sources prefer the multi core
systems (after a few versions). So tracking important features manually
might be wasted effort altogether.
My suggestion would be to implement something for 1) and 3) and see how
that goes. Actually 1) could be implemented by setting the build time to
infinity. So the scheduler only has to consider 3). If you then find
that sources have widely different build times on different buildds and
the weight isn't enough to schedule right, only then try to figure out
what makes the difference there.
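Roughly like this, as a sketch only. The names build_weight and
pick_buildd, the dict keys, and the one hour default for builds with no
recorded time are invented for the example:

```python
import math


def build_weight(source, buildd, estimates, unknown=3600.0):
    """Weight of building source on buildd; lower is better.

    An unmet absolute requirement (here only RAM) becomes an infinite
    build time, so the scheduler needs no separate filtering step.
    """
    if buildd["ram_mb"] < source.get("min_ram_mb", 0):
        return math.inf
    # Assumed default for pairs without any recorded build time.
    return estimates.get((source["name"], buildd["name"]), unknown)


def pick_buildd(source, idle_buildds, estimates):
    """Return the name of the idle buildd with the lowest weight, or None."""
    best = min(
        idle_buildds,
        key=lambda b: build_weight(source, b, estimates),
        default=None,
    )
    if best is None or math.isinf(build_weight(source, best, estimates)):
        return None  # no idle buildd can build this source at all
    return best["name"]
```

This only picks a buildd for one source at a time. It does nothing
about starvation or about which source to hand out first, so that would
still have to sit on top of it.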