
Re: Debian needs more buildds. It has offers. They aren't being accepted.



Ingo Juergensmann <ij@2004.bluespice.org> writes:

> On Wed, Feb 11, 2004 at 08:59:49PM +0000, Martin Michlmayr - Debian Project Leader wrote:
> 
> > Maintaining a buildd system for 11 architectures is a fairly complex
> > task, especially with architectures that require many machines to keep
> > up since it's likely that some machines will break.  There have been
> 
> Having 10 machines means that 10 machines can fail. Having 1 machine means
> only one machine can fail. Whereas the first seems to introduce a higher
> probability of failures, it also means that there is redundancy, because
> of:
> 10 machines - one fail -> 10% of CPU time fails
> 1 machine - one fail -> 100% of CPU time fails
> 
> 10 machines tend to have more failures by number, but those don't bring
> down the whole port. Well, you know that of course... nothing new
> here... ;)

Same goes for admins. 1 admin can be sick or busy working on
security. With 10 admins, almost certainly one will always have some
spare time, not be sick, and be in a good mood.

It's not just hardware that's problematic; it's people and spare time
too.
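A rough sketch of the redundancy arithmetic quoted above. It assumes equally fast machines that fail independently, each with a purely hypothetical 5% chance of being unavailable at any moment; the same numbers apply if you read "machine" as "admin". None of this comes from real buildd statistics.

```python
def expected_failures(n_machines: int, p_fail: float) -> float:
    """Expected number of machines down at any moment: more machines, more failures by count."""
    return n_machines * p_fail


def capacity_lost(n_machines: int, n_failed: int) -> float:
    """Fraction of the port's build CPU time lost, assuming all machines are equally fast."""
    return n_failed / n_machines


def p_port_down(n_machines: int, p_fail: float) -> float:
    """Probability that every machine is down at once, assuming independent failures."""
    return p_fail ** n_machines
```

With p = 0.05, a single buildd is down 5% of the time and takes the whole port with it. Ten buildds average 0.5 machines down, but each failure costs only 10% of capacity, and all ten being down together has a probability on the order of 1e-13.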

> > various discussions about problems in the past, some have been fixed,
> > other problems came up.  Your mail basically says "fix this", without
> > actually saying what is wrong; just accusing various people based on
> > second hand knowledge.
> 
> Hrm, that's not exactly how I understood his original mail.
> One part of his complaint was that Ryan was somewhat unresponsive: there
> were several postings on debian-mips requesting a build of qt-x11-free,
> but nothing happened. *This* is rather unacceptable.

The same happened with the qt version before that, too. I guess the same
will happen again for the next version. There is still no indication
that Ryan has even looked at the problem.

Martin, please ask Ryan to look into his no_auto_build list and
similar settings to see whether his buildds are excluding qt-x11-free,
or whether the buildd took qt-x11-free and gave it back over and over
for some unknown reason. And please ask him to report his findings.

> When people (i.e. normal maintainers/users) are forced to solve buildd
> problems by hand, we don't need any buildds at all (over-emphasized), but
> only a bunch of machines where DDs can log in and build their packages
> themselves. Of course this is nonsense.
> So, the question was how that situation can be avoided in the future.
> In prior discussions it was always mentioned that Ryan and James are
> quite busy with other tasks. Hence the request to move this kind of
> "non-important" work (compared to securing d.o machines) to other people.
> 
> > Number of out of dates holding up testing:
> >     i386 (8), hppa (65), ia64 (66), sparc (83), powerpc (92), s390 (92),
> >     m68k (100), alpha (111), arm (161), mipsel (184), mips (200)
> 
> For m68k the number could have been lower, as you know... ;-)

And it was over 200 a few days ago, before akire was added, even with
the manual builds others and I have been doing since the compromise. I
still have to build packages manually and get them sponsored by Wouter
Verhelst (the buildd admin, since I'm not a DD) although the buildd has
been all set up for weeks now. Doing this manually takes several times
longer than going through the normal buildd mail system (which needs
w-b access to work), and I have to do it every day.
 
> > ARM
> > ===
> > [...] 
> > mipsel
> > ======
> > [...] 
> > mips
> > ====
> [...]
> > debian-mips).  I can only say that I suggested building XFree86 (as
> > per the request of the maintainer who wanted feedback for his
> > experimental package) on this machine which had been offered as a
> > buildd and was told that it didn't have enough disk space.  Whether
> 
> Oh, that's me.... 8-)
> Well, but I also told you that the machine *currently* didn't have that
> much disk space, but that this was being worked on. *Most* packages don't
> require 3-5 GB of disk space to build.
> And shortly after I told you that there wouldn't be that much space, it was
> available and could have been used. I also set up an account for Branden, but
> he never used it. (I think the compromise came in between and he wasn't able
> to log in to crest, where I stored the login information for him. In the
> meantime, I'm excluded from logging in to crest, so I can't give away
> accounts that easily anymore... *sigh* but the machines are still available
> for DDs that need access to some MIPS machines)

As far as I understand the mips buildd story, the buildd was refused on
grounds that were just poor excuses, out of an unwillingness to
cooperate or to give up sole control of the buildds for the arch. Not
very nice if the project has to suffer because of that.

> > the machine would have been useful as a buildd is therefore
> > questionable, but I cannot tell for sure without knowing details of
> 
> Well, you surely know the German saying "Kleinvieh macht auch Mist"
> ("small animals make dung, too", i.e. every little bit helps).
> Therefore, at that time it would have been very useful. 
> 
> > the machine.  However, Ryan has mentioned before that he needs a fast
> > mips box, and not another slow one.  Which almost brings me to the
> > current situation.
> > Just to conclude what happened after the last flamewar.  The problem
> > with mips back then was that the fastest mips buildd (sgi.spamo.org)
> > had hardware problems.  The owner thought it was a disk problem, so I
> > told him Debian can buy new disks; he volunteered to get new ones
> > himself.  He did, but as it turns out it was a different problem which
> > he promptly fixed by using a component from his spare machine.  mips
> > was working again without any problems after this.
> 
> Hmmm, there was an offer on one of the Irix mailing lists I'm subscribed
> to: some SGI machines given away for free on a certain day in Oberhausen (Germany).
>  
> > (In the meantime, to make the problem worse, casals.d.o needs a new
> > kernel and cannot be used as a buildd until then.  This should
> > hopefully be fixed soon, though.)
> 
> Erm, why can't a machine be used as a buildd *and* for DDs to port/debug
> their packages? Crest does both, sometimes with 2-4 builds in parallel to
> the buildd. 

If only it were big and fast enough. :) Crest is likely to time out
builds due to its load. But hey, a mips system is faster by several
orders of magnitude, so that shouldn't be a problem.

> > So, while mips has been problematic for a while due to unfortunate
> > circumstances, it's well on its way to getting fixed again.
> 
> m68k had bad luck shortly before mips last autumn. That was the reason why I
> offered the mips machines.
> 
> > In summary, the currently problematic architectures are being worked
> > on.

Even with working hardware there is still the bottleneck of having
just one person doing all the work, Ryan or James (depending on the
arch). That should be spread around. Why not allow a third and fourth
mips buildd maintained by someone else? Even if they are slower, they
still add redundancy in hardware and people.

And slower machines generate less work, so their admin would have more
time to respond to requests to rebuild packages or to build packages
out of order in a hurry.

> But it seems that only Ryan, James and you know about that. To everyone
> else it looks as if nothing is happening. It would be nice to have that
> information better communicated to other people. If someone gives me some
> info, I can publish it on www.buildd.net, but obviously I need some input
> to do so.
> 
> We (m68k port) know from experience that when a buildd is down, the
> maintainers of the stuck packages will come and ask sooner or later what's
> happened to their packages. 
> Therefore I set up a way to automatically display the status of each
> buildd on www.buildd.net. But that service depends on buildd admins
> participating. Currently only m68k (completely), ia64 and hppa
> are participating.
> I plan for the future to automatically detect the status of the buildd
> (running, NO-DAEMON-PLEASE, ...) as well, and not only of the machine itself.
> In the meantime the buildd admins have to mail me a reason when a machine is
> down for a longer time.

I love those pages. Can't thank you enough for them.

> > It should be noted, though, that most recent "cannot get
> > wanna-build access" and other discussions are not about these
> > currently problematic architectures, but about m68k - which, as you
> 
> Partly true. 
> m68k could have a smaller backlog if adding to w-b were faster/easier.
> But adding buildds for other archs would be nice as well to work around
> current problems (like a broken machine/dead disk)... 

Ingo offered the mips buildd. I have a 400 MHz mipsel system here I
wanted to offer but didn't after Ryan's reaction to the mips offer.

If the response had been more welcoming, both mips and mipsel would
have had fallback buildds taking up the load during the problematic
times. I think the discussion is very much about mips/mipsel too.

And also arm, since it's also an architecture managed only by James.
You said there are more arm systems coming; can we have someone else
as admin please, for redundancy purposes alone?

> > can see in the statistics above, is doing pretty well with the 10 (!)
> > buildds they already have.  Two more are currently being built by Ingo
> > Juergensmann, and they'll probably be added once they become
> > available.  As pointed out in Ingo Juergensmann's message in this
> > thread, some m68k systems are also needed for d-i work.  I suggest
> > that Goswin's two boxes which have not been added to wanna-build
> > should be used for d-i work (especially given that Goswin expressed
> > interest in doing d-i m68k work in the past), and the two machines

Will you sponsor the packages I need to create for the Amiga D-I
support? And you will have to trust me on them and sponsor binary-only
uploads since they are not autobuildable.

I also fear they aren't even DFSG-compliant; for all I know the Amiga
bootloader is still linked against public domain libraries.

And another point: what D-I work should be done on them? If I need to
test something I just stop the buildd, reboot and test, then reboot and
let the buildd run again. It's not like D-I work ties up 2 systems 24/7.
It's mostly waiting for bug fixes to filter through the buildds.

> > Ingo is working on will become part of the w-b infrastructure when
> > they're ready.

Actually it's 17 buildds, 13 of them running, and 2 of those (mine)
still run manually.

I've built ~170 packages successfully, with ~40 failures, ~200
dep-waits and ~30 source fetch failures (no accepted/autobuild access),
manually on those 2 buildds since they were configured and started
waiting for wanna-build access. Without that, m68k would be at around
300 packages in need-build now.

> Yeah, I expect the Debian-bought 060 card for tomorrow, btw. 
> But to decide what machines are used for d-i work and which as a buildd is
> quite difficult. Without Goswin's help as a human buildd we wouldn't have
> that small backlog currently. As soon as the new buildds are up and running,
> he could start d-i work again. But then again, it doesn't make much sense to
> setup new buildds when you have to wait some weeks until they can get w-b
> access. 

I think it would be good (if we have room to breathe with the new
buildds) to stop the buildd on crest and make it exclusively a
developer machine, or to limit the packages that crest builds to the
smaller ones. Any D-I developer can log in there and build packages. A
spare partition could be freed up for "make demo" d-i runs too. But
that's off-topic; let's wait for the buildds to get w-b access and then
talk on IRC.

> That is a problem that needs to get fixed - and that needs some sort of
> communication as well. I tried to contact James on IRC when I saw him being
> active, but got no response. Wouter asked him as well, iirc, so the whole
> problem with the wrong key could have been solved very quickly if he
> had been responsive. *sigh*
>  
> > Anyway, I am in contact with Ryan, James and others to get arm, mips
> > and mipsel addressed asap.

Please concentrate on getting others to do the job so there is a
wider base and no single point of failure.

If you look at
http://www.buildd.net/buildd/Building_stats.png
you can get some feeling for the reaction time of the buildd admins on
each arch. The timespan is just the last few days but let me interpret
it a bit:

You see that m68k, even though it has 13 buildds, has very few
packages marked building. The large bulge around Feb 9 was a big chunk
of packages manually being set to "building" on my buildd. The more
sudden drops correspond to Wouter manually signing my packages for
upload. Apart from that packages are signed for upload, failed or
dep-waited at any time during the day without much delays.

For other archs like s390, mips, mipsel and arm you see huge spikes
and drops that correspond to an admin signing packages, failing
packages or repairing buildd problems every once in a while. It's not
too bad. The admin probably only signs packages once a day, so that's to
be expected for the faster buildds. Fewer people means things get done
less often.

What bothers me is the average height for each arch. Packages in state
"building" that are not actually building anymore (only about one
package per buildd should really be building at any time) are packages
not yet signed for upload, not yet failed, not yet put on dep-wait, or
needing further investigation. Only packages needing further
investigation should be left in "building" for a longer time. The
average height could be interpreted as the number of packages the admin
doesn't have the time (or the will) to handle right now. Ideally, every
time the admin looks after his buildd, all packages but the currently
building one (actually 2-10 due to pre-caching) should be handled. The
"building" line should touch the 10 line with only 1 buildd, or at
least the 100 line (leaving 90 packages to be looked into before
deciding).
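A small sketch of the interpretation above: take periodic samples of the "building" count for one arch and subtract what the buildds could legitimately be holding. The sampling and the default of 10 packages in flight per buildd (the upper end of the 2-10 pre-caching range) are assumptions; buildd.net does not compute this number, and the function name is made up for illustration.

```python
from statistics import mean


def unhandled_backlog(building_counts: list[int], n_buildds: int,
                      in_flight_per_buildd: int = 10) -> float:
    """Estimate packages left in 'building' that no buildd is really working on.

    building_counts: hypothetical periodic samples of packages in state 'building'.
    in_flight_per_buildd: packages a buildd may legitimately hold (2-10 with pre-caching).
    """
    legitimately_building = n_buildds * in_flight_per_buildd
    # Whatever sits above the legitimate level is waiting for the admin.
    return max(0.0, mean(building_counts) - legitimately_building)
```

For example, one buildd hovering around 100 packages in "building" would leave about 90 packages waiting for its admin, which matches the "100 line" reading above.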

Let's look at the "building" height ranges for all archs:
low: 0-150,  medium: 50-200, high: 100-400

Arch    |#buildds|  admin  | height
--------+--------+---------+-------
Alpha   |  1     | -       | medium
Arm     |2(now 4)| James   | high
hppa    |  1     | LaMont  | low
i386    |  1     | Ryan    | high given the number of packages autobuilt
ia64    |  1     | LaMont  | low
m68k    | 13     | group   | low
mips    |  1     | Ryan    | high
mipsel  |  1     | Ryan    | high
powerpc |  1     | Dan     | medium
s390    |  2     | Gerhard | high
sparc   |  1     | -       | low

Strange, only 3 names come up as high: James, Ryan and Gerhard.

And s390 has had some problems lately and is looking for more hardware,
but its admins have been responsive. They are communicating that they
are working on it, so nobody worries. s390 also has a low count of
sarge blockages.

And don't tell me it's the hardware. A broken buildd results in sharp
rises (when it goes wild) and huge drops when the admin repairs it or
shuts it down (like s390 had). A broken hardware or buildd should not
leave packages dangling in state "building", any packages taken by a
broken buildd should be given back.


Too bad we don't have figures for the time between end of build and
admin action. That would be a real measurement of the responsiveness.
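If such figures were collected, the measurement could be as simple as the sketch below. The input format, pairs of (build finished, admin acted) timestamps per package, is hypothetical; no such log exists as far as this mail says. The median is used rather than the mean so that a few packages left for further investigation don't dominate the result.

```python
from datetime import datetime
from statistics import median


def median_response_hours(events: list[tuple[datetime, datetime]]) -> float:
    """Median delay, in hours, between a build finishing and the admin acting on it.

    events: hypothetical (build_finished, admin_acted) timestamp pairs, one per package.
    Raises statistics.StatisticsError if events is empty.
    """
    delays = [(acted - finished).total_seconds() / 3600 for finished, acted in events]
    return median(delays)
```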

> I wish we could establish a way to interact with each other in a
> productive way for the sake of the project. Maybe a new mailing list for all
> buildd admins (and related people such as buildd hosters), where
> announcements like the move of w-b to newraff are published in advance,
> would be nice?
> 
> -- 
> Ciao...              // 
>       Ingo         \X/

And hopefully a move of the m68k-buildd list so we get less spam. :)

MfG
        Goswin


