
Re: buildd reliability



On 2023-03-26 12:25 +0200, Aurelien Jarno wrote:

> The 3 arm64 boards running at ARM are pretty fine, we do not have any
> issues with them, however they start to be old.
> 
> On the other hand we have many issues with the Ampere servers hosted at
> UBC and the Applied Micro servers hosted at Conova. All of them crash
> regularly (a few times per week in total) and need a powercycle. In
> addition the bullseye kernel does not work on Applied Micro servers, so
> we are currently stuck with buster on them :(.

OK. That's not good. Can you say which hardware those machines are?
Our buildd database does not say what actual kit is in use (just the
manufacturer), and I don't have rights to read the detailed buildd
admin info on the UBC and Conova sites.

[ Aside: what would it take to put an extra field into our machine
database to specify what hardware each machine is? It can sometimes
be tricky to separate Model/motherboard/CPU as the required bit of
info but it would be really useful to write something more detailed
down both for issues like this and debugging. ]

My guess is that all the Conova machines are Mustangs, and the UBC
machines are eMAGs? Is that right?

Some enquiries tell me that both these machine types are reliable
(although the Mustangs are slow) at OBS and Yocto, so they can be OK,
but there is certainly much faster kit available now (Ampere Altra).

Is there a bug about the boot failure on the Applied Micro machines? I
just failed to find one. If we know what hardware it is we can
investigate, because that does seem like something that should be
fixed.

> > I'm sure we can get new arm64 buildds if we need them.
> 
> Yes please. It's becoming urgent to get new ARM64 hardware to overcome
> all those issues, and we (DSA) failed to find new hardware to buy at a
> decent price.

OK. I'll see what can be done. I see Altra servers range from
$7000 to $53000 on https://store.avantek.co.uk/arm-servers.html.

What does DSA consider 'decent'? I guess we'd prefer the resilience of
a couple of reasonable machines over one ridiculously manly one. A bit
of configury on the Avantek site suggests that basic ARM Altra servers
cost about twice as much as AMD ones for similar specs
(cores/RAM/disk), but then the power consumption is less than half. I
don't know how the performance actually compares for buildd purposes
(nor what sort of spec we prefer in terms of
nodes/cores/RAM/Disk/networkIF), but people describe the Altras as
'fast'. I'll try and collect some more details to quantify that.

Does Debian run to a policy on packages/Wh for buildds yet, I wonder
(efficient hardware lowers emissions, for a given workload)? It's
worth paying something for more power-efficient kit, possibly quite a
lot for hardware like this that will run hard for years.

Are we running Debian CI on this hardware or is that all done in the
cloud?

Wookey
-- 
Principal hats:  Debian, Wookware, ARM
http://wookware.org/
