Re: .d.o machines which are down (Re: Questions for the DPL candidates)

To: debian-devel@lists.debian.org
Cc: Ben Collins <bcollins@debian.org>
Subject: Re: .d.o machines which are down (Re: Questions for the DPL candidates)
From: Steve Langasek <vorlon@debian.org>
Date: Sun, 20 Mar 2005 00:31:14 -0800
Message-id: <20050320083113.GC6265@mauritius.dodds.net>
Mail-followup-to: debian-devel@lists.debian.org, Ben Collins <bcollins@debian.org>
In-reply-to: <20050317003237.GB12681@internal>
References: <423611AE.7030709@azure.humbug.org.au> <20050314233738.GA26922@pegasos> <87u0nd9ojf.fsf@frigate.technologeek.org> <20050315142813.GA13754@deprecation.cyrius.com> <200503160444.j2G4intX004544@renig.nat.blars.org> <20050316213851.GA12073@internal> <87u0nb15bc.fsf@becket.becket.net> <20050316233028.GA12399@internal> <874qfb10o4.fsf@becket.becket.net> <20050317003237.GB12681@internal>

Hi Ben,

On Wed, Mar 16, 2005 at 07:32:37PM -0500, Ben Collins wrote:
> On Wed, Mar 16, 2005 at 06:11:39PM -0800, Thomas Bushnell BSG wrote:
> > Ben Collins <bcollins@debian.org> writes:

> > > The requirement sucks, lets leave it at that. If the machine dies, I can
> > > have two to replace it within a day or two.

> > > The point being, there's no reason to have two seperate machines when one
> > > can do the job. As long as it keeps up, then there should be no cause for
> > > concern.

> > If you have one machine, and it dies, and it takes you a day or two to
> > replace it, then it cannot "do the job".  If you can guarantee that it
> > never dies (somehow), then maybe it could.

> Ok, I can guarantee that it never dies. The hardrives are raid 5
> configuration, and the power supplies are redundant, and if any of the
> three cpu/mem boards goes bad, I can just remove it and let the other two
> (4x cpu's and 4gigs ram) run. Then there's also two 10/100mbit ethernet
> adapters.

> It wont die all together, it's an enterprise class system. It's meant to
> keep going, even if it has to limp to do so. Even with 1 cpu/mem board, it
> still would have 2 cpu's and 2gigs of ram.

Is this system going to have one or two RAID5 arrays?  Assuming that both
buildds would be hosted on a single array, who monitors the RAID status to
ensure timely replacement of hard drives in the event of a single drive
failure?  What assurance do we have that a single drive failure would be
regarded as a matter requiring immediate on-site attention (assuming
there's no hot-standby drive)?  What happens if, God forbid, the facility
suffers an HVAC failure, resulting in the simultaneous failure of multiple
drives in the RAID array?

Does a CPU board failure require on-site intervention?

What happens to the Sparc port if Visi.net decides they are no longer
willing to sponsor hosting for these build daemons?

I certainly appreciate that having two buildds in a single enterprise-class
chassis can offer as much redundancy as two buildds in adjacent chassis, but
it can't provide geographic separation.  Finding two independent hosting
sponsors really ought not be a problem for a viable port, and I don't think
it's in the best interest of the port either for you to offer a guarantee
that the buildds will never die, or for others involved in the sparc port
to accept such a guarantee.

I would be grudgingly willing to accept a pair of buildds in this
configuration as meeting the requirement for release architectures if this
is the wish of the sparc porting team, but I would strongly encourage
getting some geographic separation in place for your own benefit so that we
don't find ourselves forced to drop sparc as a release architecture as a
result of one of the above-mentioned failure scenarios that you haven't
mitigated.

-- 
Steve Langasek
postmodern programmer
