Re: .d.o machines which are down (Re: Questions for the DPL candidates)

To: debian-devel@lists.debian.org
Subject: Re: .d.o machines which are down (Re: Questions for the DPL candidates)
From: Ben Collins <bcollins@debian.org>
Date: Sun, 20 Mar 2005 08:28:09 -0500
Message-id: <20050320132809.GA25160@internal>
In-reply-to: <20050320083113.GC6265@mauritius.dodds.net>
References: <20050314233738.GA26922@pegasos> <87u0nd9ojf.fsf@frigate.technologeek.org> <20050315142813.GA13754@deprecation.cyrius.com> <200503160444.j2G4intX004544@renig.nat.blars.org> <20050316213851.GA12073@internal> <87u0nb15bc.fsf@becket.becket.net> <20050316233028.GA12399@internal> <874qfb10o4.fsf@becket.becket.net> <20050317003237.GB12681@internal> <20050320083113.GC6265@mauritius.dodds.net>

Why does everyone have a sudden interest in the sparc buildds? Sparc had
only one buildd until auric was no longer needed for ftp-master. Things
were fine back then, and are still fine now. No one complained then, so
why is everyone complaining now that I want to put a better single machine
in place?

Is it so bad that it is only one machine? Why does it need to be two
machines? Since when do we want the kind of redundancy that NASA requires
("what if a jet plane crashes into the building, and we lose our data")?

It's not as if every port should have to find corporate sponsorship for
bandwidth and equipment in the first place, but now you want them to find
double the sponsorship. As if companies just give these things away.
Personally, I don't care if an architecture's buildd is being run off the
main developer's DSL line, on one box, so long as it keeps working. Don't
question it until a problem _does_ occur.

Move on, folks. Things are not as bad as you'd like everyone to think.
It seems like people want all kinds of excuses to remove ports.

How about this: show me the i386 buildds and their specs.

To answer your questions:

1. CPU/mem board failure requires me to call visi.net and ask someone to
   remove the board and power the machine back on. Visi.net is a sparc
   shop, so this isn't beyond their capabilities.

2. The RAID array in the box will be set up with LVM, and I'll have one
   spare drive as a hot spare. Obviously this is all done on the software
   side, so it only requires remote access.

These are all rather moot points anyway. We don't have that kind of
redundancy for our own list server, or ftp-master for that matter. I
remember when debian.org was virtually shut down for over a week due to a
break-in on all our equipment. Should Debian be allowed to distribute
Linux if it can't handle these kinds of things?

On Sun, Mar 20, 2005 at 12:31:14AM -0800, Steve Langasek wrote:
> Hi Ben,
> 
> On Wed, Mar 16, 2005 at 07:32:37PM -0500, Ben Collins wrote:
> > On Wed, Mar 16, 2005 at 06:11:39PM -0800, Thomas Bushnell BSG wrote:
> > > Ben Collins <bcollins@debian.org> writes:
> 
> > > > The requirement sucks, let's leave it at that. If the machine dies, I can
> > > > have two to replace it within a day or two.
> 
> > > > The point being, there's no reason to have two separate machines when one
> > > > can do the job. As long as it keeps up, then there should be no cause for
> > > > concern.
> 
> > > If you have one machine, and it dies, and it takes you a day or two to
> > > replace it, then it cannot "do the job".  If you can guarantee that it
> > > never dies (somehow), then maybe it could.
> 
> > Ok, I can guarantee that it never dies. The hard drives are in a RAID 5
> > configuration, and the power supplies are redundant, and if any of the
> > three cpu/mem boards goes bad, I can just remove it and let the other two
> > (4 CPUs and 4 gigs of RAM) run. Then there are also two 10/100mbit
> > ethernet adapters.
> 
> > It won't die altogether; it's an enterprise-class system. It's meant to
> > keep going, even if it has to limp to do so. Even with 1 cpu/mem board, it
> > still would have 2 CPUs and 2 gigs of RAM.
> 
> Is this system going to have one or two RAID5 arrays?  Assuming that both
> buildds would be hosted on a single array, who monitors the RAID status to
> ensure timely replacement of hard drives in the event of a single drive
> failure?  What assurance do we have that a single drive failure would be
> regarded as a matter requiring immediate attention (local, assuming there's
> no hot-standby drive)?  What happens if, God forbid, the facility suffers an
> HVAC failure, resulting in the simultaneous failure of multiple drives in the
> RAID array?
> 
> Does a CPU board failure require on-site intervention?
> 
> What happens to the Sparc port if Visi.net decides they are no longer
> willing to sponsor hosting for these build daemons?
> 
> I certainly appreciate that having two buildds in a single enterprise-class
> chassis can offer as much redundancy as two buildds in adjacent chassis, but
> it can't provide geographic separation.  Finding two independent hosting
> sponsors really ought not be a problem for a viable port, and I don't think
> it's in the best interest of the port for either you to offer to guarantee
> the buildds will never die, or for others involved in the sparc port to
> accept such a guarantee.
> 
> I would be grudgingly willing to accept a pair of buildds in this
> configuration as meeting the requirement for release architectures if this
> is the wish of the sparc porting team, but I would strongly encourage
> getting some geographic separation in place for your own benefit so that we
> don't find ourselves forced to drop sparc as a release architecture as a
> result of one of the above-mentioned failure scenarios that you haven't
> mitigated.
> 
> -- 
> Steve Langasek
> postmodern programmer

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/
