
Re: Hardware trouble ries.debian.org - ftpmaster.debian.org / release.d.o services back this weekend



On Sat, Apr 03, 2010 at 07:58:58AM +0200, Christian PERRIER wrote:
> Quoting Russ Allbery (rra@debian.org):
> > Laurent Léonard <laurent@open-minds.org> writes:
> > 
> > > ries.debian.org (solved by changing the mainboard ?), but I'm
> > > surprised there is no 6, 12 or at least 24-hours on-site support
> > > included with a 20 000 dollars server...
> > 
> > I believe there is, but the first few things tried to fix the server
> > didn't work.  Sadly, this is not that unusual.
> 
> One can add that, even if the server has a high level of support, the
> people who manage the servers (DSA, local admins) do their Debian
> work in their spare time (chances are high that they have paid
> work) and most do the work remotely.
> 
> That usually doesn't help to be very fast when hardware issues involve
> local actions.
> 
> Also, of course, as you pointed out, the problems were not easy to
> identify at the beginning.
> 
> I take this occasion to thank our DSA and FTPmaster teams for the hard
> work involved in solving these issues. That was a great job,
> folks.

Christian: thanks.  The local admins were fantastic.  Without them,
DSA would not be able to do anything.

Laurent: To have a four-hour support arrangement, we need both hosting
provider and vendor agreements.

ries is located at Brown University.  Brown provides free hosting,
bandwidth and remote intelligent hands.  They have provided exemplary
support but it doesn't include (nor should we have an expectation that
it includes) 4-hour response.

ries is covered by a next business day post-warranty support agreement
with the vendor.

There were two factors contributing to the time between problem occurrence
and problem resolution: (1) back and forth with the vendor in an attempt
to diagnose the true fault (initial diagnosis was faulty DIMMs;
subsequent diagnosis was faulty mainboard) and (2) once diagnosed,
shipping delays (parts were ordered on the morning of the 30th but
arrived on the 2nd).

It turns out that both the mainboard and two DIMMs were faulty, which is a
strange pairing unless there was a lightning strike (apparently not).

One should also keep in mind that things like this happen very rarely,
and that the cost of a four-hour support agreement with the vendor may
not be justified by the benefit... especially considering that the hosting
provider may not be available to provide access (if we want this, we
should find a commercial provider with 24/7 staffing).  Even then, the
problem still needs to be diagnosed, which also takes time.  In
this case, ries ran for a few hours after we removed a few DIMMs and
moved the remaining ones around - a vexing false positive.

I hope that this information proves helpful.

On behalf of the DSA team,

Luca and Martin

-- 
Luca Filipozzi
