Re: Why didn't software RAID detect a faulty drive?

To: debian-isp@lists.debian.org
Subject: Re: Why didn't software RAID detect a faulty drive?
From: Seth Mattinen <sethm@rollernet.us>
Date: Sun, 19 Jul 2009 11:52:03 -0700
Message-id: <4A636B53.4040806@rollernet.us>
In-reply-to: <4A6367B3.1060101@meetinghouse.net>
References: <20090719073726.GA27473@lapse.rw.madduck.net> <4A63613B.8080204@rollernet.us> <4A6367B3.1060101@meetinghouse.net>

Miles Fidelman wrote:
> Seth Mattinen wrote:
>> The system was horribly unresponsive; I never did try adding the drive
>> back in because it was a live server and I didn't want to risk it. I
>> would have expected any RAID to fault an unresponsive drive even if it
>> was a quirk. I just replaced it.
>>   
> Two things I learned recently, the hard way, when I had a RAID drive fail:
> 
> 1. Drives can fail in ways that can get masked for a long time, in
> particular - increasing numbers of disk reads or writes that eventually
> succeed - after lots of retries.  The symptom is that things slow down
> to a crawl.  Not sure why the md software doesn't simply fail drives
> that exhibit long delays, but it doesn't seem to (ideas anyone?).

My guess is that the kernel was masking it: as long as each request
eventually succeeds after retries, md never sees an error it can act on.
A hardware array controller talks to the drive directly instead of
relying on intermediate layers, so it will notice and kick the drive out
of the array. There's absolutely no reason to keep a slow-to-respond
drive in an array even if it's not throwing errors. This is one
situation where a hardware array has a distinct advantage.
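
As a rough illustration of catching a slow-but-not-failing drive from
userspace, here is a minimal Python sketch that compares two snapshots
of /proc/diskstats and reports devices whose average time per completed
I/O is above a threshold. The function names and the 500 ms threshold
are assumptions made up for this example; this is not something md does
itself, and it is only a sketch of the idea.

```python
def parse_diskstats(text):
    """Return {device: (ios_completed, ms_spent)} from /proc/diskstats text.

    Columns used (0-based): 2 = device name, 3 = reads completed,
    6 = ms spent reading, 7 = writes completed, 10 = ms spent writing.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or short lines
        name = fields[2]
        ios = int(fields[3]) + int(fields[7])
        ms = int(fields[6]) + int(fields[10])
        stats[name] = (ios, ms)
    return stats


def slow_devices(before, after, threshold_ms=500.0):
    """Return {device: avg_ms_per_io} for devices slower than threshold_ms.

    threshold_ms is an arbitrary assumption; tune it for the hardware.
    A drive that completes no I/O at all between the two snapshots is
    NOT reported here, so a fully hung drive needs a separate check.
    """
    slow = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        completed = ios_after - ios_before
        if completed <= 0:
            continue
        avg_ms = (ms_after - ms_before) / completed
        if avg_ms > threshold_ms:
            slow[name] = avg_ms
    return slow
```

To use it, read /proc/diskstats twice some seconds apart, pass each
text through parse_diskstats(), and hand both results to
slow_devices(). A member that shows up repeatedly can then be failed
by hand, for example with "mdadm /dev/md0 --fail /dev/sdb1" (device
names here are placeholders).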


> 2. If all of your drives are the same age - it would be a very good idea
> to replace the OTHER drives in your RAID array before they start
> failing.  In my case, I had a server with four drives (2 RAID1 sets).  
> As I was recovering from one drive failure, two of the others failed in
> rapid succession.  Not very pretty at all.
> 

I had two different brands in the array. ;) One Seagate, one Western
Digital. The WD (recertified, bleh) was the culprit.

~Seth
