
Re: Why didn't software RAID detect a faulty drive?



Seth Mattinen wrote:
> The system was horribly unresponsive; I never did try adding the drive
> back in because it was a live server and I didn't want to risk it. I
> would have expected any RAID to fault an unresponsive drive even if it
> was a quirk. I just replaced it.
Two things I learned recently, the hard way, when I had a RAID drive fail:

1. Drives can fail in ways that stay masked for a long time, in particular a growing number of disk reads or writes that succeed only after lots of retries. The symptom is that everything slows to a crawl. I'm not sure why the md software doesn't simply fail drives that exhibit long delays, but it doesn't seem to (ideas, anyone?).

2. If all of your drives are the same age, it would be a very good idea to replace the OTHER drives in your RAID array before they start failing. In my case, I had a server with four drives (two RAID1 sets). As I was recovering from one drive failure, two of the others failed in rapid succession. Not very pretty at all.
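One possible answer to the question in point 1: if md won't fail a drive just for being slow, a separate monitor could at least notice it. The Python below is a minimal sketch assuming the Linux /proc/diskstats layout, where each line holds major, minor, and device name, followed by counters that include reads completed, ms spent reading, writes completed, and ms spent writing. Those counters are cumulative since boot, so the sketch compares two snapshots and flags devices whose average milliseconds per completed I/O in between exceeds a threshold. The function names and the 50 ms default are hypothetical, chosen only for illustration, and the code only reports; it does not touch the array.

```python
# Minimal sketch: flag block devices whose average I/O latency between two
# snapshots of Linux /proc/diskstats exceeds a threshold. Assumed line layout:
#   major minor name reads_completed reads_merged sectors_read ms_reading
#   writes_completed writes_merged sectors_written ms_writing ...
# Function names and the 50 ms default are hypothetical, for illustration only.

import time
from typing import Dict, List, Tuple


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Map device name -> (completed reads + writes, ms spent on them)."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        name = fields[2]
        ios = int(fields[3]) + int(fields[7])  # reads + writes completed
        ms = int(fields[6]) + int(fields[10])  # ms reading + ms writing
        stats[name] = (ios, ms)
    return stats


def slow_devices(before: str, after: str,
                 threshold_ms: float = 50.0) -> List[str]:
    """Devices whose average ms per completed I/O between snapshots
    exceeds threshold_ms."""
    old, new = parse_diskstats(before), parse_diskstats(after)
    slow = []
    for name, (ios, ms) in new.items():
        if name not in old:
            continue  # device appeared between snapshots; no baseline
        d_ios = ios - old[name][0]
        d_ms = ms - old[name][1]
        if d_ios > 0 and d_ms / d_ios > threshold_ms:
            slow.append(name)
    return sorted(slow)


def slow_devices_now(interval_s: float = 10.0,
                     threshold_ms: float = 50.0) -> List[str]:
    """Take two /proc/diskstats readings interval_s apart and compare them."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return slow_devices(before, after, threshold_ms)
```

On a live system, calling `slow_devices_now(10.0)` compares readings taken ten seconds apart. Deciding what to do with a flagged disk (alert someone, or manually mark it failed with mdadm) is left to the administrator, since a slow interval can also just mean a heavily loaded drive.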

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra


