
Re: Fwd: hard drive failure under RAID-1



You know, I had software RAID-1 implemented on some 2.2.x box about 2
years ago using two Western Digital drives until one day, both drives
failed at the same time.  Apparently, there was some hardware bug
whereby after ~99 days of uptime, the drives lost power and the
controller stopped working. 

I had put the two identical drives in after reading some RAID docs
saying that this was preferable for some kind of undetermined mythical
reasons related to "being consistent."  But in my case, the risk of a
hardware bug downing both drives turned out to be what got me.

Software RAID is kind of appealing, but ... I'm not sure I'll go down that
path again.  I'm using a 3ware controller now.

phil.

Russell Coker wrote:
> 
> The following is something to consider when setting up RAID arrays.  At the
> moment AFAIK every RAID solution suffers from this problem.  :(
> 
> I have a Linux software RAID-1 array consisting of two IBM IDE hard drives.
> The latest kernel works the same way as the 2.4.2 kernel I am using on that
> machine.
> 
> I have just had them both fail at the same time!  They both had quite a
> number of bad sectors, but there was no sector that was bad on both
> disks!
> 
> The result I would have liked to see is that when a bad sector is
> encountered during a read from disk 0, disk 1 is read instead.  If
> the data can be read from disk 1 then it should be written back to disk 0.
> If after that disk 0 can be read (the likely result using sector-sparing in
> hardware) then it should give lots of huge printk() errors and keep running.
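
To make the read-repair behaviour described above concrete, here is a minimal
sketch in Python. It is a toy simulation and not how the Linux md driver is
implemented: SimDisk, mirrored_read and the `spare` flag are hypothetical
names invented for illustration, and it assumes that rewriting a bad sector
causes the drive to remap it.

```python
class SimDisk:
    """Toy in-memory disk (hypothetical). Sectors listed in `bad` raise
    OSError on read until they are rewritten."""

    def __init__(self, sectors, bad=(), spare=True):
        self.sectors = list(sectors)
        self.bad = set(bad)
        self.spare = spare  # assumption: drive remaps a bad sector on write

    def read(self, n):
        if n in self.bad:
            raise OSError(f"bad sector {n}")
        return self.sectors[n]

    def write(self, n, data):
        self.sectors[n] = data
        if self.spare:
            self.bad.discard(n)


def mirrored_read(primary, mirror, n, log=print):
    """Read sector n from primary; on failure fall back to the mirror,
    write the good data back, and complain loudly instead of failing."""
    try:
        return primary.read(n)
    except OSError:
        # If the mirror cannot read the sector either, the error propagates.
        data = mirror.read(n)
        primary.write(n, data)
        try:
            primary.read(n)
            log(f"WARNING: sector {n} repaired from mirror, replace this disk")
        except OSError:
            log(f"ERROR: sector {n} still unreadable, disk should be failed")
        return data
```

With two disks whose bad sectors do not overlap, as in the case reported
here, every read succeeds from one side or the other and the array keeps
running while the warnings pile up.
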
> 
> The result I saw was that disk 0 was marked as failed, then when a different
> sector failed on disk 1 the ext2 file system saw errors, the system stopped
> functioning correctly and needed a hard reset.  Then it panicked on boot
> because it couldn't add either disk to the RAID-1.  Since then I have been
> trying to recover it.  I wrote a program to read both disks and take data
> from disk 1, but take it from disk 0 when disk 1 returned a bad sector.  But
> this didn't work well because disk 1 had run for some time without disk 0.
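
The recovery approach described above can be sketched roughly as follows.
This is not Russell's actual program, which was not posted; merge_mirrors and
make_reader are hypothetical names, and the two disks are modelled as plain
callables that raise OSError on a bad sector. The sketch records which
sectors came from the fallback disk, since those are the ones that may be
stale if the preferred disk kept running alone.

```python
def merge_mirrors(read_preferred, read_fallback, n_sectors):
    """Build an image sector by sector, preferring one disk and falling
    back to the other when a read fails.

    Returns (image, from_fallback, lost): from_fallback lists sectors
    taken from the fallback disk (possibly stale), lost lists sectors
    unreadable on both disks (stored as None in the image).
    """
    image, from_fallback, lost = [], [], []
    for n in range(n_sectors):
        try:
            image.append(read_preferred(n))
        except OSError:
            try:
                image.append(read_fallback(n))
                from_fallback.append(n)
            except OSError:
                image.append(None)
                lost.append(n)
    return image, from_fallback, lost


def make_reader(sectors, bad):
    """Hypothetical stand-in for a real disk: returns a read(n) function
    that raises OSError for sector numbers listed in `bad`."""
    def read(n):
        if n in bad:
            raise OSError(f"bad sector {n}")
        return sectors[n]
    return read


# Disk 1 is preferred because it ran longest; disk 0 fills in the gaps.
disk1 = make_reader([b"A2", b"B2", b"C2"], bad={1})
disk0 = make_reader([b"A1", b"B1", b"C1"], bad={2})
image, from_fallback, lost = merge_mirrors(disk1, disk0, 3)
```

Any sector listed in from_fallback holds whatever disk 0 contained when it
was dropped from the array, which is why a merge like this can produce an
inconsistent file system even when no sector is lost outright.
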
> 
> In summary, a situation which could have been salvaged by an emergency visit
> to a computer store turned into a catastrophe.  :(
> 
> --
> http://www.coker.com.au/bonnie++/     Bonnie++ hard drive benchmark
> http://www.coker.com.au/postal/       Postal SMTP/POP benchmark
> http://www.coker.com.au/projects.html Projects I am working on
> http://www.coker.com.au/~russell/     My home page

-- 

                                Whirlycott
                                Philip Jacob
                                phil@whirlycott.com
                                http://www.whirlycott.com/phil/
