
Fwd: hard drive failure under RAID-1



The following is something to consider when setting up RAID arrays.  At the
moment AFAIK every RAID solution suffers from this problem.  :(


I have a Linux software RAID-1 array consisting of two IBM IDE hard drives.
The latest kernel works the same way as the 2.4.2 kernel I am using on that
machine.

I have just had them both fail at the same time!  They both had quite a
number of bad sectors, however there was no sector that was bad on both
disks!

The result I would have liked to see is that when a bad sector is
encountered during a read from disk 0, disk 1 is read instead.  If the data
can be read from disk 1 then it should be written back to disk 0.  If after
that disk 0 can be read (the likely result, given sector sparing in
hardware) then the kernel should log lots of huge printk() errors and keep
running.
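
To make the behaviour I am asking for concrete, here is a rough user-space
sketch in Python.  It is only a simulation, not kernel code and not how the
md driver works.  SimDisk, BadSector and mirrored_read are names made up for
this example, and the write path simply assumes that rewriting a sector
makes it readable again (which is what sector sparing should give you).

```python
class BadSector(Exception):
    """Raised when a simulated disk cannot read a sector."""


class SimDisk:
    """In-memory stand-in for a disk with some unreadable sectors."""

    def __init__(self, sectors, bad=()):
        self.sectors = list(sectors)
        self.bad = set(bad)

    def read(self, n):
        if n in self.bad:
            raise BadSector(n)
        return self.sectors[n]

    def write(self, n, data):
        # Assumption: a write triggers sector sparing in the drive,
        # so the sector becomes readable again afterwards.
        self.sectors[n] = data
        self.bad.discard(n)


def mirrored_read(disks, n, log=print):
    """Read sector n from the first disk that can supply it, then
    write the data back to every disk that failed before it."""
    failed = []
    for i, disk in enumerate(disks):
        try:
            data = disk.read(n)
        except BadSector:
            failed.append(i)
            continue
        for j in failed:
            disks[j].write(n, data)
            try:
                disks[j].read(n)  # verify the repair took
                log(f"WARNING: disk {j} sector {n} repaired from disk {i}")
            except BadSector:
                log(f"ERROR: disk {j} sector {n} still unreadable")
        return data
    # Only here, when no mirror has the sector, is the data really lost.
    raise BadSector(n)
```

With two simulated disks that have different bad sectors, every sector stays
readable through mirrored_read() and neither disk has to be dropped from the
array.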

The result I saw was that disk 0 was marked as failed, then when a different
sector failed on disk 1 the ext2 file system saw errors, the system stopped
functioning correctly and needed a hard reset.  Then it panicked on boot
because it couldn't add either disk to the RAID-1.  Since then I have been
trying to recover it.  I wrote a program to read both disks and take data
from disk 1, but take it from disk 0 when disk 1 returned a bad sector.  But
this didn't work well because disk 1 had run for some time without disk 0,
so the sectors taken from disk 0 were out of date.
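
For anyone attempting the same kind of recovery, the idea looks roughly like
the Python sketch below.  This is not the program I ran, just an
illustration.  The 512-byte sector size, the function names, and the
assumption that a bad sector shows up as an OSError on read are all mine.
It records which sectors came from the fallback disk, since those are the
ones that may be stale.

```python
SECTOR = 512  # assumed sector size in bytes


def read_sector(dev, n):
    """Return sector n from a binary file object, or None if the read
    fails or comes back short."""
    try:
        dev.seek(n * SECTOR)
        data = dev.read(SECTOR)
    except OSError:
        return None
    return data if len(data) == SECTOR else None


def merge(primary, fallback, out, count):
    """Copy count sectors to out, preferring primary.

    Returns (stale, lost): the sector numbers that had to be taken
    from fallback, and those that were unreadable on both disks.
    """
    stale, lost = [], []
    for n in range(count):
        data = read_sector(primary, n)
        if data is None:
            data = read_sector(fallback, n)
            if data is None:
                lost.append(n)
                data = bytes(SECTOR)  # zero-fill so offsets stay aligned
            else:
                stale.append(n)
        out.write(data)
    return stale, lost
```

In my case primary would be disk 1 and fallback disk 0, both opened in
binary mode.  The stale list tells you where to look for file system damage
afterwards.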

In summary, a situation which could have been salvaged by an emergency visit
to a computer store turned into a catastrophe.  :(

-- 
http://www.coker.com.au/bonnie++/     Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/       Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/     My home page


