Mdadm won't rebuild a RAID5
I have a RAID5 on 3 drives with a spare. One drive failed and it
rebuilt itself using the spare, then, before I could replace the spare,
a 2nd drive failed. I shut it down, got some new drives (bigger to be
sure they weren't too small, allowing for differences in drive sizes
reported by drive makers), replaced the bad drives, and rebuilt the
spare with no problem at all. Last night there were thunderstorms all
night and the computer lost power a few times (yes, it's normally
plugged into a UPS; that's a long story that doesn't affect anything
here).
The RAID is on /dev/hde, f, g, and h, with h as the spare. On reboot,
the array was not being reassembled. I tried this:
mdadm --assemble --verbose /dev/md0 /dev/hde /dev/hdf /dev/hdg
and I got this:
mdadm: looking for devices for /dev/md0
mdadm: /dev/hde is identified as a member of /dev/md0, slot 0.
mdadm: /dev/hdf is identified as a member of /dev/md0, slot 1.
mdadm: /dev/hdg is identified as a member of /dev/md0, slot 2.
mdadm: added /dev/hde to /dev/md0 as 0
mdadm: added /dev/hdf to /dev/md0 as 1
mdadm: added /dev/hdg to /dev/md0 as 2
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.
I stopped the array and reran the above command with "--run" added to
it, i.e. roughly:
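mdadm --stop /dev/md0
mdadm --assemble --verbose --run /dev/md0 /dev/hde /dev/hdf /dev/hdg
(reconstructed from memory, so the exact flags may be slightly off).
Then I ran: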
mdadm --detail /dev/md0
and got this:
/dev/md0:
Version : 00.90.01
Creation Time : Sat Feb 25 07:10:01 2006
Raid Level : raid5
Device Size : 244198464 (232.89 GiB 250.06 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Aug 13 06:00:34 2007
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 0dcc0b91:f92304ba:e66cf827:43274a37
Events : 0.3700484
    Number   Major   Minor   RaidDevice State
       0       0       0        -       removed
       1       0       0        -       removed
       2      34       0        2       active sync   /dev/hdg
I've tried this at different times and gotten different drives listed.
I've also tried:
mdadm --examine /dev/hde
and I get:
/dev/hde:
Magic : a92b4efc
Version : 00.90.00
UUID : 0dcc0b91:f92304ba:e66cf827:43274a37
Creation Time : Sat Feb 25 07:10:01 2006
Raid Level : raid5
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Wed Aug 8 05:04:55 2007
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 732dc269 - correct
Events : 0.3547873
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     0      33       0        0       active sync   /dev/hde
   0     0      33       0        0       active sync   /dev/hde
   1     1      33      64        1       active sync   /dev/hdf
   2     2      34       0        2       active sync   /dev/hdg
The major differences come up when I run the same command to examine
hdf, hdg, and hdh. With hdf and hdg, the differences are in the last
part (the device table). With hdf, I get:
      Number   Major   Minor   RaidDevice State
this     1      33      64        1       active sync   /dev/hdf
   0     0       0       0        0       removed
   1     1      33      64        1       active sync   /dev/hdf
   2     2      34       0        2       active sync   /dev/hdg
Then with hdg, I get:
      Number   Major   Minor   RaidDevice State
this     2      34       0        2       active sync   /dev/hdg
   0     0       0       0        0       removed
   1     1       0       0        1       faulty removed
   2     2      34       0        2       active sync   /dev/hdg
and on hdh, I get:
      Number   Major   Minor   RaidDevice State
this     3      34      64        3       spare   /dev/hdh
   0     0      33       0        0       active sync   /dev/hde
   1     1      33      64        1       active sync   /dev/hdf
   2     2      34       0        2       active sync   /dev/hdg
   3     3      34      64        3       spare   /dev/hdh
I notice the information changes from drive to drive and is
inconsistent. Mdadm doesn't tell me which drives it considers bad when
it assembles the array, and I want to verify what is going on. I'd like
to get more information to see whether mdadm "officially" sees drives e
and f as bad, just drive e, or none at all (since the report varies
depending on which drive I examine).
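One thing I'm planning to try, just to line the superblocks up side by
side, is dumping the relevant fields from all four drives with
something like:
for d in /dev/hd[efgh]; do
    echo "== $d =="
    mdadm --examine $d | egrep 'Update Time|State :|Events'
done
(I'm assuming the event counts and update times are what mdadm uses to
decide which superblocks are stale, but I'd appreciate confirmation.)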
I would have thought that after the first problem with any drive, the
system would have stopped using the others, since on boot it would have
waited for me to hit "Control-D to continue," so I doubt there are
actually two bad drives.
Any ideas how I can get more information, find out why mdadm is not
rebuilding the RAID, or get it to rebuild? It seems to think the drives
are all okay when it's adding them and doesn't report any issues with
any of them until it's done; then it says there aren't enough drives to
start the array.
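One thing I've seen suggested for arrays whose superblocks got out of
sync after a power loss is a forced assemble, something like:
mdadm --assemble --force /dev/md0 /dev/hde /dev/hdf /dev/hdg
(that exact command is just my guess at what it would look like here).
I haven't run it because I'm not sure it's safe with the drives
disagreeing like this, so any confirmation or warnings would be
welcome.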
Hal