Mdadm won't rebuild a RAID5
I have a RAID5 on 3 drives with a spare. One drive failed and it
rebuilt itself using the spare, then, before I could replace the spare,
a 2nd drive failed. I shut it down, got some new drives (bigger to be
sure they weren't too small, allowing for differences in drive sizes
reported by drive makers), replaced the bad drives, and rebuilt the
spare with no problem at all. Last night there were thunderstorms all
night and the computer lost power a few times (yes, it's normally
plugged into a UPS; that's a long story that doesn't affect anything
here).
The RAID is on /dev/hde, f, g, and h, with h as the spare. On reboot,
the array was not being reassembled. I tried this:
mdadm --assemble --verbose /dev/md0 /dev/hde /dev/hdf /dev/hdg
and I got this:
mdadm: looking for devices for /dev/md0
mdadm: /dev/hde is identified as a member of /dev/md0, slot 0.
mdadm: /dev/hdf is identified as a member of /dev/md0, slot 1.
mdadm: /dev/hdg is identified as a member of /dev/md0, slot 2.
mdadm: added /dev/hde to /dev/md0 as 0
mdadm: added /dev/hdf to /dev/md0 as 1
mdadm: added /dev/hdg to /dev/md0 as 2
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.
I stopped the array and reran the above command with "--run" added to
it, i.e. roughly:
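mdadm --stop /dev/md0
mdadm --assemble --verbose --run /dev/md0 /dev/hde /dev/hdf /dev/hdg
(reconstructed from memory, so the exact flags may be slightly off).
Then I ran: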
mdadm --detail /dev/md0
and got this:
/dev/md0:
Version : 00.90.01
Creation Time : Sat Feb 25 07:10:01 2006
Raid Level : raid5
Device Size : 244198464 (232.89 GiB 250.06 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Aug 13 06:00:34 2007
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 0dcc0b91:f92304ba:e66cf827:43274a37
Events : 0.3700484
    Number   Major   Minor   RaidDevice State
       0       0       0        -       removed
       1       0       0        -       removed
       2      34       0        2       active sync   /dev/hdg
I've tried this at different times and gotten different drives listed.
I've also tried:
mdadm --examine /dev/hde
and I get:
/dev/hde:
Magic : a92b4efc
Version : 00.90.00
UUID : 0dcc0b91:f92304ba:e66cf827:43274a37
Creation Time : Sat Feb 25 07:10:01 2006
Raid Level : raid5
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Wed Aug 8 05:04:55 2007
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 732dc269 - correct
Events : 0.3547873
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     0      33       0        0       active sync   /dev/hde
   0     0      33       0        0       active sync   /dev/hde
   1     1      33      64        1       active sync   /dev/hdf
   2     2      34       0        2       active sync   /dev/hdg
The major differences come up when I run the same command to examine
hdf, hdg, and hdh. With hdf and hdg, the differences are in the last
part (the device table). With hdf, I get:
      Number   Major   Minor   RaidDevice State
this     1      33      64        1       active sync   /dev/hdf
   0     0       0       0        0       removed
   1     1      33      64        1       active sync   /dev/hdf
   2     2      34       0        2       active sync   /dev/hdg
Then with hdg, I get:
      Number   Major   Minor   RaidDevice State
this     2      34       0        2       active sync   /dev/hdg
   0     0       0       0        0       removed
   1     1       0       0        1       faulty removed
   2     2      34       0        2       active sync   /dev/hdg
and on hdh, I get:
      Number   Major   Minor   RaidDevice State
this     3      34      64        3       spare   /dev/hdh
   0     0      33       0        0       active sync   /dev/hde
   1     1      33      64        1       active sync   /dev/hdf
   2     2      34       0        2       active sync   /dev/hdg
   3     3      34      64        3       spare   /dev/hdh
I notice the information changes from drive to drive and is
inconsistent. Mdadm doesn't tell me which drives it considers bad when
it assembles the array, and I want to verify what is going on. I'd like
to get more information to see whether mdadm "officially" sees drives e
and f as bad, just drive e, or none at all (since the report varies
depending on which drive I examine).
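One thing I'm planning to try, just to line the superblocks up side by
side, is dumping the relevant fields from all four drives with
something like:
for d in /dev/hd[efgh]; do
    echo "== $d =="
    mdadm --examine $d | egrep 'Update Time|State :|Events'
done
(I'm assuming the event counts and update times are what mdadm uses to
decide which superblocks are stale, but I'd appreciate confirmation.)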
I would have thought that after the first problem with any drive, the
system would have stopped using the others, since on boot it would have
waited for me to hit "Control-D to continue," so I doubt there are
actually two bad drives.
Any ideas how I can get more information, find out why mdadm is not
rebuilding the RAID, or get it to rebuild? It seems to think the drives
are all okay when it's adding them and doesn't report any issues with
any of them until it's done; then it says there aren't enough drives to
start the array.
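One thing I've seen suggested for arrays whose superblocks got out of
sync after a power loss is a forced assemble, something like:
mdadm --assemble --force /dev/md0 /dev/hde /dev/hdf /dev/hdg
(that exact command is just my guess at what it would look like here).
I haven't run it because I'm not sure it's safe with the drives
disagreeing like this, so any confirmation or warnings would be
welcome.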
Hal