Re: Paranoia about DegradedArray
On Wed, 29 Oct 2008 13:00:25 -0400, Hal Vaughan wrote:
> On Wednesday 29 October 2008, Hendrik Boom wrote:
>> I got the message (via email)
>>
>> This is an automatically generated mail message from mdadm running on
>> april
>>
>> A DegradedArray event had been detected on md device /dev/md0.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1]
>> md0 : active raid1 hda3[0]
>> 242219968 blocks [2/1] [U_]
>>
>> unused devices: <none>
>>
>>
> You don't mention that you've checked the array with mdadm --detail
> /dev/md0. Try that and it will give you some good information.
april:/farhome/hendrik# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sun Feb 19 10:53:01 2006
Raid Level : raid1
Array Size : 242219968 (231.00 GiB 248.03 GB)
Device Size : 242219968 (231.00 GiB 248.03 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Oct 29 13:23:15 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 4dc189ba:e7a12d38:e6262cdf:db1beda2
Events : 0.5130704
Number Major Minor RaidDevice State
0 3 3 0 active sync /dev/hda3
1 0 0 1 removed
april:/farhome/hendrik#
So from this do I conclude that /dev/hda3 is still working, but that it's
the other drive (which isn't identified) that has trouble?
I'm a bit surprised that none of the messages identifies the other
drive, /dev/hdc3. Is this normal? Is that information available
somewhere besides the sysadmin's memory?
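(A hedged sketch, not from the thread: the device names here follow this setup, and the exact output fields vary by mdadm version. Each RAID member carries a persistent superblock recording the array UUID, so even after a device drops out of the array you can usually match it by examining the partition directly, and the configured membership is also recorded in mdadm.conf.)

```shell
# Inspect the RAID superblock on the suspect partition; the UUID
# reported here should match the array UUID from mdadm --detail:
mdadm --examine /dev/hdc3

# The arrays mdadm knows about are listed in /etc/mdadm/mdadm.conf;
# --detail --scan prints the kernel's current view in the same format:
mdadm --detail --scan
```

If --examine still finds a superblock on /dev/hdc3 with the matching UUID, that confirms it was the missing half of md0 without relying on memory.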
>
> I've never used /proc/mdstat because the --detail option gives me more
> data in one shot. From what I remember, this is a raid1, right? It
> looks like it has 2 devices and one is still working, but I might be
> wrong. Again --detail will spell out a lot of this explicitly.
>
>> Now I gather from what I've googled that somehow I've got to get the
>> RAID to reestablish the failed drive by copying from the nonfailed
>> drive. I do believe the hardware is basically OK, and that what I've
>> got is probably a problem due to a power failure (We've had a lot of
>> these recently) or something transient.
>>
>> (a) How do I do this?
>
> If a drive has actually failed, then mdadm --remove /dev/md0 /dev/hdxx.
> If the drive has not failed, then you need to fail it first with --fail
> as an option/switch for mdadm.
So presumably the thing to do is
mdadm --fail /dev/md0 /dev/hdc3
mdadm --remove /dev/md0 /dev/hdc3
and then
mdadm --add /dev/md0 /dev/hdc3
Is the --fail really needed in my case? The --detail output seems to
have given /dev/hdc3 the status of "removed" (although it failed to
mention it was /dev/hdc3).
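(Putting the steps above together, assuming the disk hardware really is fine; a sketch using the device names from this thread, run as root.)

```shell
# /dev/hdc3 already shows as "removed" in --detail, so --fail and
# --remove are likely unnecessary; re-adding it should start a resync:
mdadm /dev/md0 --add /dev/hdc3

# Watch the rebuild: a "recovery" line with a percentage and ETA
# appears under md0 while the mirror is being reconstructed.
cat /proc/mdstat
```

Until the recovery line reaches 100%, the array stays degraded, so another power failure during the resync would leave /dev/hda3 as the only good copy again.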
>
>> (b) is hda3 the failed drive, or is it the one that's still working?
>
> That's one of the things mdadm --detail /dev/md0 will tell you. It will
> list the active drives and the failed drives.
Well. I'm glad I was paranoid enough to ask. It seems to be the drive
that's still working. Glad I didn't try to remove and re-add *that* one.
Thanks,
-- hendrik