
Re: mdadm gives segmentation fault on wheezy. RAID array now incomplete.



On Wed, 09 Oct 2013 10:59:48 -0600, Shane Johnson wrote:

> On Wed, Oct 9, 2013 at 10:50 AM, Hendrik Boom <hendrik@topoi.pooq.com>
> wrote:
>> I ran
>>
>> mdadm /dev/md1 --add /dev/sdd2
>>
>> and got a segmentation fault.
>>
>>
>> april:/farhome/hendrik# cat /proc/mdstat
>> Personalities : [raid1]
>> md1 : active raid1 sdb2[1]
>>       2391295864 blocks super 1.2 [2/1] [_U]
>>
>> md0 : active raid1 sda4[0] sdc4[1]
>>       706337792 blocks [2/2] [UU]
>>
>> unused devices: <none>
>> april:/farhome/hendrik# mdadm /dev/md1 --add /dev/sdd2
>> Segmentation fault
>> april:/farhome/hendrik#
>>
>>
>> /dev/sdd2 used to be part of the /dev/md1 RAID1 array, but it went bad,
>> presumably because of a hard reset.
>>
>> I did a
>>
>> mdadm /dev/md1 --fail /dev/sdd2 --remove /dev/sdd2
>>
>> which appeared to work correctly, and after that
>>
>> april:/farhome/hendrik# cat /proc/mdstat
>> Personalities : [raid1]
>> md1 : active raid1 sdb2[1]
>>       2391295864 blocks super 1.2 [2/1] [_U]
>>
>> md0 : active raid1 sda4[0] sdc4[1]
>>       706337792 blocks [2/2] [UU]
>>
>> unused devices: <none>
>> april:/farhome/hendrik# mdadm /dev/md1 --add /dev/sdd2
>> Segmentation fault
>> april:/farhome/hendrik#
>>
>>
>> What now?
>>
>> -- hendrik
>>
>>
> Hendrik,
> You might look in the logs to see if they give more detail

For some reason my logs seem mostly to have stopped accepting messages 
last May, around the time I upgraded to wheezy.

Apparently (as discussed in another thread) the upgrade misconfigured my 
logging options.

Do you happen to know which log the mdadm messages are likely to be in?  
I might be able to experiment with the log configuration until something 
shows up.
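
(I suppose even with syslog misbehaving, the kernel ring buffer should 
still hold whatever the md driver reported.  Something along these lines 
might turn it up; the log paths are my guess at a stock wheezy rsyslog 
setup, not something I've verified on this box:)

# kernel ring buffer -- works even if syslog is broken
dmesg | grep -i -e md1 -e sdd

# usual rsyslog destinations on a default wheezy install
grep -i -e md1 -e sdd /var/log/syslog /var/log/kern.log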

Still, if something was wrong with /dev/sdd2, I might expect a message, 
but I wouldn't expect mdadm to segfault.
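
One thing I can try without writing anything to disk is to ask mdadm what 
it makes of the leftover superblock on that partition; if a read-only 
query also segfaults, that narrows things down.  Roughly (device names 
are of course specific to my box):

mdadm --examine /dev/sdd2
mdadm --detail /dev/md1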

 
> otherwise I
> would try removing the failed device and replacing it with another that
> is as close as possible to the same size and see if you can add it.

I don't really have a spare drive of that size around.  And it would take 
weeks to test a new one before I could get around to trying it out.  The 
failed drive passed that same test only a few months ago.

I test all my drives with a full-surface write/read test using badblocks.  
Drives that fail are promptly returned to the vendor.
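
(For the record, the test I mean is badblocks' destructive write-mode 
pass, roughly the following; -w overwrites the whole disk, so it's only 
for a drive with nothing on it, and /dev/sdX is a placeholder:)

badblocks -wsv /dev/sdX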

> I have also seen on the duct tape raid I had for a while where I would
> have to power cycle the box in order for it to reactivate the flaky
> drive.

A duct-tape RAID?  Was this a hardware RAID, where the hardware takes 
care of it all,  or a software-based mdadm RAID?  Or is this some 
hitherto undiscovered use for duct tape?

The last thing I want is to discover my system is unbootable.  Unlikely, 
because the MBR it boots from is on a different drive, and the entire 
running OS is on a completely separate RAID, on different physical drives.
The next-to-last thing is to find that at boot time it fails to assemble 
the defective RAID at all.  Assembling it as a usable degraded single-
drive RAID array would be OK.  (But that's where I am now.)
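
(If it ever comes to that, my understanding is that a one-disk assembly 
can be forced by hand from a rescue shell, something along these lines; 
the --run is what lets mdadm start the array even though it's degraded, 
and the device names are specific to my setup:)

mdadm --assemble --run /dev/md1 /dev/sdb2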

I guess the next thing is to find my least-recently-used backup drive and 
put a new fresh backup on it before I do anything dangerous.

>  Just a couple of suggestions.
> 
> Shane

Thanks, but I'm not out of the woods yet.

I've found a relevant bug report:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=718896

And the version of mdadm available in Debian isn't the most recent.
Version 3.2.6 has been announced on
http://git.neil.brown.name/?p=mdadm.git;a=blob;f=ANNOUNCE-3.2.6;h=f5cfd4920576fba77c7162c331b87873f8bfa5ef;hb=HEAD

The second item on its git log says

0d478e2 mdadm: Fix Segmentation fault.

I have no idea whether this is the same segmentation fault I'm running 
into.  But it might be.
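
At the very least I can check how far behind that fix my installed copy 
is; something like:

mdadm --version
dpkg -l mdadm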

And apparently upstream development is already at version 3.3.  I'm not 
sure I can really wait for Debian to migrate the relevant bug fix into 
wheezy before rebuilding my RAID array.  Assuming there is a relevant bug 
fix, of course.

-- hendrik




