Re: raid problem

To: debian-user@lists.debian.org
Subject: Re: raid problem
From: François Patte <francois.patte@mi.parisdescartes.fr>
Date: Sat, 30 Nov 2013 19:48:43 +0100
Message-id: <529A330B.60506@mi.parisdescartes.fr>
In-reply-to: <20131130115635.GC26582@aym.net2.nerim.net>
References: <529918A1.7020505@mi.parisdescartes.fr> <20131130115635.GC26582@aym.net2.nerim.net>

On 30/11/2013 12:56, Andre Majorel wrote:
> On 2013-11-29 23:43 +0100, François Patte wrote:
> 
>> I have a problem with 2 raid arrays: I have 2 disks (sdc and sdd) in
>> raid1 arrays.
>>
>> One disk (sdc) failed and I replaced it with a new one, copying the
>> partition table from the sdd disk using sfdisk:
>>
>> sfdisk -d /dev/sdd | sfdisk /dev/sdc
>>
>> then I "added" the 2 partitions (sdc1 and sdc3) to the arrays md0 and md1:
>>
>> mdadm --add /dev/md0 /dev/sdc1
>>
>> mdadm --add /dev/md1 /dev/sdc3
>>
>> There was no problem with the md0 array:
>>
>>
>> cat /proc/mdstat gives:
>>
>> md0 : active raid1 sdc1[1] sdd1[0]
>>       1052160 blocks [2/2] [UU]
>>
>>
>> But for the md1 array, I get:
>>
>> md1 : active raid1 sdc3[2](S) sdd3[0]
>>       483138688 blocks [2/1] [U_]
>>
>> What is the problem? And how can I recover a correct md1 array?
> 
> The root of your problem would be that /dev/sdc3 is considered
> spare, not active. Not sure why.

Thank you for answering.

> 
> Guess #1 : before physically changing the disks, you forgot
>   mdadm /dev/md1 --fail   /dev/sdc3
>   mdadm /dev/md1 --remove /dev/sdc3

No, I didn't!

> 
> Guess #2 : maybe there were I/O errors during the add. How far
> did the sync go ? Run smartctl -d ata -A /dev/sdc3 and look for
> non-zero raw values for Reallocated_Sector_Ct and
> Current_Pending_Sector. What does badblocks /dev/sdc3 say ?

Neither of these two has a non-zero raw value, and badblocks finds
nothing on sdc3.

> 
> Guess #3 : it's a software hiccup and all /dev/sdc3 needs is to
> be removed from /dev/md1 and re-added.

I tried that, without any success.

But something is strange: there are bad blocks on sdd3! logwatch
reports errors on the sdd disk:

md/raid1:md1: sdd: unrecoverable I/O read error for block 834749 ...:  3 Time(s)
res 41/40:00:6f:56:61/00:00:32:00:00/40 Emask 0x409 (media error) <F> ...:  24 Time(s)
sd 5:0:0:0: [sdd]  Add. Sense: Unrecovered read error - auto reallocat ...:  6 Time(s)
sd 5:0:0:0: [sdd]  Sense Key : Medium Error [current] [descr ...:  6 Time(s)
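
To cross-check logwatch's summary against the raw kernel log, the
"Medium Error" lines could be tallied per disk. This is only a rough
sketch: the function name medium_errors_by_disk is made up for
illustration, and it assumes the log lines carry a "[sdX]" tag as in
the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count "Medium Error" kernel log lines per disk.
# Reads log text on stdin; assumes each relevant line contains a
# device tag of the form "[sdX]".
medium_errors_by_disk() {
    awk '
        /Medium Error/ && match($0, /\[sd[a-z]+\]/) {
            # Strip the surrounding brackets from the matched tag.
            disk = substr($0, RSTART + 1, RLENGTH - 2)
            count[disk]++
        }
        END { for (d in count) printf "%s %d\n", d, count[d] }
    ' | sort
}
```

It could be fed from the kernel ring buffer, for example
"dmesg | medium_errors_by_disk", or from a saved log file.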

mdmonitor returns:

This is an automatically generated mail message from mdadm
running on dipankar

A FailSpare event had been detected on md device /dev/md1.

It could be related to component device /dev/sdc3.


To summarize the situation: the faulty partition (the one with bad
blocks) is sdd3, yet it is the only active member of the md1 array, and
I can fully access its data; it is mounted normally at boot time.
Meanwhile sdc3, which has no bad blocks, is the one that mdadm declares
faulty!

There is something here that I don't understand.
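
The state described above can be read mechanically from /proc/mdstat.
The following is a minimal sketch, not a substitute for
"mdadm --detail": the function name mdstat_report is invented here, and
it assumes the two-line layout shown in the mdstat output quoted
earlier (a header line listing the members, then a "blocks" line ending
in the [UU]-style status).

```shell
#!/bin/sh
# Hypothetical helper: list degraded md arrays together with any
# members flagged spare (S) or faulty (F).
# Reads mdstat-formatted text on stdin.
mdstat_report() {
    awk '
        # Header line, e.g. "md1 : active raid1 sdc3[2](S) sdd3[0]"
        /^md[0-9]+ :/ {
            array = $1
            flagged = ""
            for (i = 3; i <= NF; i++)
                if ($i ~ /\[[0-9]+\]\((S|F)\)$/)
                    flagged = flagged " " $i
            next
        }
        # Status line, e.g. "483138688 blocks [2/1] [U_]"
        array != "" && /blocks/ {
            status = $NF
            # An underscore in the status marks a missing member.
            if (status ~ /_/)
                printf "%s degraded %s flagged:%s\n", array, status, flagged
            array = ""
        }
    '
}
```

Run as "mdstat_report < /proc/mdstat". Healthy arrays such as md0
produce no output; only arrays with a missing member are listed.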

Anyway, I can delete this array and create a new one from scratch (after
replacing the faulty disk).

Is it enough to run these commands:

mdadm --zero-superblock /dev/sdc3

mdadm --zero-superblock /dev/sdd3

Or do I also have to modify the /etc/mdadm/mdadm.conf file?
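
On the mdadm.conf question: an array created from scratch receives a
new UUID, so an ARRAY line that still carries the old UUID would no
longer match it. One way to check is to compare the UUID recorded in
the file with the one reported by "mdadm --detail --scan". This is a
sketch only: conf_uuid_for is a made-up name, and it assumes ARRAY
lines of the usual "ARRAY /dev/mdN ... UUID=..." form, with the same
device name used in both places.

```shell
#!/bin/sh
# Hypothetical helper: print the UUID recorded for one array.
# $1 = array device name as written on the ARRAY line (e.g. /dev/md1)
# stdin = mdadm.conf-style text (the file itself, or the output of
#         "mdadm --detail --scan")
conf_uuid_for() {
    awk -v dev="$1" '
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                # Accept "UUID=" or "uuid="; print what follows the "=".
                if (tolower($i) ~ /^uuid=/)
                    print substr($i, 6)
        }
    '
}
```

For example, the output of
"conf_uuid_for /dev/md1 < /etc/mdadm/mdadm.conf" can be compared with
that of "mdadm --detail --scan | conf_uuid_for /dev/md1". If the two
differ, the ARRAY line needs updating.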

Thank you for your answer.


-- 
François Patte
UFR de mathématiques et informatique
Laboratoire CNRS MAP5, UMR 8145
Université Paris Descartes
45, rue des Saints Pères
F-75270 Paris Cedex 06
Tél. +33 (0)1 8394 5849
http://www.math-info.univ-paris5.fr/~patte
