
Re: issue with mdadm and mirroring drives



26/01/2012 23:07, Joey L wrote:

>>
>>>
>>> When I boot the system with all drives in, I get the superfluous error.
>>>
>>> So the only way to boot is only to put in /dev/sdc alone and boot.
>>> when i get to a linux prompt, I insert the second drive into the system /dev/sdd
>>>
>>> To sync them, /dev/sdd has already failed, so i run
>>> sfdisk -d /dev/sdc | sfdisk /dev/sdd
>>> ** i get an error that nothing has changed - so I run it with the
>>> --force command to get the partitions identical like:
>>> sfdisk -d /dev/sdc | sfdisk --force /dev/sdd
>>
>> Why do you do that? You are forcing the partitioning of the first disk
>> onto the second, this could work at raid creation time but isn't the
>> proper procedure to re-add a failed member to an array. You don't have
>> to "sync" the data and even less the disk partitioning manually prior to
>> re-adding it to the raid. mdadm will handle the resync.
> 
> I did not mean to do this - i think this is my main issue - i can not
> zero out drives with mdadm.
> It gives an error that i can not get past - so i use force option.

It would be useful to know the exact mdadm error message when you try
"mdadm --zero-superblock" on the failed raid member. You need to remove
it from the array ("mdadm --remove") first, or do it while the array is
not running.
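A minimal sketch of that order of operations, assuming the array is /dev/md0 and the failed member is /dev/sdd1 (hypothetical names, check "cat /proc/mdstat" for yours). As written it only prints the commands; set DRY_RUN=0 to actually run them:

```shell
#!/bin/sh
# Sketch: take a failed mirror member out of the array, wipe its stale md
# superblock, then add it back so md resyncs it from the good disk.
# ASSUMPTIONS (hypothetical): array /dev/md0, failed member /dev/sdd1.
MD=${MD:-/dev/md0}
MEMBER=${MEMBER:-/dev/sdd1}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

rejoin_member() {
    # Remove the member if md still lists it (md must already see it as faulty).
    run mdadm "$MD" --remove "$MEMBER"
    # Erase the old md metadata; this fails while the device is part of a running array.
    run mdadm --zero-superblock "$MEMBER"
    # Add it back as a fresh device; md then does the resync by itself.
    run mdadm "$MD" --add "$MEMBER"
}

rejoin_member
```

The --add at the end is what triggers the resync. As long as the partition still exists on that disk, there is no need to copy anything over with sfdisk or dd beforehand.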


> Do you recommend any other utility to zero out drives - that will make
> them blank ?

sfdisk doesn't "zero out" the disk: with option "-d" it merely dumps the
partition table in a format that can be reused to copy it onto another
disk. It's entirely possible to restore the target disk to its previous
state with the correct information (ideally the output of "sfdisk -O",
but testdisk could do the trick). The data is all still on the disk;
only the "map" to find it is gone.
To "zero out" a disk one could use dd to fill it with zeros, or with
random data if security is important; badblocks in write mode (-w) can
do that too.
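For illustration, those three options as shell commands, assuming the disk to wipe is /dev/sdd (hypothetical, double check with "parted -l" first, a typo here wipes the wrong disk). Again only printed unless DRY_RUN=0:

```shell
#!/bin/sh
# Sketch: three ways to overwrite a whole disk. ALL DATA ON IT IS LOST.
# ASSUMPTION (hypothetical): the disk to wipe is /dev/sdd.
DISK=${DISK:-/dev/sdd}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

wipe_disk() {
    case "$1" in
        zero)      run dd if=/dev/zero of="$DISK" bs=1M ;;     # fill with zeros
        random)    run dd if=/dev/urandom of="$DISK" bs=1M ;;  # slower, for security
        badblocks) run badblocks -wsv "$DISK" ;;               # destructive write-mode test
        *) echo "usage: wipe_disk zero|random|badblocks" >&2; return 1 ;;
    esac
}

wipe_disk zero
```

Expect dd to stop with a "No space left on device" message when it reaches the end of the disk; that is normal here.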

The important part is that sfdisk has no idea of anything such as
metadata, bootloaders or file-systems; even extended partitions can be
clobbered by the "-d" magic trick. It works reliably only for empty
primary partitions, preferably small, on disks with msdos disklabels,
preferably identical (same geometry), using the "-D" option.

So it's fragile enough that you don't want to mess with the "--force"
option when sfdisk complains; it can mess things up easily enough as it
is ;-)
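If you do want to experiment with sfdisk, a sketch of saving the current table first so it can be written back later without --force. The disk /dev/sdd and the backup path /root/sdd.sfdisk are assumptions; this only covers the partition table, not the data:

```shell
#!/bin/sh
# Sketch: keep a copy of a disk's partition table ("the map") before touching it.
# ASSUMPTIONS (hypothetical): disk /dev/sdd, backup file /root/sdd.sfdisk.
DISK=${DISK:-/dev/sdd}
BACKUP=${BACKUP:-/root/sdd.sfdisk}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

save_table() {
    # Dump the table in sfdisk's text format to a file.
    if [ "$DRY_RUN" = 1 ]; then echo "sfdisk -d $DISK > $BACKUP"; return; fi
    sfdisk -d "$DISK" > "$BACKUP"
}

restore_table() {
    # Feed the saved dump back to sfdisk to rewrite the same table.
    if [ "$DRY_RUN" = 1 ]; then echo "sfdisk $DISK < $BACKUP"; return; fi
    sfdisk "$DISK" < "$BACKUP"
}

save_table
```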

> I think it maybe an issue with my working drive - i think the
> partitions are screwed up there - and when sfdisk copies - it does not
> copy correctly.
> But i did boot with knoppix and went to fdisk and deleted the
> partitions - but still had issues with sfdisk.
> An utility u recommend ??
> 

After clearing any previous raid metadata, I usually use parted to
create new disklabels and empty (unformatted) partitions (unless using
the whole disk) before creating a raid.
Repeat on all disks to be included in the raid, with the same values.

See "help mklabel" and "help mkpart" in parted, or "man parted".

When done, create the raid (with 1.2 metadata), start it, and format the
raid device with mkfs. Nothing else is needed: no "raid" flag to set and
no partition type to change to "fd".
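The whole procedure as a rough sketch, assuming two disks /dev/sdc and /dev/sdd, one partition spanning each disk, ext4 and /dev/md0 (all hypothetical; it leaves out the swap partitions you have now). It destroys everything on both disks, so it only prints the commands unless DRY_RUN=0:

```shell
#!/bin/sh
# Sketch: clear old md metadata, new msdos disklabel, one empty partition per
# disk, RAID1 with 1.2 metadata, then mkfs on the md device only.
# ASSUMPTIONS (hypothetical): disks /dev/sdc and /dev/sdd, array /dev/md0, ext4.
DISKS=${DISKS:-"/dev/sdc /dev/sdd"}
MD=${MD:-/dev/md0}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

build_mirror() {
    parts=""
    for d in $DISKS; do
        run mdadm --zero-superblock "${d}1"          # old md metadata, if any
        run parted -s "$d" mklabel msdos              # new, empty disklabel
        run parted -s "$d" mkpart primary 1MiB 100%   # empty, unformatted partition
        parts="$parts ${d}1"
    done
    # --create also starts the array; $parts is unquoted so it splits into devices.
    run mdadm --create "$MD" --metadata=1.2 --level=1 --raid-devices=2 $parts
    run mkfs.ext4 "$MD"                               # format the md device, not sdX1
}

build_mirror
```

Zeroing the superblock before relabeling matters: an old superblock sitting inside the area a new partition reuses can otherwise show up again later.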

Even better, clear the raid metadata from the disks, create new
disklabels, and let the Debian installer take care of the rest. Yes, it
means reinstalling the system, but it's by far the easiest/safest/fastest
option. Personal data can be restored from a backup quicker than one can
read the parted manual!

>>>
>>>
>>> Model: ATA ST31000340AS (scsi)
>>> Disk /dev/sdc: 1000GB
>>> Sector size (logical/physical): 512B/512B
>>> Partition Table: msdos
>>>
>>> Number  Start   End     Size    Type     File system     Flags
>>>  1      1049kB  996GB   996GB   primary  ext3            raid
>>>  2      996GB   1000GB  4204MB  primary  linux-swap(v1)
>>>
>>>
>>> Model: ATA ST31000528AS (scsi)
>>> Disk /dev/sdd: 1000GB
>>> Sector size (logical/physical): 512B/512B
>>> Partition Table: msdos
>>>
>>> Number  Start   End     Size    Type     File system     Flags
>>>  1      1049kB  996GB   996GB   primary  ext4            raid
>>>  2      996GB   1000GB  4204MB  primary  linux-swap(v1)
>>>
>>
>> Why do you have file-systems on your partitions? Only the "md" raid
>> devices should be formatted with a file-system, not the underlying
>> partitions!
>> I would be curious to know what "fsck" says about your md devices
>> ("fsck /dev/md0" for example).
> 
> Again - i think sfdisk copy from working drive is causing this issue.
> Can i go into fdisk to fix ?

Having file-systems on the partitions before creating the raid doesn't
help. If you want to (try to) "fix" things at this stage, run
"e2fsck -cc" on the raid device (unmounted), then "resize2fs". But this
will be painfully slow with 1TB disks, think more than a day...

Did you try to run "fsck" on the raid device (from a live-cd possibly)?
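A sketch of the read-only check and of the slow repair described above, assuming /dev/md0 holds an ext3/ext4 file-system and is not mounted (hypothetical). The -f on e2fsck is an addition so that resize2fs doesn't refuse to run for lack of a recent forced check:

```shell
#!/bin/sh
# Sketch: check the file-system on the md device (never on sdc1/sdd1 directly).
# ASSUMPTIONS (hypothetical): array /dev/md0, ext3/ext4, not mounted.
MD=${MD:-/dev/md0}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

check_md() {
    run cat /proc/mdstat        # is the array assembled, and with which members?
    run e2fsck -n "$MD"         # read-only: report problems, change nothing
}

repair_md() {
    run e2fsck -f -cc "$MD"     # forced check + non-destructive read-write badblocks scan (slow)
    run resize2fs "$MD"         # no size given: resize the fs to the md device's size
}

check_md
```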


> Change the labels ? i just think they are labeled incorrectly.

Are you talking about the disklabel or the partition labels? Creating a
new disklabel means losing the data on the disk and starting over...
Partition labels have no relevance here; grub uses UUIDs/md UUIDs.


>> I am starting to think that you have much lower level problems. When you
>> created this system, were the disks "clean", or did you use "sfdisk"
>> over existing formatted partitions? Were the disks used in a raid
>> before? If this is the case you should consider backing up and
>> recreating the raid properly.
>>
>> Also, if one disk is repeatedly dropping from the raid array, consider
>> looking at the "smart" values, it may be dying.
>>
> 
> What smart values are u referring to ? is that a utility ?
> Again - i think i need a clean utility or at least a procedure other
> than --zero option of mdadm to clear out the drive.

SMART is the disks' built-in self-monitoring. It isn't always 100%
accurate but can give hints that a disk is about to fail. If you want a
graphical interface use "gsmartcontrol", otherwise read "man smartctl"
(needs the "smartmontools" package installed).

"smartctl -l error -l selftest -H /dev/sd?" can get you started; replace
"?" with the actual disk letter (e.g. /dev/sdd).
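To run that same command on both mirror members at once, a small sketch assuming they are /dev/sdc and /dev/sdd (hypothetical); it needs root and smartmontools, and only prints the commands unless DRY_RUN=0:

```shell
#!/bin/sh
# Sketch: SMART health verdict plus error and self-test logs for each disk.
# ASSUMPTION (hypothetical): the mirror members are /dev/sdc and /dev/sdd.
DISKS=${DISKS:-"/dev/sdc /dev/sdd"}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

smart_report() {
    for d in $DISKS; do
        run smartctl -l error -l selftest -H "$d"   # logged errors, test history, verdict
    done
}

smart_report

# A long self-test runs inside the drive itself; uncomment to start one:
# run smartctl -t long /dev/sdd
```

A long self-test takes hours on a 1TB disk; its result shows up in the selftest log afterwards.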


