
Re: mdadm doing strange things



On 20/06/10 02:15, Andrew Reid wrote:
On Saturday 19 June 2010 14:20:27 Alan Chandler wrote:

[ Details elided ]

HOWEVER (the punch line): when this system booted, it was not the old
reverted one but how it was before I started this cycle. In other words
it looked as though the disk which I had failed and removed was the one being used.

If I do mdadm --detail /dev/md1 (or any of the other devices) it shows
/dev/sdb as the only device in the RAID pair.  To sync up again I am
having to add the various /dev/sda partitions back in.

SO THE QUESTION IS: what went wrong?  How does a failed device end up
being used to build the operational arrays, while the other devices end up
not being included?

   My understanding of how mdadm re-arranges the array (including for
failures, etc.) is that it writes metadata into the various partitions,
so I agree with you that this is weird -- I would have expected the
RAID array to come up with the sda devices as the only devices present.

   There are two things I can think of, neither quite right, but maybe
they'll motivate someone else to figure it out:

  (1) Device naming can be tricky when you're unplugging drives.
Maybe the devices now showing up as "sdb" actually are the original
"sda" devices.  Can you check UUIDs?  This explanation also requires
that you didn't actually revert the disk, you only thought you did,
but then didn't catch it because the conjectural device-renaming
convinced you that the RAID was being weird.

Of course that was my first thought. But I was doing this via SSH from another machine, so the terminal screen contents survived the power down. It was clear what I had done and which disks I had failed, etc.
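
For what it's worth, this is roughly how I would compare them now (just a sketch - the device and partition names here are examples, not necessarily what is in this box):

  # print the array UUID recorded in each member's superblock
  mdadm --examine /dev/sda1 | grep -i uuid
  mdadm --examine /dev/sdb1 | grep -i uuid
  # and compare with what the assembled array reports
  mdadm --detail /dev/md1 | grep -i uuid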


  (2) How did you revert the root partition?  If you copied all the
files, then I have nothing else to add.

Yes, I did a file copy (using rsync -aH).
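
Roughly like this (the mount points are only illustrative):

  # copy the old root back over the current one, preserving hard links
  rsync -aH /mnt/oldroot/ /mnt/newroot/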

...


   Also, what happened to /etc/mdadm/mdadm.conf on the reverted root
partition?  Is it nonexistent on the one you're now booting from?
There's potential for confusion there also, although I think the
initramfs info will suffice until the next kernel update.


This point is a possibility, as I didn't check the mdadm.conf file, but the initramfs was the same one throughout.
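
Once things have settled down I can regenerate it from the running arrays; something like this should do on Debian (a sketch - I haven't re-run it on this box yet):

  # rebuild the ARRAY lines from the currently assembled arrays
  # (after removing any stale ARRAY lines from the file first)
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  # refresh the initramfs so it picks up the new configuration
  update-initramfs -u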


I got into more trouble. In order to correct things (but before the failed disk had even started to be resynced - I had asked for it, but a much bigger partition was in the process of syncing, so it hadn't started) I powered down, removed both disks from the system, put an old disk back, powered up, copied some files across to a third disk, powered down again and put the two RAIDed disks back. When I powered up again, it switched again and said the two disks were in sync on the partitions whose resync hadn't started. This left the file system in an unusable state.

Fortunately the more important big partition, which was only partially synced, carried on syncing in the same configuration (although I believe it started again from scratch rather than carrying on from where it left off).

What I think was happening is that the BIOS was changing the boot order whenever I changed the disks, and I then ended up booting from an incorrectly synced partition.
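
For the record, re-adding the /dev/sda partitions and watching the resync is roughly this (a sketch - the md and partition numbers are just examples):

  # re-add the removed member so md rebuilds it onto /dev/sda1
  mdadm /dev/md1 --add /dev/sda1
  # watch the rebuild progress
  cat /proc/mdstat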

--
Alan Chandler
http://www.chandlerfamily.org.uk

