
Re: mdadm doing strange things



On 20/06/10 02:15, Andrew Reid wrote:
On Saturday 19 June 2010 14:20:27 Alan Chandler wrote:

[ Details elided ]

HOWEVER (the punch line): when this system booted, it was not the old
reverted one but how it was before I started this cycle. In other words
it looked as though the disk which I had failed and removed was the one being used.

If I do mdadm --detail /dev/md1 (or any of the other devices) it shows
/dev/sdb as the only device in the RAID pair.  To sync up again I am
having to add the various /dev/sda partitions back in.

SO THE QUESTION IS: what went wrong?  How does a failed device end up
being used to build the operational arrays, while the other devices end up
not being included?

   My understanding of how mdadm re-arranges the array (including for
failures, etc.) is that it writes metadata into the various partitions,
so I agree with you that this is weird -- I would have expected the
RAID array to come up with the sda devices as the only devices present.

   There are two things I can think of, neither quite right, but maybe
they'll motivate someone else to figure it out:

  (1) Device naming can be tricky when you're unplugging drives.
Maybe the devices now showing up as "sdb" actually are the original
"sda" devices.  Can you check UUIDs?  This explanation also requires
that you didn't actually revert the disk, you only thought you did,
but then didn't catch it because the conjectural device-renaming
convinced you that the RAID was being weird.

Of course that was my first thought. But I was doing this via SSH from another machine, so the terminal screen contents survived the power down. It was clear what I had done and which disks I had failed, etc.
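
For what it's worth, this is roughly how I would compare them now (just a sketch - the device and partition names here are examples, not necessarily what is in this box):

  # print the array UUID recorded in each member's superblock
  mdadm --examine /dev/sda1 | grep -i uuid
  mdadm --examine /dev/sdb1 | grep -i uuid
  # and compare with what the assembled array reports
  mdadm --detail /dev/md1 | grep -i uuid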


  (2) How did you revert the root partition?  If you copied all the
files, then I have nothing else to add.

Yes, I did a file copy (using rsync -aH).
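
Roughly like this (the mount points are only illustrative):

  # copy the old root back over the current one, preserving hard links
  rsync -aH /mnt/oldroot/ /mnt/newroot/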

...


   Also, what happened to /etc/mdadm/mdadm.conf on the reverted root
partition?  Is it nonexistent on the one you're now booting from?
There's potential for confusion there also, although I think the
initramfs info will suffice until the next kernel update.


This point is a possibility, as I didn't check the mdadm.conf file, but the initramfs was the same one throughout.
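
Once things have settled down I can regenerate it from the running arrays; something like this should do on Debian (a sketch - I haven't re-run it on this box yet):

  # rebuild the ARRAY lines from the currently assembled arrays
  # (after removing any stale ARRAY lines from the file first)
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  # refresh the initramfs so it picks up the new configuration
  update-initramfs -u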


I got into more trouble. In order to correct things (but before the failed disk had even started to be resynced - I had asked for it, but a much bigger partition was in the process of syncing, so it hadn't started) I powered down, removed both disks from the system, put an old disk back, powered up, copied some files across to a third disk, powered down again and put the two RAIDed disks back. When I powered up again, it switched again and said the two disks were in sync on the partitions whose resync hadn't started. This left the file system in an unusable state.

Fortunately the more important big partition, which was only partially synced, carried on syncing in the same configuration (although I believe it started again from scratch rather than carrying on from where it left off).

What I think was happening is that the BIOS was changing the boot order whenever I changed the disks, and I then ended up booting from an incorrectly synced partition.
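
For the record, re-adding the /dev/sda partitions and watching the resync is roughly this (a sketch - the md and partition numbers are just examples):

  # re-add the removed member so md rebuilds it onto /dev/sda1
  mdadm /dev/md1 --add /dev/sda1
  # watch the rebuild progress
  cat /proc/mdstat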

--
Alan Chandler
http://www.chandlerfamily.org.uk

