
Re: Can't reboot after power failure (RAID problem?)



On 11-01-31 8:47 PM, Andrew Reid wrote:
On Monday 31 January 2011 10:51:04 davidg@alcor.concordia.ca wrote:
I posted in a panic and left out a lot of details.  I'm using Squeeze, and
set up the system about a month ago, so there have been some upgrades.  I
wonder if maybe the kernel or Grub was upgraded and I neglected to install
Grub again, but I would expect it to automatically be reinstalled on at
least the first disk.  If I remove either disk I get the same error
message.

I did look at /proc/cmdline.  It shows the same uuid for the root device
as in the menu, so that seems to prove it's an MD device that isn't ready
since my boot and root partitions are each on MD devices.  /proc/modules
does show md_mod.

   What about the actual device?  Does /dev/md/0 (or /dev/md0, or whatever)
exist?

   If the module is loaded but the device does not exist, then it's possible
there's a problem with your mdadm.conf file, and the initramfs doesn't
have the array info in it, so it wasn't started.
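A minimal sketch of those checks from the initramfs shell, assuming a POSIX-style sh (busybox). The helper name root_param is made up for illustration; the files it reads are the standard /proc ones.

```shell
# Print the value of the root= parameter from a kernel command line string.
# root_param is a hypothetical helper name, not part of any package.
root_param() {
    for word in $1; do
        case "$word" in
            root=*) printf '%s\n' "${word#root=}" ;;
        esac
    done
}

# Which root device was the kernel told to use?
if [ -r /proc/cmdline ]; then
    root_param "$(cat /proc/cmdline)"
fi

# Which arrays has the md driver actually started?
if [ -r /proc/mdstat ]; then
    cat /proc/mdstat
fi

# Do any md device nodes exist?
ls -l /dev/md* 2>/dev/null || echo "no md device nodes found"
```

If /proc/mdstat lists no active arrays while md_mod is loaded, that would be consistent with the array simply not having been assembled.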

   The easy way out is to boot from a rescue disk, fix the mdadm.conf
file, rebuild the initramfs, and reboot.

   The Real Sysadmin way is to start the array by hand from inside
the initramfs.  You want "mdadm -A /dev/md0" (or possibly
"mdadm -A -u<your-uuid>") to start it, and once it's up, ctrl-d out
of the initramfs and hope.  The part I don't remember is whether or
not this creates the symlinks in /dev/disk that your root-fs-finder
is looking for.
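As a rough sketch, the manual assembly could be wrapped like this. The function name assemble_array and the MDADM override variable are assumptions added for illustration (the override only exists so the command can be previewed with echo); --assemble and --uuid are the long forms of the mdadm options mentioned above.

```shell
# Command to run; override with MDADM=echo to preview instead of executing.
MDADM="${MDADM:-mdadm}"

# Assemble one array, optionally selecting its members by array UUID.
# usage: assemble_array <md-device> [uuid]
assemble_array() {
    dev="$1"
    uuid="${2:-}"
    if [ -n "$uuid" ]; then
        "$MDADM" --assemble "$dev" --uuid="$uuid"
    else
        # Without a UUID, mdadm relies on mdadm.conf to identify the array.
        "$MDADM" --assemble "$dev"
    fi
}

# Example (as root, inside the initramfs shell):
#   assemble_array /dev/md0
#   assemble_array /dev/md0 <your-uuid>
```

Once the array shows up in /proc/mdstat, ctrl-d resumes the boot as described.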

   It may be better to boot with "break=premount" to get into the
initramfs in a more controlled state, instead of trying to fix it
in the already-error-ed state, assuming you try the initramfs
thing at all.

   And further assuming that the mdadm.conf file is the problem,
which was pretty much guesswork on my part...

					-- A.

I found the problem. You're right, mdadm.conf was the problem, which is amazing considering that I had previously restarted without changing mdadm.conf. I edited it in the initramfs, then did "mdadm -A /dev/md0" as you suggested, and control-d worked. I assume I'll still have to rebuild the initramfs; I might need handholding, but I'll google first.

I think what went wrong might interest some people, since it answers a question I previously raised under the subject
"RAID1 with multiple partitions".
There was no consensus, so I made the wrong choice.

The cause of the problem is that I set up my system under a temporary hostname and then changed the hostname. The hostname appeared at the end of each ARRAY line in mdadm.conf, and I didn't know whether I should change it there, because I didn't know whether it has to match the hostname in the current /etc/host, has to match the current hostname, or is just a meaningless label. I changed it to the new hostname at the same time that I changed the hostname, then shut down and restarted. It booted fine. I did the same thing on another computer, and I'm sure I restarted that one successfully several times. So I foolishly thought I was safe. After the power failure it wouldn't boot. After following your advice I was sufficiently inspired to edit mdadm.conf back to the original hostname, mount my various md's, and control-d. I assume I'll have to do that every time I boot until I rebuild the initramfs.
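For anyone comparing their own setup, here is a small sketch that lists the name recorded on each ARRAY line so it can be checked against what the arrays themselves report. The helper name array_names is made up for illustration, and the path /etc/mdadm/mdadm.conf is the usual Debian location, assumed here.

```shell
# Print "<device> <name>" for each ARRAY line in an mdadm.conf-style file.
# array_names is a hypothetical helper; "(none)" means no name= field was found.
array_names() {
    awk '$1 == "ARRAY" {
        name = "(none)"
        for (i = 3; i <= NF; i++)
            if ($i ~ /^name=/) name = substr($i, 6)
        print $2, name
    }' "$1"
}

# On the running system (as root), once the arrays are up:
#   array_names /etc/mdadm/mdadm.conf   # names the config file claims
#   mdadm --detail --scan               # ARRAY lines as the arrays report them
#   update-initramfs -u                 # rebuild the initramfs for the current kernel
```

If the two listings disagree about the name, that mismatch is the sort of thing described above. Rebuilding the initramfs after correcting mdadm.conf should copy the fixed file into it, which would make the manual edit at every boot unnecessary.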

Thank you very much. I'd already recovered everything from a backup, but I needed to find the solution or I'd be "afraid to raid" in future.

David

