
Re: Can't reboot after power failure (RAID problem?)

On Tue, Feb 1, 2011 at 10:38 AM, David Gaudine
<davidg@alcor.concordia.ca> wrote:
> On 11-01-31 8:47 PM, Andrew Reid wrote:
>> On Monday 31 January 2011 10:51:04 davidg@alcor.concordia.ca wrote:
>>> I posted in a panic and left out a lot of details.  I'm using Squeeze,
>>> and set up the system about a month ago, so there have been some
>>> upgrades.  I wonder if maybe the kernel or Grub was upgraded and I
>>> neglected to install Grub again, but I would expect it to automatically
>>> be reinstalled on at least the first disk.  If I remove either disk I
>>> get the same error message.
>>> I did look at /proc/cmdline.  It shows the same uuid for the root device
>>> as in the menu, so that seems to prove it's an MD device that isn't ready
>>> since my boot and root partitions are each on MD devices.  /proc/modules
>>> does show md_mod.
>>   What about the actual device?  Does /dev/md/0 (or /dev/md0, or whatever)
>> exist?
>>   If the module is loaded but the device does not exist, then it's
>> possible there's a problem with your mdadm.conf file, and the initramfs
>> doesn't have the array info in it, so it wasn't started.
>>   The easy way out is to boot from a rescue disk, fix the mdadm.conf
>> file, rebuild the initramfs, and reboot.
>>   The Real Sysadmin way is to start the array by hand from inside
>> the initramfs.  You want "mdadm -A /dev/md0" (or possibly
>> "mdadm -A -u<your-uuid>") to start it, and once it's up, ctrl-d out
>> of the initramfs and hope.  The part I don't remember is whether or
>> not this creates the symlinks in /dev/disk that your root-fs-finder
>> is looking for.
>>   It may be better to boot with "break=premount" to get into the
>> initramfs in a more controlled state, instead of trying to fix it
>> in the already-error-ed state, assuming you try the initramfs
>> thing at all.
>>   And further assuming that the mdadm.conf file is the problem,
>> which was pretty much guesswork on my part...
>>                                        -- A.
> I found the problem.  You're right, mdadm.conf was the problem, which is
> amazing considering that I had previously restarted without changing
> mdadm.conf.  I edited it in the initramfs, then did "mdadm -A /dev/md0" as
> you suggested and control-d worked.  I assume I'll still have to rebuild the
> initramfs; I might need handholding, but I'll google first.
> I think what went wrong might interest some people, since it answers a
> question I previously raised under the subject
> RAID1 with multiple partitions
> There was no consensus, so I made the wrong choice.
> The cause of the problem is, I set up my system under a temporary hostname
> and then changed the hostname.  The hostname appeared at the end of each
> ARRAY line in mdadm.conf, and I didn't know whether I should change it there
> because I didn't know whether it has to match the hostname in the current
> /etc/host, has to match the current hostname, or is just a meaningless
> label.  I changed it to the new hostname at the same time that I
> changed the hostname, then shut down and restarted.  It booted fine.  I did
> the same thing on another computer, and I'm sure I restarted that one
> successfully several times.  So, I foolishly thought I was safe.  After the
> power failure it wouldn't boot.  After following your advice I was
> sufficiently inspired to edit mdadm.conf back to the original hostname,
> mount my various md's, and control-d.  I assume I'll have to do that every
> time I boot until I rebuild the initramfs.
> Thank you very much.  I'd already recovered everything from a backup, but I
> needed to find the solution or I'd be "afraid to raid" in future.

If you'd like the homehost in mdadm.conf to be the same as the
hostname, you could interrupt the boot in the initramfs and assemble
the array with

  mdadm --assemble /dev/mdX --homehost=whatever --update=homehost /dev/sdXX.....
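
Before changing anything, it may help to see which hostname each ARRAY
line currently records.  The following is only a rough sketch, not
something from mdadm itself: array_homehosts is a made-up helper name,
and it assumes the ARRAY lines end in a name=<host>:<N> token (the form
"mdadm --detail --scan" normally prints for 1.x metadata) and that the
keyword is spelled exactly "ARRAY".

```shell
# Sketch: print "<device> <homehost>" for each ARRAY line of an
# mdadm.conf-style file, so the result can be compared with `hostname`.
# array_homehosts is a hypothetical helper, not an mdadm command.
array_homehosts() {
    # $1: path to the config file, e.g. /etc/mdadm/mdadm.conf
    awk '$1 == "ARRAY" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^name=[^:]+:/) {
                host = substr($i, 6)    # drop the leading "name="
                sub(/:.*/, "", host)    # keep only the part before ":"
                print $2, host
            }
        }
    }' "$1"
}
```

Run it as "array_homehosts /etc/mdadm/mdadm.conf" and compare the second
column with the output of "hostname".  Once the file and the arrays
agree, the initramfs still needs rebuilding so that it picks up the
corrected mdadm.conf; on Debian that is normally "update-initramfs -u"
as root.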
