grub-legacy fails with its root on raid1
Please CC me on reply, since I am not subscribed to the list.
Hello,
just recently I was faced with a non-booting remote box running Debian
6.0 (Squeeze). Lacking access to any kind of console, I was quite
clueless why it wouldn't come up after what should have been a routine
reboot. I have access to a rescue system though, which, when requested
from the provider, reboots the machine into a live system.
Now I think I have found the problem: grub-legacy is trying to boot off
/dev/md0, which is indeed my boot partition:
# Excerpt of grub's menu.lst
title Debian GNU/Linux, kernel 2.6.32-5-amd64
root (md0) # software raid1 consisting of /dev/sd[ab]1
kernel /vmlinuz-2.6.32-5-amd64 root=/dev/md1 ro
initrd /initrd.img-2.6.32-5-amd64
Note that grub's root is set to (md0) by update-grub. I am not entirely
sure, but I thought that wasn't possible with grub-legacy. I might be
wrong though.
However, manually changing menu.lst to use (hd0,0) as grub's root
finally got the machine booting again, after some very frustrating
attempts to find the cause of the problem.
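For reference, the stanza that boots now looks roughly like this. I am
assuming that (hd0,0) maps to /dev/sda1, the first raid1 member; I have
not checked that against device.map:
# Excerpt of the manually edited menu.lst
title Debian GNU/Linux, kernel 2.6.32-5-amd64
# (hd0,0) = first partition of the first BIOS disk, assumed to be /dev/sda1
root (hd0,0)
kernel /vmlinuz-2.6.32-5-amd64 root=/dev/md1 ro
initrd /initrd.img-2.6.32-5-amd64
Only grub's own root line changed; the kernel's root=/dev/md1 is the same
as before.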
Now, I honestly don't know how it worked before, as I don't know whether
menu.lst always pointed grub-legacy to (md0). I haven't found a
conclusive answer to whether grub-legacy is capable of using raid1 as
root or not, since the documentation I found online was either about
grub2 or did not clearly distinguish between the two grub versions.
So my questions: Is (the Debian Squeeze version of) grub-legacy capable
of "root (md0)"? If so, why doesn't it work anymore, and if not, why is
update-grub writing a faulty menu.lst?
Having said that, I want to elaborate some more, since I did have some
rebooting problems and fixed them not so long before that. Maybe I broke
something myself and am not aware of it.
Last weekend I wanted to try Xen for virtualization, so I installed
linux-image-2.6.32-5-xen-amd64. I checked the menu.lst and the 1st entry
was the newly installed kernel. So I rebooted. At first the machine
seemed unresponsive, and I was about to reboot it into the rescue
system when my pings were suddenly answered and I could log in. The
uptime command showed the system had already been running for 11
minutes, but for about 10 of those there was no echo reply and no login
was possible. I thought I was missing something for a fully functional
Xen hypervisor system, and indeed I was, so I installed
xen-linux-system-2.6-xen-amd64 and rebooted again. This time the machine
stayed unresponsive for over an hour and I assumed it would stay that
way, so I booted into the rescue system to manually change the "default"
entry in menu.lst. /dev/md0 was not available there, and not knowing how
to change that at the time, I mounted /dev/sda1 and /dev/sdb1 directly
and changed both menu.lst files. After that the machine booted the
normal non-Xen kernel as expected.
I did some more reading on Xen and figured it might not be worth the
trouble to get it running on a remote box, if that is possible at all; I
still don't know. So I installed qemu-kvm, libvirt-bin and virt-manager,
had a go with kvm and decided to stick with it. Meanwhile I had also
installed gparted to see whether it could resize my root partition
/dev/md1 (/dev/sda3, /dev/sdb3), but I never actually tried it.
So the last successful (re)boot was roughly 2 days before I found myself
faced with the non-booting machine described above:
# zgrep reboot ~log/syslog*
/var/log/syslog.1:Dec 3 20:01:32 shutdown[2521]: shutting down for system reboot # unsuccessful boot
[...]
/var/log/syslog.3.gz:Dec 1 14:46:13 shutdown[30931]: shutting down for system reboot
/var/log/syslog.3.gz:Dec 1 14:48:17 /usr/sbin/cron[1614]: (CRON) INFO (Running @reboot jobs)
[...]
According to my aptitude log, in the meantime I installed gparted
including dependencies and updated libxml2, libxml2-dev, libxml2-utils
and python-libxml2. I also purged linux-image-2.6-xen-amd64,
xen-linux-system-2.6-xen-amd64, xen-qemu-dm-4.0 and
linux-image-2.6.32-5-xen-amd64 including their dependencies. After the
purge I rebooted, and that's when the machine wouldn't come up anymore.
As far as I can tell, the only reason it failed to boot is grub using
(md0) as its root.
So, have I broken something myself, is there something wrong with
grub-legacy, or is it something else entirely?
Any help will be greatly appreciated and sorry if this message is a bit
long.
Cheers
Marcus