Strange kernel boot hang
Hi All,
I've been having a nightmare with a box that refuses to boot after a
kernel update. Below is a transcript of what I've tried.
-
The server is a Dell PowerEdge 2800 with 6 GB RAM and an LSI Logic PERC
4e/DC RAID controller.
After a kernel update, the server fails to boot.
Messages on the screen show that the kernel and initrd are loading,
then "Booting Kernel" and the following:
Kernel direct mapping tables 100000000 @ 8000-1100
It then hangs.
-
Selecting the old vmlinuz-2.6.18-4-amd64 kernel exhibits the same problem.
-
Booted the server from a Gentoo LiveCD (kernel 2.6.15) and ran e2fsck on
all partitions.
Then accessed the Debian system by mounting its filesystems and chrooting
into it, like so:
modprobe dm-mod
vgscan
vgchange -a y
mount /dev/sda3 /mnt/gentoo
cd /mnt/gentoo
mount /dev/sda1 boot/
for x in usr home var ;do mount /dev/mapper/vg1-$x $x;done
mount -t proc proc proc/
mount -o bind /dev dev/
chroot . /bin/bash
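These rescue steps have to be repeated after every reboot into the LiveCD, so it may help to keep them in a small script. A minimal sketch, assuming the same device and volume names as this box (/dev/sda1, /dev/sda3, vg1-usr/home/var) and the /mnt/gentoo mount point. DRY_RUN and TARGET are hypothetical knobs added for illustration. It also mounts devpts (the extra step used further down) and includes a teardown in reverse order, for use before rebooting.

```shell
#!/bin/sh
# Sketch of the rescue-chroot steps as a re-runnable script.
# Device/volume names are this box's; adjust for other machines.
# DRY_RUN=1 (hypothetical knob) prints the commands instead of running them.
TARGET=${TARGET:-/mnt/gentoo}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_chroot() {
    run modprobe dm-mod
    run vgscan
    run vgchange -a y
    run mount /dev/sda3 "$TARGET"
    run mount /dev/sda1 "$TARGET/boot"
    # LVM logical volumes for /usr, /home and /var
    for x in usr home var; do
        run mount "/dev/mapper/vg1-$x" "$TARGET/$x"
    done
    run mount -t proc proc "$TARGET/proc"
    run mount -o bind /dev "$TARGET/dev"
    run mount -t devpts devpts "$TARGET/dev/pts"
}

teardown_chroot() {
    # Unmount in the reverse order of setup_chroot
    run umount "$TARGET/dev/pts" "$TARGET/dev" "$TARGET/proc"
    for x in var home usr; do
        run umount "$TARGET/$x"
    done
    run umount "$TARGET/boot" "$TARGET"
    run vgchange -a n
}
```

Typical use would be `setup_chroot && chroot /mnt/gentoo /bin/bash`, then `teardown_chroot` after leaving the chroot.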
-
With this in place I:
Purged existing kernels 2.6.18-4-amd64 and 2.6.18-5-amd64
Installed 2.6.18-5-amd64 again (which forced an initrd image regeneration)
Reinstalled grub into the boot block:
# grub
> root (hd0,0)
> setup (hd0)
> quit
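The interactive GRUB session above can also be fed to GRUB legacy's batch mode, which is handy when the boot block has to be reinstalled repeatedly from the LiveCD chroot. A minimal sketch: (hd0,0) and (hd0) are the values used above, and GRUB_CMD is a hypothetical hook (not a GRUB feature) that defaults to grub and lets the command stream be inspected without touching the disk. Debian's grub package also ships grub-install (e.g. `grub-install /dev/sda`) as an alternative.

```shell
#!/bin/sh
# Sketch: reinstall the GRUB legacy boot block non-interactively.
# GRUB_CMD is a hypothetical override used only for dry runs/testing.
reinstall_grub() {
    # --batch reads commands from stdin; --no-floppy skips floppy probing
    "${GRUB_CMD:-grub}" --batch --no-floppy <<'EOF'
root (hd0,0)
setup (hd0)
quit
EOF
}
```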
-
Rebooted; now the GRUB menu is not displayed, not even with a monitor
attached.
-
Rebooted from the Gentoo disk and removed the serial console settings
from /boot/grub/menu.lst and /etc/inittab.
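Instead of deleting the serial console lines, they can be commented out so they are easy to restore. A minimal sketch, assuming GNU sed and the usual GRUB legacy and sysvinit syntax: `serial`/`terminal` directives and `console=ttyS...` kernel options in menu.lst, and a getty on ttyS* in inittab. The exact lines on this box weren't posted, so the patterns are assumptions. sed keeps `.bak` copies of both files.

```shell
#!/bin/sh
# Sketch: disable serial console settings by commenting them out.
# Assumes GNU sed (-E, -i.bak). Patterns are guesses at typical entries.
disable_serial_console() {
    menu=${1:-/boot/grub/menu.lst}
    inittab=${2:-/etc/inittab}

    # GRUB legacy: comment out "serial ..." and "terminal ..." directives,
    # and strip console=ttyS* options from kernel lines.
    sed -E -i.bak \
        -e 's/^([[:space:]]*)(serial|terminal)([[:space:]])/\1#\2\3/' \
        -e 's/[[:space:]]+console=ttyS[0-9][^[:space:]]*//g' \
        "$menu"

    # sysvinit: comment out any active line that references a serial port
    sed -E -i.bak '/^[^#].*ttyS[0-9]/s/^/#/' "$inittab"
}
```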
Rebooted; the GRUB menu still does not display, but a message does.
A stream of characters flies by on the screen, as if they are trying
to be drawn. The final message is:
(3-4 non-printable chars)Redirect console code
-
Reinstalled the grub package and the boot block.
Rebooted: same message.
-
Rebooted into the Gentoo disk, chrooted into Debian and created a GRUB
floppy with:
grub-floppy /dev/fd0
-
Rebooted with the floppy and got a GRUB menu.
Manually typed the kernel boot parameters into GRUB:
root (hd0,0)
kernel (hd0,0)/vmlinuz ro root=/dev/sda3 console=/dev/tty0
initrd (hd0,0)/initrd.gz
boot
Got the same message as in the original problem.
Also tried with noapic, nolapic, acpi=off, and memmap=exactmap
memmap=3072M@0M.
Same error.
-
Thinking it might be a problem with initrd generation, I built a custom
kernel that doesn't need an initrd (i.e. the essential modules built in
statically):
aptitude install linux-source-2.6.18 kernel-package ncurses-dev fakeroot
cd /usr/src
tar xvjf linux-source-2.6.18.tar.bz2
ln -sfn linux-source-2.6.18 linux
cp /boot/config linux/.config
cd linux
make menuconfig # set megaraid_mm|mbox & dm-mod as static
make-kpkg clean
date=`date '+%Y%m%d%H%M'`
fakeroot make-kpkg --revision=buildtime${date}.Custom kernel-image
cd ..
dpkg -i linux-image-2.6.18-1_buildtime${date}.Custom_amd64.deb
reboot
Same error! This took bloody ages, too!
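Since each build takes a while, a quick check of the .config before running make-kpkg can confirm that the options really ended up built in (=y) rather than as modules (=m). A minimal sketch: the option names are the Kconfig symbols for the megaraid_mm and megaraid_mbox drivers, device-mapper, SCSI disk support and ext3. Including the last two (and assuming ext3 at all) is a guess about this box.

```shell
#!/bin/sh
# Sketch: verify that boot-critical drivers are built in, not modules.
# The option list is an assumption for this machine; edit as needed.
check_builtin() {
    config=${1:-.config}
    missing=0
    for opt in CONFIG_MEGARAID_MM CONFIG_MEGARAID_MAILBOX CONFIG_BLK_DEV_DM \
               CONFIG_BLK_DEV_SD CONFIG_EXT3_FS; do
        if grep -q "^$opt=y\$" "$config"; then
            echo "ok      $opt"
        else
            echo "NOT =y  $opt"
            missing=1
        fi
    done
    # Non-zero exit status if any option is not built in
    return $missing
}
```

Run it from the kernel source directory, e.g. `check_builtin .config`, after `make menuconfig`.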
-
Going on a hunch with Alex Butcher, we flashed the BIOS to version A06.
Same error.
-
As a last-ditch attempt to get services back up, I ran the following
command before the chroot above:
mount -t devpts devpts dev/pts
then chrooted in and started all services:
cd /etc/rc2.d
for x in S* ;do ./$x start ;done
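The loop above runs every S* script but gives no summary of which ones failed. A minimal sketch of the same loop that skips non-executable entries and reports failures at the end. RC_DIR is a hypothetical override (defaulting to /etc/rc2.d) added only so the function can be tried on a scratch directory. Debian's own `/etc/init.d/rc 2` runs the runlevel scripts in much the same way.

```shell
#!/bin/sh
# Sketch: start runlevel-2 services in glob order, collecting failures.
# RC_DIR is a hypothetical override for testing; default is /etc/rc2.d.
start_services() {
    rc_dir=${RC_DIR:-/etc/rc2.d}
    failed=""
    for script in "$rc_dir"/S*; do
        # Skip anything that isn't an executable script
        [ -x "$script" ] || continue
        if ! "$script" start; then
            failed="$failed ${script##*/}"
        fi
    done
    # Report failed scripts on stderr; return non-zero if any failed
    [ -z "$failed" ] || echo "failed:$failed" >&2
    [ -z "$failed" ]
}
```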
-
Will attempt to carry on with this...
My feeling is that there is a problem with either the RAID array or the
filesystem on /boot.
I'd like to put the kernel images on a USB memory stick and try to boot
from that. That should confirm whether there is a disk-level problem.
If there is, we will need to migrate mail onto another box, destroy and
rebuild the array, and reinstall a fresh copy.
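For the USB-stick test, a minimal sketch of copying the kernel and initrd to a stick and putting a GRUB legacy boot block on it. Everything here is an assumption to check before running: the stick as /dev/sdb with a single ext2 or FAT partition /dev/sdb1, the /mnt/usb mount point, and Debian's usual /boot file names for 2.6.18-5-amd64. USB_DISK, USB_MNT, KVER and DRY_RUN are hypothetical knobs.

```shell
#!/bin/sh
# Sketch: build a USB boot stick to take the RAID array out of the boot path.
# All device names and paths below are assumptions (hypothetical defaults).
USB_DISK=${USB_DISK:-/dev/sdb}
USB_MNT=${USB_MNT:-/mnt/usb}
KVER=${KVER:-2.6.18-5-amd64}

# DRY_RUN=1 prints the commands instead of running them
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

make_usb_boot() {
    run mkdir -p "$USB_MNT"
    run mount "${USB_DISK}1" "$USB_MNT"
    run mkdir -p "$USB_MNT/boot"
    run cp "/boot/vmlinuz-$KVER" "/boot/initrd.img-$KVER" "$USB_MNT/boot/"
    # GRUB legacy's grub-install can install into another root directory
    run grub-install --root-directory="$USB_MNT" "$USB_DISK"
}
```

grub-install does not write a menu.lst, so one still has to be created on the stick (the entries typed at the floppy prompt are a starting point). Device names may also shift when booting from USB, so root=/dev/sda3 is worth double-checking.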
It might be worth copying the kernels (system image) to another box to
see if it boots elsewhere.
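Another cheap test of the disk-corruption theory is to compare the installed kernel files against the checksums dpkg recorded at install time; Debian keeps these in /var/lib/dpkg/info/<package>.md5sums with paths relative to /, and the debsums package automates the same check. A minimal sketch, assuming the package name linux-image-2.6.18-5-amd64; the optional root argument is a hypothetical addition so the check can also be run against a copy of the system on another box.

```shell
#!/bin/sh
# Sketch: verify a package's installed files against dpkg's md5sums.
# A mismatch suggests on-disk corruption rather than a bad kernel build.
verify_package() {
    pkg=${1:-linux-image-2.6.18-5-amd64}
    root=${2:-/}
    sums="$root/var/lib/dpkg/info/$pkg.md5sums"
    [ -r "$sums" ] || { echo "no md5sums for $pkg" >&2; return 2; }
    # Paths in the md5sums file are relative to the filesystem root
    (cd "$root" && md5sum -c "$sums")
}
```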
Any ideas are greatly appreciated.
Matt