
Strange kernel boot hang



Hi All,

I've been having a nightmare with a box that refuses to boot after a
kernel update. Below is a transcript of what I've tried.
-

Server is a Dell PowerEdge 2800 with 6 GB RAM and an LSI Logic PERC 4e/DC
RAID controller.

After a kernel update the server fails to boot.

Messages to the screen state that the kernel and initrd are loading,
then "Booting Kernel" and the following:

Kernel direct mapping tables 100000000 @ 8000-1100

It then hangs.

-
Selecting the old vmlinuz-2.6.18-4-amd64 kernel exhibits the same problem.

-
Booted the server with a Gentoo livecd (kernel 2.6.15).

Ran e2fsck on all partitions.

Mounted the filesystems and chrooted into the Debian install to access
the system, like so:

modprobe dm-mod
vgscan
vgchange -a y
mount /dev/sda3 /mnt/gentoo
cd !$
mount /dev/sda1 boot/
for x in usr home var ;do mount /dev/mapper/vg1-$x $x;done
mount -t proc proc proc/
mount -o bind /dev dev/
chroot . /bin/bash
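
For anyone wanting to repeat this, the mount sequence above can be wrapped in a small script. This is only a sketch under assumptions: the `run` helper, the `DRY_RUN` and `TARGET` variables and the `prepare_chroot` name are made up for illustration, and the device names and the vg1 volume group are specific to this box. By default it only prints the commands, so it can be checked before running anything as root.

```shell
#!/bin/sh
# Sketch of the mount sequence above. Prints commands unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}           # hypothetical switch: 1 = print only, 0 = execute (needs root)
TARGET=${TARGET:-/mnt/gentoo}   # mount point used in the transcript

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_chroot() {
    run modprobe dm-mod || return 1
    run vgscan || return 1
    run vgchange -a y || return 1
    run mount /dev/sda3 "$TARGET" || return 1
    run mount /dev/sda1 "$TARGET/boot" || return 1
    for x in usr home var; do
        run mount "/dev/mapper/vg1-$x" "$TARGET/$x" || return 1
    done
    run mount -t proc proc "$TARGET/proc" || return 1
    run mount -o bind /dev "$TARGET/dev" || return 1
}
```

Once `prepare_chroot` has run with `DRY_RUN=0`, `chroot "$TARGET" /bin/bash` enters the Debian system as before.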

-
With this in place I:

Purged existing kernels 2.6.18-4-amd64 and 2.6.18-5-amd64
Installed 2.6.18-5-amd64 again (which forced an initrd image regeneration)
Reinstalled grub into the boot block:

# grub
> root (hd0,0)
> setup (hd0)
> quit
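
The same three commands can also be fed to the GRUB legacy shell non-interactively, which makes the step easier to repeat between reboots. A minimal sketch, assuming GRUB legacy's `--batch` option; the `grub_setup_commands` helper name is made up for illustration.

```shell
# Emit the GRUB legacy commands that reinstall the boot block.
#   $1 = partition holding /boot, e.g. "(hd0,0)"
#   $2 = disk whose boot block is written, e.g. "(hd0)"
grub_setup_commands() {
    boot_part=$1
    boot_disk=$2
    printf 'root %s\nsetup %s\nquit\n' "$boot_part" "$boot_disk"
}

# Usage (as root, inside the chroot):
#   grub_setup_commands '(hd0,0)' '(hd0)' | grub --batch
```

Printing the commands first, without the pipe, is a cheap way to confirm the device names before anything is written to disk.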

-
Rebooted; now the grub menu is not displayed, not even with a monitor
attached.

-
Rebooted into the Gentoo disk and removed the serial console settings
from /boot/grub/menu.lst and /etc/inittab.
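
To double-check that no serial console settings were left behind, something like the following could be run against each file. It is a rough sketch: the `find_serial_settings` name is made up, and the pattern only covers the usual suspects (GRUB `serial` and `terminal` lines, and anything mentioning a `ttyS` device), so it may miss unusual setups.

```shell
# List uncommented lines in one file that still refer to a serial console.
# Output is "line-number:text"; exit status is non-zero when nothing is found.
find_serial_settings() {
    file=$1
    grep -nE '^[[:space:]]*(serial|terminal)[[:space:]]|ttyS[0-9]' "$file" |
        grep -vE '^[0-9]+:[[:space:]]*#'
}

# Usage (inside the chroot):
#   find_serial_settings /boot/grub/menu.lst
#   find_serial_settings /etc/inittab
```

No output from either file means no obvious serial settings remain in them.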

Rebooted; the grub menu still does not display, but a message is shown.
A stream of characters flies by on the screen as if something is trying
to be drawn. The final message is:

(3-4 non-printable chars)Redirect console code

-
Reinstalled grub (the package) and reinstalled the boot block.
Rebooted, same message.

-
Rebooted into the gentoo disk.
Chrooted in to debian and created a grub floppy with:

grub-floppy /dev/fd0

-
Rebooted with the floppy and get a grub menu.
Manually typing the kernel boot params into grub:

root (hd0,0)
kernel (hd0,0)/vmlinuz ro root=/dev/sda3 console=/dev/tty0
initrd (hd0,0)/initrd.gz
boot

Got same message as for original problem.

Also tried with noapic, nolapic, acpi=off and memmap=exactmap
memmap=3072M@0M

Same error.
-
Thinking it may be a problem with initrd generation, I built a custom
kernel which has no need for an initrd (i.e. essential modules built in
statically):

aptitude install linux-source-2.6.18 kernel-package ncurses-dev fakeroot
cd /usr/src
tar xvjf linux-source-2.6.18.tar.bz2
ln -sfn linux-source-2.6.18 linux
cp /boot/config linux/.config
cd linux
make menuconfig		# set megaraid_mm|mbox & dm-mod as static
make-kpkg clean
date=`date '+%Y%m%d%H%M'`
fakeroot make-kpkg --revision=buildtime${date}.Custom kernel-image
cd ..
dpkg -i linux-image-2.6.18-1_buildtime${date}.Custom_amd64.deb

reboot

Same error! This took bloody ages too!
-
Going on a hunch with Alex Butcher, we flashed the BIOS to version A06.

Same error.

-
As a last-ditch attempt to get services back up, I ran the following
command before the chroot above:

mount -t devpts devpts dev/pts

Then chrooted in and started up all services:

cd /etc/rc2.d
for x in S* ;do ./$x start ;done


-

Will attempt to carry on with this....

My feeling is that there is a problem with either the RAID array or the
filesystem on /boot.
I'd like to put the kernel images onto a USB memory stick and try to
boot from that. That should confirm whether there is a disk-level problem.
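
When copying the images onto the stick, it is worth comparing each copy byte for byte against the original, so that a bad copy is not mistaken for a bad disk. A minimal sketch, assuming the stick is mounted at /mnt/usb (a hypothetical path) and Debian's usual vmlinuz-/initrd.img- file naming; `copy_verified` is a made-up helper name.

```shell
# Copy one file into a directory and confirm the copy is identical.
# Returns non-zero if the copy fails or the two files differ.
copy_verified() {
    src=$1
    dest_dir=$2
    dest=$dest_dir/$(basename "$src")
    cp "$src" "$dest" || return 1
    # cmp -s is silent; it exits non-zero if the files differ or cannot be read.
    cmp -s "$src" "$dest"
}

# Usage, assuming the stick is mounted at /mnt/usb (hypothetical):
#   for f in /boot/vmlinuz-2.6.18-5-amd64 /boot/initrd.img-2.6.18-5-amd64; do
#       copy_verified "$f" /mnt/usb || echo "copy of $f failed verification"
#   done
```

This only covers getting intact copies onto the stick; making the stick bootable is a separate step.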

If there is, we will need to migrate mail onto another box, destroy and
rebuild the array, and attempt a fresh install.

It might be worth copying the kernels (system image) to another box to
see if it boots elsewhere.
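
If the kernels do get copied to another box, a checksum manifest made on this server lets the other side confirm it received exactly the same bytes before any test boot. The sketch below assumes `md5sum` is available on both machines; the `make_manifest` and `check_manifest` names and the manifest location are made up for illustration.

```shell
# Write an md5sum manifest for every regular file directly inside a directory.
#   $1 = directory to checksum, $2 = manifest file (keep it outside that directory)
make_manifest() {
    dir=$1
    manifest=$2
    (
        cd "$dir" || exit 1
        for f in *; do
            if [ -f "$f" ]; then
                md5sum "$f"
            fi
        done
    ) > "$manifest"
}

# Re-check the files in a directory against a manifest.
# Returns non-zero if any file is missing or its checksum differs.
check_manifest() {
    dir=$1
    manifest=$2
    ( cd "$dir" && md5sum -c ) < "$manifest" > /dev/null 2>&1
}

# Usage:
#   make_manifest /boot /root/boot.md5        # on this server
#   check_manifest /mnt/copy /root/boot.md5   # on the other box, after copying both
```

Running `check_manifest` against /boot on the original server at a later point would also show whether the disk is still returning the same data.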

Any ideas are greatly appreciated.

Matt

