Bug#508820: linux-image-2.6.26-1-mckinley: MCA and panic after bringing up loopback interface on ia64
On Tue, Dec 16, 2008 at 10:08:59AM -0700, Aaron D. Johnson wrote:
> dann frazier writes:
> > I've got a theory - can you search the /var/log/kern.log* files on
> > this guest for any Oops messages?
>
> No Oopses going back to 3 Dec:
> ajohnso2@spielplatz:~$ sudo zgrep -i oops /var/log/kern.log*
> ajohnso2@spielplatz:~$ sudo gzip -dc /var/log/kern.log.6.gz | head -n 1
> Dec 3 06:26:12 spielplatz kernel: [15525239.819366] postgres(22142): floating-point assist fault at ip 40000000003de402, isr 0000040000000008
> ajohnso2@spielplatz:~$
>
> Countless floating-point assist fault messages, though. It seems that
> PostgreSQL needs some help in this department.
>
> > Do you recall experiencing a hang during your kernel upgrade?
>
> I remember a hang on shutdown for some system during the last week,
> but nothing during the kernel package upgrade proper.
>
> > I'm wondering if there was an oops at the time you upgraded your
> > kernel package. Also, can you mount your efi partition and capture
> > the md5sums of the files under /boot/efi/efi/debian?
>
> ajohnso2@spielplatz:~$ sudo mount -v -t vfat -o ro /dev/sda1 /mnt
> ajohnso2@spielplatz:~$ md5sum /mnt/efi/debian/*
> 9fa2639fa5dca1521df76c7c254f4e04 /mnt/efi/debian/elilo.conf
> 5bec2375858e01c4590976f3fb479a3c /mnt/efi/debian/elilo.efi
> f6d26c846defcbb6a255365b32205e69 /mnt/efi/debian/initrd.img
> f43e07c02fff08489e5d1f60dc0046ae /mnt/efi/debian/initrd.img.old
> 35a0f1cd6e79fc7ffd93ca1dddb5df01 /mnt/efi/debian/readme.txt
> 384b24d661e30ca549569954ab9dc3ae /mnt/efi/debian/vmlinuz
> 67a9622f681abd91cc4710da8894b743 /mnt/efi/debian/vmlinuz.old
> ajohnso2@spielplatz:~$
>
> > If my theory is correct, you may be able to get back up and running
> > by booting an older kernel (if you have one), running 'elilo', then
> > booting back into the 2.6.26-11 kernel.
>
> OK, so that worked. What change did re-running elilo make? Based on
> the MD5sums, there are new initrd and vmlinuz files. Seems like
> installing linux-image-2.6.26-1-mckinley should have done that in its
> postinst script.
Here's what I think happened:
 - Running 2.6.26-8
 - Upgraded to 2.6.26-11:
   - unpacked 2.6.26-11
   - generated initramfs
   - called elilo
     - elilo loads the modules it needs to mount the EFI partition,
       but the modules on disk are now the 2.6.26-11 ones, which
       are incompatible with the running 2.6.26-8 kernel.
     - the system tries to mount the EFI partition and hangs due to
       the incompatible modules, so the kernel/initrd in the EFI
       partition are never updated and are now out of date with
       respect to the files in /boot
 - System reboots; since the EFI copies were never updated, it boots
   2.6.26-8 again
   - initramfs loads and works fine (still the 2.6.26-8 initramfs)
   - system mounts root
   - system starts loading modules from the root partition (which
     are now 2.6.26-11 modules), and does bad things.
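The failure above leaves the vmlinuz and initrd.img on the EFI partition out of step with the files they are copied from. A minimal shell sketch for spotting that condition, assuming elilo.conf uses the default /vmlinuz and /initrd.img symlinks and the EFI partition is mounted at /boot/efi (both function names are hypothetical, not part of elilo):

```shell
#!/bin/sh
# Sketch only: compare the EFI partition's kernel/initrd copies with
# their sources. Assumed layout: /vmlinuz and /initrd.img symlinks,
# EFI copies in /boot/efi/efi/debian.

# Return 0 if $2 has the same contents as $1 (cmp follows symlinks).
efi_copy_is_current() {
    cmp -s "$1" "$2"
}

# $1: root prefix for the sources (empty means /)
# $2: directory holding the EFI copies
check_efi_copies() {
    srcroot=${1:-}
    efidir=${2:-/boot/efi/efi/debian}
    status=0
    for f in vmlinuz initrd.img; do
        if efi_copy_is_current "$srcroot/$f" "$efidir/$f"; then
            echo "$f: up to date"
        else
            echo "$f: differs from $srcroot/$f (or is missing)"
            status=1
        fi
    done
    return $status
}
```

Run `check_efi_copies` as root once the EFI partition is mounted; a non-zero exit status suggests elilo needs to be re-run. It uses cmp rather than md5sum because only equality matters here.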
The bug would therefore be that we created a kernel with the same
abiname (both packages install as 2.6.26-1-mckinley, into the same
/lib/modules directory) that was actually incompatible with the
modules from an earlier release.
> What happens to the poor user who doesn't know to re-run elilo? (Not
> that I expect there are too many "poor users" running ia64 systems.)
Unfortunately, I don't know that there's any way to retroactively solve
this problem. The cat is out of the bag, as they say.
It would be a nice safety measure to make sure the modules we need
are loaded before we unpack the new modules, i.e., in the
preinst. One way to do this would be to call 'elilo' in the preinst.
Savvy users can configure their systems to do this themselves by adding
a preinst hook in /etc/kernel-img.conf.
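A minimal sketch of such a hook, assuming the `preinst_hook` key described in kernel-img.conf(5) (check the man page on your release for the exact name) and a hypothetical wrapper script at /usr/local/sbin/elilo-preinst. The wrapper is there because the hook is typically passed extra arguments (the kernel version and image path) that elilo may not accept:

```
# /etc/kernel-img.conf (fragment)
# Re-run elilo before the new package's modules are unpacked, while the
# running kernel's modules are still on disk.
# /usr/local/sbin/elilo-preinst is a hypothetical wrapper script.
preinst_hook = /usr/local/sbin/elilo-preinst
```

Here the hypothetical wrapper would be an executable script that ignores its arguments and simply runs `elilo`.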
--
dann frazier