Bug#508820: linux-image-2.6.26-1-mckinley: MCA and panic after bringing up loopback interface on ia64



On Tue, Dec 16, 2008 at 10:08:59AM -0700, Aaron D. Johnson wrote:
> dann frazier writes:
> > I've got a theory - can you search the /var/log/kern.log* files on
> > this guest for any Oops messages?
> 
> No Oopses going back to 3 Dec:
>     ajohnso2@spielplatz:~$ sudo zgrep -i oops /var/log/kern.log*
>     ajohnso2@spielplatz:~$ sudo gzip -dc /var/log/kern.log.6.gz | head -n 1
>     Dec  3 06:26:12 spielplatz kernel: [15525239.819366] postgres(22142): floating-point assist fault at ip 40000000003de402, isr 0000040000000008
>     ajohnso2@spielplatz:~$ 
> 
> Countless floating-point assist fault messages, though.  It seems that
> PostgreSQL needs some help in this department.
> 
> > Do you recall experiencing a hang during your kernel upgrade?
> 
> I remember a hang on shutdown for some system during the last week,
> but nothing during the kernel package upgrade proper.
> 
> > I'm wondering if there was an oops at the time you upgraded your
> > kernel package.  Also, can you mount your efi partition and capture
> > the md5sums of the files under /boot/efi/efi/debian?
> 
> ajohnso2@spielplatz:~$ sudo mount -v -t vfat -o ro /dev/sda1 /mnt
> ajohnso2@spielplatz:~$ md5sum /mnt/efi/debian/*
> 9fa2639fa5dca1521df76c7c254f4e04  /mnt/efi/debian/elilo.conf
> 5bec2375858e01c4590976f3fb479a3c  /mnt/efi/debian/elilo.efi
> f6d26c846defcbb6a255365b32205e69  /mnt/efi/debian/initrd.img
> f43e07c02fff08489e5d1f60dc0046ae  /mnt/efi/debian/initrd.img.old
> 35a0f1cd6e79fc7ffd93ca1dddb5df01  /mnt/efi/debian/readme.txt
> 384b24d661e30ca549569954ab9dc3ae  /mnt/efi/debian/vmlinuz
> 67a9622f681abd91cc4710da8894b743  /mnt/efi/debian/vmlinuz.old
> ajohnso2@spielplatz:~$ 
> 
> > If my theory is correct, you may be able to get back up and running
> > by booting an older kernel (if you have one), running 'elilo', then
> > booting back into the 2.6.26-11 kernel.
> 
> OK, so that worked.  What change did re-running elilo make?  Based on
> the MD5sums, there are new initrd and vmlinuz files.  Seems like
> installing kernel-image-2.6.26-1-mckinley should have done that in its
> postinst script.

Here's what I think happened:
 - Running 2.6.26-8
 - Upgraded to 2.6.26-11
   - unpacked 2.6.26-11
   - generated initramfs
   - called elilo
     - elilo loads modules it needs to mount EFI partition,
       but the modules available are now for 2.6.26-11 and
       are incompatible with 2.6.26-8.
     - system tries to mount efi partition and hangs due to
       incompatible modules, so the new files are never copied:
       the kernel/initrd in the efi partition is now out of date
       with respect to the files in /boot
 - system boots 2.6.26-8 again
   - initramfs loads, works fine (still using 2.6.26-8 initramfs)
   - system mounts root
   - system starts loading modules from the root partition (which
     are now 2.6.26-11 modules), and does bad things.

The bug would therefore be that we created a kernel with the same
abiname that was actually incompatible with the modules from an
earlier release.
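
If this theory holds, the copies on the EFI partition will differ from
the files elilo was supposed to copy. A rough way to check is sketched
below. The function names and the two directory arguments are
placeholders of mine, not part of elilo; point them at wherever your
elilo.conf takes its image/initrd from and wherever the EFI partition
is mounted.

```shell
#!/bin/sh
# Sketch only: report whether the kernel/initrd copies on the EFI
# partition still match the files they were copied from.
# Assumption: both directories hold files named vmlinuz and initrd.img.

# in_sync SRC DST: succeeds only if both files exist and are identical.
in_sync() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# check_efi_copies SRC_DIR EFI_DIR: prints one line per file and
# returns non-zero if any copy is stale or missing.
check_efi_copies() {
    src_dir=$1
    efi_dir=$2
    rc=0
    for name in vmlinuz initrd.img; do
        if in_sync "$src_dir/$name" "$efi_dir/$name"; then
            echo "$name: in sync"
        else
            echo "$name: STALE or missing"
            rc=1
        fi
    done
    return $rc
}
```

With the EFI partition mounted read-only on /mnt as above, this would
be called as something like: check_efi_copies / /mnt/efi/debian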

> What happens to the poor user who doesn't know to re-run elilo?  (Not
> that I expect there are too many "poor users" running ia64 systems.)

Unfortunately, I don't know that there's any way to retroactively solve
this problem. The cat is out of the bag, as they say.

A useful safeguard would be to make sure the modules we need are
loaded before we unpack the new modules, i.e., in the preinst. One
way to do this would be to call 'elilo' in the preinst.

Savvy users can configure their systems to do this themselves by adding
a preinst hook in /etc/kernel-img.conf.
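
As an untested sketch of what such a hook might look like: the script
path, the function name and the ELILO override variable below are all
made up for illustration, and I'm assuming the preinst_hook setting
behaves as kernel-img.conf(5) describes (the hook is handed the kernel
version and the image location as arguments). Check the man page on
your system before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper, saved for example as /usr/local/sbin/elilo-preinst
# and referenced from /etc/kernel-img.conf with a line such as:
#   preinst_hook = /usr/local/sbin/elilo-preinst
#
# Idea: run elilo while the modules for the running kernel are still on
# disk, so whatever it needs to mount the EFI partition gets loaded
# before the new package is unpacked.

elilo_preinst() {
    # $1 is assumed to be the kernel version being installed; it is
    # only used for the log message.
    echo "preinst hook: running ${ELILO:-elilo} before unpacking ${1:-unknown}" >&2
    # ELILO is an illustrative override for dry runs; it defaults to
    # the real elilo command. The exit status is passed through.
    "${ELILO:-elilo}"
}

# In the installed script, the last line would invoke the function:
#   elilo_preinst "$@"
```

Setting ELILO=true gives a harmless dry run of the wrapper logic.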
 
-- 
dann frazier



