
Bug#493479: linux-image-2.6.26-1-amd64: Update on status of this bug



Package: linux-image-2.6.26-1-amd64
Version: 2.6.26-*
Followup-For: Bug #493479


After originally filing this bug report here on the Debian BTS, I
performed a kernel bisection and took my findings to the LKML.  About
3 weeks later, the problem had finally been correctly diagnosed:
changes between 2.6.25 and 2.6.26 had reordered the sequence of
function calls responsible for detecting PCI devices, and in my case
an overlap of the memory resource region for the HPET device was
causing kernel hangs on my hardware.  This had not occurred with
2.6.25, but the changes leading up to 2.6.26 (which apparently fixed a
bug on other hardware) created the problem on my hardware.
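
To illustrate what "overlap" means here, the following is a minimal
sketch, not the kernel's actual resource-allocation logic.  It assumes
address ranges written in the "start-end : name" style used by
/proc/iomem; the sample addresses and the helper names parse_regions
and find_overlaps are invented for illustration and are not taken from
the affected machine or from any kernel source.

```python
import re

# Hypothetical sample in the "start-end : name" style; these addresses
# are invented for illustration, not read from the affected hardware.
SAMPLE = """\
fed00000-fed003ff : HPET 0
fed00200-fed00fff : PCI Bus 0000:00
fee00000-fee00fff : Local APIC
"""

# Matches only unindented lines, so nested child entries are skipped.
LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def parse_regions(text):
    """Return (start, end, name) for each top-level (unindented) line."""
    regions = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            start = int(m.group(1), 16)
            end = int(m.group(2), 16)
            regions.append((start, end, m.group(3)))
    return regions


def find_overlaps(regions):
    """Return pairs of names whose inclusive address ranges intersect."""
    overlaps = []
    ordered = sorted(regions)
    for i, (s1, e1, n1) in enumerate(ordered):
        for s2, e2, n2 in ordered[i + 1:]:
            if s2 > e1:
                break  # sorted by start, so no later region can overlap
            overlaps.append((n1, n2))
    return overlaps


if __name__ == "__main__":
    for a, b in find_overlaps(parse_regions(SAMPLE)):
        print(f"overlap: {a} <-> {b}")
```

With the invented sample above, the first two ranges intersect while
the third does not; real values would have to come from the machine
in question.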

My motherboard has an AMD SB600 southbridge, and some of the Linux
kernel team mistakenly believed that the problem was limited to that
chipset.  I was able to discover (in a Google search) at least one
other person with completely different hardware -- Intel CPU +
motherboard combination -- who began experiencing hangs in the 2.6.26
kernel series at exactly the same SHA1 hash that I discovered when
carrying out my original bisection, so the changes between 2.6.25 and
2.6.26 affect more hardware combinations than my own.

After a lengthy and frustrating period of time trying to discover the
root cause of the hangs, patches were created that allowed me to build
a kernel that would not hang.  These patches were primarily made by
Yinghai Lu and Ingo Molnar, and were against Ingo Molnar's tip/master
git tree.  Unfortunately, the patches were created too late in the
development of 2.6.27:  when Linus Torvalds saw the PCI code being
touched, and the potential problems that could create, he decided to
reject those changes for the 2.6.27 kernels and postpone them until
the 2.6.28 series.

I had hoped to provide a patch for the 2.6.26 kernels which Debian
will be using for its Lenny release, but my attention moved away from
these kernel issues until release candidates for 2.6.28 became
available.  Only today did I return to this issue again to determine
the status of the situation.  Here is what I found:


    1) A patch had been provided on Sep. 12 by Jordan Crouse (a kernel
    developer employed by AMD, IIRC) which should have allowed any
    2.6.26 or 2.6.27 kernel to boot on my hardware:

    http://www.uwsg.indiana.edu/hypermail/linux/kernel/0809.1/1902.html

    This patch is supposed to prevent the memory resource region of
    the HPET device on SB600 southbridge motherboards from overlapping
    with the resource regions of other PCI devices.  I found that this
    patch fails to make any difference on my hardware (with said
    southbridge) for any 2.6.26 or 2.6.27 kernel.


    2) Since the release candidates for 2.6.28 are now up to "rc3", I
    decided to begin with "rc1".  I found that kernel 2.6.28-rc1 (from
    Torvalds' git tree) would hang during boot when initializing the
    HPET device, which I took as a bad sign!  Booting the kernel with
    the "nohpet" parameter allowed the kernel to boot all the way to a
    login prompt, only to hang at that point.  (Unlike the hangs
    experienced with 2.6.2[67] kernels, however, I was able to use the
    Magic SysRq keys to sync and unmount my filesystems and reboot
    cleanly.)


    3) Fully expecting fallout from the mad rush of changes that go
    into the first 2-week merge window, I checked out the
    2.6.28-rc3 kernel from the Torvalds tree.  Happily, this kernel
    boots fine... and without the need for any special parameters.
    The HPET device is working, and no hang occurs at the login
    prompt.  (I am submitting this BTS update using the very same
    kernel, as can be seen below under "System Information"!)

I can now state that the 2.6.28 kernel series has resolved the
problems with hangs on my hardware.

It may now be possible to provide patches for 2.6.26 which will allow
it to boot without hanging on machines experiencing such problems.
Does the Debian Kernel Team have any interest in seeing such patches?
If so, I could start working on backporting the minimum set of changes
from 2.6.28 to 2.6.26 which cure the problem on my hardware.


-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.28-rc3.081104.fileserver.uvesafb (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-1-amd64 depends on:
ii  debconf [debconf-2.0]         1.5.22     Debian configuration management sy
ii  initramfs-tools [linux-initra 0.92j      tools for generating an initramfs
ii  module-init-tools             3.4-1      tools for managing Linux kernel mo

linux-image-2.6.26-1-amd64 recommends no packages.

Versions of packages linux-image-2.6.26-1-amd64 suggests:
ii  grub                          0.97-47    GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.26              <none>     (no description available)


