Bug#439462: linux-2.6: pci ordering issue
Package: linux-2.6
Version: 2.6.18.dfsg.1-13etch1
Last September Matt Domsch <Matt_Domsch@dell.com> reported a problem where,
due to the difference in the way the 2.4 and 2.6 kernels walk the PCI bus,
on some systems drivers (mainly NIC drivers) were discovering and naming
devices in different orders from 2.4 to 2.6. The problem, potential
solutions, and proposed patch are at
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6b4b78fed47e7380dfe9280b154e8b9bfcd4c86c
The patch changes the kernel PCI sorting order to breadth-first for systems
that are known to have their chassis ports (and documentation/remote
management) labeled in that order. It does this by matching DMI strings for
the systems. Matt Domsch later provided another two patches adding
additional systems to the list:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f52383d395178afde66d049e176bb2c59a8c941a;hp=691cd0c2ee2d4d6dff652627fca1b2d4f1377d58
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f7a9dae7c41580761e7f6de1d508c010b1b44993
Here's a minor related patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2b290da053608692ea206507d993b70c39d2cdea
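To illustrate why the walk order matters, here is a toy Python model (not
kernel code). The bus topology, device addresses, and chassis labels in it
are invented for illustration and are not taken from any of the affected
machines. It only models the idea of the two orderings: NICs are named
eth0, eth1, ... in the order the walk encounters them.

```python
from collections import deque

# Hypothetical topology, NOT taken from any real machine.
# Each node is (pci_address, chassis_label_or_None, children);
# nodes with children stand in for PCI bridges.
TOPOLOGY = [
    ("00:1c.0", None, [
        ("01:00.0", None, [
            ("02:00.0", "NIC2", []),
        ]),
    ]),
    ("00:1d.0", None, [
        ("03:00.0", "NIC1", []),
    ]),
]


def depth_first(nodes):
    """Yield (address, label), descending into each bridge as it is found."""
    for addr, label, children in nodes:
        yield addr, label
        yield from depth_first(children)


def breadth_first(nodes):
    """Yield (address, label) level by level, nearest buses first."""
    queue = deque(nodes)
    while queue:
        addr, label, children = queue.popleft()
        yield addr, label
        queue.extend(children)


def name_nics(walk):
    """Assign eth0, eth1, ... to labeled NICs in the order they are seen."""
    nics = [label for _addr, label in walk if label is not None]
    return {label: f"eth{i}" for i, label in enumerate(nics)}


if __name__ == "__main__":
    print("depth-first:  ", name_nics(depth_first(TOPOLOGY)))
    print("breadth-first:", name_nics(breadth_first(TOPOLOGY)))
```

With this made-up topology the depth-first walk names NIC2 eth0 and NIC1
eth1, while the breadth-first walk gives NIC1=eth0 and NIC2=eth1, because
NIC2 sits one bridge deeper than NIC1.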
These patches were not in the 2.6.18 kernel that shipped with etch, but
they are in the newer 2.6 kernels in lenny and sid. On the 17 different
system types covered by these patches, people installing etch (or older 2.6
kernels) may see their NICs discovered in a different order than under the
2.4 kernels in sarge and the 2.6 kernels in lenny and sid. There are a few
different cases:
1) People doing a fresh install of etch on these systems will be confused
that the Linux ordering doesn't match the vendor chassis/documentation/remote
management labels.
2) People upgrading from 2.4 kernels (like in sarge) to etch will be
confused when their NICs are reordered. If possible something should be
added to the etch errata about this.
3) People upgrading from etch kernels to newer 2.6 kernels in lenny/sid
will be confused when their NICs swap back and now suddenly match the
chassis/docs/remote management labels. Something will need to be added to
the lenny release notes about this.
4) People upgrading from sarge 2.4 kernels directly to lenny/sid 2.6
kernels won't have a problem. But I'm not sure that's a supported upgrade
path; I think the recommendation is to upgrade via etch.
One frustrating thing is that the more people ("X") who install broken etch
on these systems, the more who A) have to deal with the confusion of having
things backwards and B) will have things change in the other direction when
they upgrade to lenny. It is tempting to include these patches in a stable
kernel update to minimize X, but the people who have already installed
broken etch on these systems ("Y") would see their ordering change with a
stable kernel update, which is probably even more shocking. Because the
affected systems are fairly new, I am guessing that X >> Y, but I'm not sure
that's enough to justify a stable kernel update. I guess the stable kernel
release managers can decide that.
I have access to several of the systems on the list and ran into this bug
when installing etch on them. The results are sort of interesting:
ProLiant bl460c: two internal NICs, swapped
NIC1=eth1 NIC2=eth0
ProLiant bl465c: two internal NICs, not swapped (routing was such that
depth-first and breadth-first produced the same result)
NIC1=eth0 NIC2=eth1
ProLiant bl480c: four internal NICs, pairs swapped
NIC1=eth2 NIC2=eth3 NIC3=eth0 NIC4=eth1
I put "lspci -tvnn" output for the above at
http://people.debian.org/~taggart/tmp/pci-ordering/
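For anyone wanting to check the mapping on their own machine, a minimal
Python sketch along these lines may help. It assumes a sysfs layout where
each entry under /sys/class/net has a "device" symlink pointing at the
underlying bus device; the function name and its parameter are made up for
this example, and for non-PCI NICs the reported name will not be a PCI
address.

```python
import os


def nic_pci_addresses(sysfs_net="/sys/class/net"):
    """Map interface name -> basename of its sysfs 'device' link target.

    Assumes each hardware NIC directory has a 'device' symlink; interfaces
    without one (for example lo) are skipped. Returns an empty dict if the
    sysfs directory is not present.
    """
    if not os.path.isdir(sysfs_net):
        return {}
    result = {}
    for iface in sorted(os.listdir(sysfs_net)):
        dev_link = os.path.join(sysfs_net, iface, "device")
        if os.path.islink(dev_link):
            result[iface] = os.path.basename(os.path.realpath(dev_link))
    return result


if __name__ == "__main__":
    for iface, addr in nic_pci_addresses().items():
        print(iface, addr)
```

Comparing the printed addresses against the "lspci -tvnn" tree shows which
physical port each ethN name ended up on.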
I have booted newer lenny/sid kernels on the above machines and confirmed
that the patches fix the ordering. I'm willing to test other potential
fixes if needed.
I am filing this bug against a specific version of linux-2.6, but it
affects all older 2.6 kernels and all newer 2.6 kernels up to the point the
above patches made it into a Debian kernel (the first patch was in upstream
2.6.19 at least).
Thanks,
--
Matt Taggart
taggart@debian.org