
Bug#439462: linux-2.6: pci ordering issue



Package: linux-2.6
Version: 2.6.18.dfsg.1-13etch1

Last September Matt Domsch <Matt_Domsch@dell.com> reported a problem where, 
due to the difference in the way the 2.4 and 2.6 kernels walk the PCI bus, 
on some systems drivers (mainly NIC drivers) were discovering and naming 
devices in different orders from 2.4 to 2.6. The problem, potential 
solutions, and proposed patch are at

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6b4b78fed47e7380dfe9280b154e8b9bfcd4c86c

The patch changes the kernel PCI sorting order to breadth-first for systems 
that are known to have their chassis ports (and documentation/remote 
management) labeled in that order. It does this by matching DMI strings for 
the systems. Matt Domsch later provided two more patches adding further 
systems to the list:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f52383d395178afde66d049e176bb2c59a8c941a;hp=691cd0c2ee2d4d6dff652627fca1b2d4f1377d58
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f7a9dae7c41580761e7f6de1d508c010b1b44993

Here's a minor related patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2b290da053608692ea206507d993b70c39d2cdea
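
To illustrate why the walk order matters, here is a minimal Python sketch 
comparing a depth-first and a breadth-first walk of a device tree. The 
topology and the names in it (bridge_a, nic_deep, and so on) are made up 
for illustration and are not taken from any of the affected systems, and 
the functions only model the idea; they are not how the kernel implements 
either ordering.

```python
from collections import deque

# Hypothetical topology: each key is a device, each value lists the
# devices found behind it. One NIC sits behind two bridges, the other
# behind a single bridge.
TREE = {
    "root": ["bridge_a", "bridge_b"],
    "bridge_a": ["bridge_c"],
    "bridge_c": ["nic_deep"],
    "bridge_b": ["nic_shallow"],
}


def depth_first(tree, node="root"):
    """List devices by descending behind each bridge as soon as it is found."""
    order = []
    for child in tree.get(node, []):
        order.append(child)
        order.extend(depth_first(tree, child))
    return order


def breadth_first(tree, start="root"):
    """List devices level by level, nearest to the root first."""
    order = []
    queue = deque(tree.get(start, []))
    while queue:
        dev = queue.popleft()
        order.append(dev)
        queue.extend(tree.get(dev, []))
    return order


def name_nics(order):
    """Hand out eth0, eth1, ... in discovery order, as a NIC driver would."""
    nics = [dev for dev in order if dev.startswith("nic_")]
    return {f"eth{i}": nic for i, nic in enumerate(nics)}


if __name__ == "__main__":
    print("depth-first:  ", name_nics(depth_first(TREE)))
    print("breadth-first:", name_nics(breadth_first(TREE)))
```

In this toy tree the depth-first walk reaches nic_deep first, while the 
breadth-first walk reaches nic_shallow first, so eth0 and eth1 trade places 
between the two orderings.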

These patches were not in the 2.6.18 kernel that shipped with etch, but 
they are in the newer 2.6 kernels in lenny and sid. On the 17 system types 
covered by these patches, people installing etch (or older 2.6 kernels) may 
have their NICs discovered in a different order than with the 2.4 kernels 
in sarge and the 2.6 kernels in lenny and sid. There are a few different 
cases:

1) People doing a fresh install of etch on these systems will be confused 
that the Linux ordering doesn't match the vendor chassis/documentation/remote 
management labels.

2) People upgrading from 2.4 kernels (like in sarge) to etch will be 
confused when their NICs are reordered. If possible something should be 
added to the etch errata about this.

3) People upgrading from etch kernels to newer 2.6 kernels in lenny/sid 
will be confused when their NICs swap back and now suddenly match the 
chassis/docs/remote management labels. Something will need to be added to 
the lenny release notes about this.

4) People upgrading from sarge 2.4 kernels directly to lenny/sid 2.6 
kernels won't have a problem. But I'm not sure that's a supported upgrade 
path; I think the recommendation is to upgrade via etch.

One frustrating thing is that the more people ("X") who install the broken 
etch kernel on these systems, the more who A) have to deal with the 
confusion of having things backwards and B) will have things change back 
the other way when they upgrade to lenny. It is tempting to include these 
patches in a stable kernel update to minimize X, but the people who have 
already installed broken etch on these systems ("Y") would then see their 
NICs reordered by a stable kernel update, which is probably even more 
shocking. Because the affected systems are fairly new, I am guessing that 
X >> Y, but I'm not sure that's enough to justify a stable kernel update. 
I guess the stable kernel release managers can decide that.

I have access to several of the systems on the list and ran into this bug 
when installing etch on them. The results are sort of interesting:

ProLiant bl460c: two internal NICs, swapped
NIC1=eth1 NIC2=eth0

ProLiant bl465c: two internal NICs, not swapped (routing was such that 
depth-first and breadth-first produced the same result)
NIC1=eth0 NIC2=eth1

ProLiant bl480c: four internal NICs, pairs swapped
NIC1=eth2 NIC2=eth3 NIC3=eth0 NIC4=eth1

I put "lspci -tvnn" output for the above at
  http://people.debian.org/~taggart/tmp/pci-ordering/

I have booted newer lenny/sid kernels on the above machines and confirmed 
that the patches fix the ordering. I'm willing to test other potential 
fixes if needed.
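
For anyone planning the etch to lenny upgrade on these machines, the 
following Python sketch turns the orderings reported above into a list of 
interface names that would change. It assumes that with the patches applied 
chassis label NICn is named eth(n-1); that assumption is mine, based on the 
patched kernels matching the chassis labels, and the function names are 
only illustrative.

```python
# Orderings reported above for the etch 2.6.18 kernel
# (chassis label -> interface name).
ETCH_NAMES = {
    "bl460c": {"NIC1": "eth1", "NIC2": "eth0"},
    "bl465c": {"NIC1": "eth0", "NIC2": "eth1"},
    "bl480c": {"NIC1": "eth2", "NIC2": "eth3", "NIC3": "eth0", "NIC4": "eth1"},
}


def patched_name(label):
    """Assumption: with the patches applied, NICn is named eth(n-1)."""
    return f"eth{int(label[len('NIC'):]) - 1}"


def renames_on_upgrade(model):
    """Return {etch_name: patched_name} for interfaces whose name changes."""
    return {
        old: patched_name(label)
        for label, old in ETCH_NAMES[model].items()
        if old != patched_name(label)
    }


if __name__ == "__main__":
    for model in ETCH_NAMES:
        print(model, renames_on_upgrade(model) or "no change")
```

An empty result, as for the bl465c, means the interface names stay the same 
across the upgrade.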

I am filing this bug against a specific version of linux-2.6, but it 
affects all 2.6 kernels up to the point where the above patches made it 
into a Debian kernel (the first patch was in upstream 2.6.19 at least).

Thanks,

-- 
Matt Taggart
taggart@debian.org




