
Bug#439462: linux-2.6: pci ordering issue



On Sat, Aug 25, 2007 at 01:22:37AM -0700, Matt Taggart wrote:
> Package: linux-2.6
> Version: 2.6.18.dfsg.1-13etch1
> 
> Last September Matt Domsch <Matt_Domsch@dell.com> reported a problem where, 
> due to the difference in the way the 2.4 and 2.6 kernels walk the PCI bus, 
> on some systems drivers (mainly NIC drivers) were discovering and naming 
> devices in different orders from 2.4 to 2.6. The problem, potential 
> solutions, and proposed patch are at
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6b4b78fed47e7380dfe9280b154e8b9bfcd4c86c
> 
> The patch changes the kernel PCI sorting order to breadth-first for systems
> that are known to have their chassis ports (and documentation/remote
> management) labeled in that order. It does this by matching DMI strings for
> the systems. Matt Domsch later provided another two patches adding
> additional systems to the list:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f52383d395178afde66d049e176bb2c59a8c941a;hp=691cd0c2ee2d4d6dff652627fca1b2d4f1377d58
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f7a9dae7c41580761e7f6de1d508c010b1b44993
> 
> Here's a minor related patch:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2b290da053608692ea206507d993b70c39d2cdea
> 
> These patches were not in the 2.6.18 kernel that shipped with etch, but 
> they are in the newer 2.6 kernels in lenny and sid. For the 17 different 
> system types covered by these patches, people installing etch (or older 2.6 
> kernels) will have their NICs potentially discovered in a different order 
> than 2.4 kernels in sarge and 2.6 kernels in lenny and sid. There are a few 
> different cases:
> 
> 1) People doing a fresh install of etch on these systems will be confused
> that the Linux ordering doesn't match the vendor
> chassis/documentation/remote management labels.
> 
> 2) People upgrading from 2.4 kernels (like in sarge) to etch will be 
> confused when their NICs are reordered. If possible something should be 
> added to the etch errata about this.
> 
> 3) People upgrading from etch kernels to newer 2.6 kernels in lenny/sid 
> will be confused when their NICs swap back and now suddenly match the 
> chassis/docs/remote management labels. Something will need to be added to 
> the lenny release notes about this.
> 
> 4) People upgrading from sarge 2.4 kernels directly to lenny/sid 2.6
> kernels won't have a problem. But I'm not sure if that's a supported
> upgrade path; I think the recommendation is to upgrade via etch.
> 
> One frustrating thing is that the more people ("X") who install broken etch
> on these systems, the more who A) have to deal with the confusion of
> having things backwards and B) will have things changed in the other
> direction when they upgrade to lenny. It is tempting to think about
> including these patches in a stable kernel update to try to minimize X, but
> the people who have already installed broken etch on these systems ("Y")
> would have their ordering changed by a stable kernel update, which is
> probably even more shocking. Because the systems affected are fairly new, I
> am guessing that X >> Y, but I'm not sure if that's enough to justify a
> stable kernel update. I guess the stable kernel release managers can decide
> that.
> 
> I have access to several of the systems on the list and ran into this bug
> when installing etch on them; the results are sort of interesting:
> 
> ProLiant BL460c: two internal NICs, swapped
> NIC1=eth1 NIC2=eth0
> 
> ProLiant BL465c: two internal NICs, not swapped (routing was such that
> depth-first and breadth-first produced the same result)
> NIC1=eth0 NIC2=eth1
> 
> ProLiant BL480c: four internal NICs, pairs swapped
> NIC1=eth2 NIC2=eth3 NIC3=eth0 NIC4=eth1
> 
> I put "lspci -tvnn" output for the above at
>   http://people.debian.org/~taggart/tmp/pci-ordering/
> 
> I have booted newer lenny/sid kernels on the above machines and confirmed 
> that the patches fix the ordering. I'm willing to test other potential 
> fixes if needed.
> 
> I am filing this bug against a specific version of linux-2.6, but it 
> affects all older 2.6 kernels and all newer 2.6 kernels up to the point the 
> above patches made it into a Debian kernel (the first patch was in upstream
> 2.6.19 at least).
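
For readers unfamiliar with the depth-first versus breadth-first distinction
in the report quoted above, here is a minimal sketch in Python. It is not the
kernel's code: the topology, the device names, and the helper functions are
all hypothetical, chosen only to show how the same two NICs can end up with
swapped eth names depending on the walk order.

```python
from collections import deque

# Hypothetical toy topology (not from any real machine): each entry maps a
# node to the devices directly behind it. Names starting with "nic" are
# network cards; everything else is treated as a bridge.
TREE = {
    "root": ["bridge_a", "bridge_b"],
    "bridge_a": ["bridge_c"],
    "bridge_c": ["nic_deep"],
    "bridge_b": ["nic_shallow"],
}

def depth_first(node="root"):
    """Return NICs in the order a depth-first walk reaches them."""
    found = []
    for child in TREE.get(node, []):
        if child.startswith("nic"):
            found.append(child)
        else:
            # Descend into the bridge before looking at its siblings.
            found.extend(depth_first(child))
    return found

def breadth_first(root="root"):
    """Return NICs in the order a breadth-first walk reaches them."""
    found = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in TREE.get(node, []):
            if child.startswith("nic"):
                found.append(child)
            else:
                # Defer the bridge until the current level is finished.
                queue.append(child)
    return found

def eth_names(nics):
    """Assign eth0, eth1, ... in discovery order."""
    return {f"eth{i}": nic for i, nic in enumerate(nics)}

if __name__ == "__main__":
    print("depth-first:  ", eth_names(depth_first()))
    print("breadth-first:", eth_names(breadth_first()))
```

In this toy tree the NIC behind two bridges becomes eth0 under the depth-first
walk and eth1 under the breadth-first walk. When both NICs sit at the same
depth, the two walks agree, which is consistent with the "not swapped" result
reported above for one of the systems.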

Matt,
this patch was never backported to the 2.6.18 kernel.
Since most systems should be upgraded to Etch by now, I think we should just
close this bug. Do you agree?

(Also, all NICs are covered by the persistent-net-rules udev rule.)

Cheers,
        Moritz




