
Bug#399812: linux-image-2.6.17-2-686: Wrong nic ordering with d-i-RC1 kernel on new Dell poweredge



Package: linux-image-2.6.17-2-686
Version: 2.6.17-9
Severity: normal
Tags: patch


Hello,

I did a number of d-i RC1 installations on a Dell PowerEdge 2950 last
night (using both the i386 and amd64 netinst ISOs), and the kernel finds
the embedded NICs in the wrong order.

Since I had seen someone using a 2.6.18.2 kernel complain about the same
problem on a French mailing list, I did some digging. Here are my findings:

http://linux.dell.com/files/whitepapers/nic-enum-whitepaper-v2.pdf, dated
October 2006, explains that current 2.6 kernels get the ordering of
embedded NICs wrong on the whole 9th generation of Dell PowerEdge
servers (the ones currently on sale).

A custom sarge installer found on the net (I don't have the URL right
now, but it has been floating around on the linux-poweredge list) didn't
have this problem. It was using linux-2.6.19-rc3.

Grepping through
http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.19-rc3
shows the following:

commit 6b4b78fed47e7380dfe9280b154e8b9bfcd4c86c
Author: Matt Domsch <Matt_Domsch@dell.com>
Date:   Fri Sep 29 15:23:23 2006 -0500

    PCI: optionally sort device lists breadth-first

    ...
    Feedback appreciated.  Patch tested on a Dell PowerEdge 1955
    blade with 2.6.18.

This is the fix.

I suggest including it in the next d-i target kernel, because it
will save a lot of time for a _lot_ of people.

See you,
	Fab


Full changelog of the commit:

commit 6b4b78fed47e7380dfe9280b154e8b9bfcd4c86c
Author: Matt Domsch <Matt_Domsch@dell.com>
Date:   Fri Sep 29 15:23:23 2006 -0500

    PCI: optionally sort device lists breadth-first
    
    Problem:
    New Dell PowerEdge servers have 2 embedded ethernet ports, which are
    labeled NIC1 and NIC2 on the chassis, in the BIOS setup screens, and
    in the printed documentation.  Assuming no other add-in ethernet ports
    in the system, Linux 2.4 kernels name these eth0 and eth1
    respectively.  Many people have come to expect this naming.  Linux 2.6
    kernels name these eth1 and eth0 respectively (backwards from
    expectations).  I also have reports that various Sun and HP servers
    have similar behavior.
    
    
    Root cause:
    Linux 2.4 kernels walk the pci_devices list, which happens to be
    sorted in breadth-first order (or pcbios_find_device order on i386,
    which most often is breadth-first also).  2.6 kernels have both the
    pci_devices list and the pci_bus_type.klist_devices list, the latter
    is what is walked at driver load time to match the pci_id tables; this
    klist happens to be in depth-first order.
    
    On systems where, for physical routing reasons, NIC1 appears on a
    lower bus number than NIC2, but NIC2's bridge is discovered first in
    the depth-first ordering, NIC2 will be discovered before NIC1.  If the
    list were sorted breadth-first, NIC1 would be discovered before NIC2.
    
    A PowerEdge 1955 system has the following topology which easily
    exhibits the difference between depth-first and breadth-first device
    lists.
    
    -[0000:00]-+-00.0  Intel Corporation 5000P Chipset Memory Controller Hub
               +-02.0-[0000:03-08]--+-00.0-[0000:04-07]--+-00.0-[0000:05-06]----00.0-[0000:06]----00.0  Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC2, 2.4 kernel name eth1, 2.6 kernel name eth0)
               +-1c.0-[0000:01-02]----00.0-[0000:02]----00.0  Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC1, 2.4 kernel name eth0, 2.6 kernel name eth1)
    
    
    Other factors, such as device driver load order and the presence of
    PCI slots at various points in the bus hierarchy further complicate
    this problem; I'm not trying to solve those here, just restore the
    device order, and thus basic behavior, that 2.4 kernels had.
    
    
    Solution:
    
    The solution can come in multiple steps.
    
    Suggested fix #1: kernel
    Patch below optionally sorts the two device lists into breadth-first
    ordering to maintain compatibility with 2.4 kernels.  It adds two new
    command line options:
      pci=bfsort
      pci=nobfsort
    to force the sort order, or not, as you wish.  It also adds DMI checks
    for the specific Dell systems which exhibit "backwards" ordering, to
    make them "right".
    
    
    Suggested fix #2: udev rules from userland
    Many people also have the expectation that embedded NICs are always
    discovered before add-in NICs (which this patch does not try to do).
    Using the PCI IRQ Routing Table provided by system BIOS, it's easy to
    determine which PCI devices are embedded, or if add-in, which PCI slot
    they're in.  I'm working on a tool that would allow udev to name
    ethernet devices in ascending embedded, slot 1 .. slot N order,
    subsort by PCI bus/dev/fn breadth-first.  It'll be possible to use it
    independent of udev as well for those distributions that don't use
    udev in their installers.
    
    Suggested fix #3: system board routing rules
    One can constrain the system board layout to put NIC1 ahead of NIC2
    regardless of breadth-first or depth-first discovery order.  This adds
    a significant level of complexity to board routing, and may not be
    possible in all instances (witness the above systems from several
    major manufacturers).  I don't want to encourage this particular train
    of thought too far, at the expense of not doing #1 or #2 above.
    
    
    Feedback appreciated.  Patch tested on a Dell PowerEdge 1955 blade
    with 2.6.18.
    
    You'll also note I took some liberty and temporarily break the klist
    abstraction to simplify and speed up the sort algorithm.  I think
    that's both safe and appropriate in this instance.
    
    
    Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
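
To illustrate the root cause described in the commit message, here is a
minimal Python sketch. It is not the kernel code: the TOPOLOGY table is a
hand-written model of the PowerEdge 1955 tree quoted above, and the
function names (depth_first, breadth_first, nic_names) are made up for
this example. It only shows why the two walk orders find the NICs in
opposite order.

```python
from collections import deque

# Hand-written model of the PowerEdge 1955 topology from the commit message.
# Each bus maps to its devices in slot order:
# (address as bus:dev.fn, label, child bus behind a bridge or None).
TOPOLOGY = {
    "00": [("00:00.0", "memory controller hub", None),
           ("00:02.0", "bridge", "03"),
           ("00:1c.0", "bridge", "01")],
    "03": [("03:00.0", "bridge", "04")],
    "04": [("04:00.0", "bridge", "05")],
    "05": [("05:00.0", "bridge", "06")],
    "06": [("06:00.0", "NIC2", None)],
    "01": [("01:00.0", "bridge", "02")],
    "02": [("02:00.0", "NIC1", None)],
}


def depth_first(bus="00"):
    """Yield devices, descending into each bridge as soon as it is seen."""
    for address, label, child in TOPOLOGY[bus]:
        yield address, label
        if child is not None:
            yield from depth_first(child)


def breadth_first(root="00"):
    """Yield devices bus by bus, visiting buses in the order their bridges were seen."""
    pending = deque([root])
    while pending:
        bus = pending.popleft()
        for address, label, child in TOPOLOGY[bus]:
            yield address, label
            if child is not None:
                pending.append(child)


def nic_names(walk):
    """Assign eth0, eth1, ... to NICs in the order the walk finds them."""
    nics = [label for _, label in walk if label.startswith("NIC")]
    return {f"eth{i}": label for i, label in enumerate(nics)}


print(nic_names(depth_first()))    # {'eth0': 'NIC2', 'eth1': 'NIC1'}
print(nic_names(breadth_first()))  # {'eth0': 'NIC1', 'eth1': 'NIC2'}
```

The depth-first walk follows the 02.0 bridge all the way down to NIC2
before it ever looks at 1c.0, which matches the 2.6 naming reported
above. The breadth-first walk reaches NIC1 first because it sits fewer
bridges away from the root bus, which matches the 2.4 naming.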


-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-2-xen-k7
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)


