Bug#795060: Latest Wheezy backport kernel prefers Infiniband mlx4_en over mlx4_ib, breaks existing installs



Package: linux-image-3.16.0-0.bpo.4-amd64
Version: 3.16.7-ckt11-1+deb8u2~bpo70+1
Severity: Critical


Hello,

We have a 2-node Supermicro chassis (2028TP-DC0FR) with an onboard
Mellanox ConnectX-3 HBA in production since last year.
Both nodes are directly connected with a QSFP FDR cable.
We use IPoIB (for DRBD) and thus load the mlx4_ib module and the other
related modules via /etc/modules at boot time.
These are Wheezy machines, currently with the 3.16.7-ckt2-1~bpo70+1 kernel.

Last week we got another (identical) one of these chassis and I installed
Wheezy as well (we need pacemaker, which is sorely lacking in Jessie).
This was with the 3.16.7-ckt11-1+deb8u2~bpo70+1 kernel, and unlike in the
past it loaded the mlx4_en module automatically and created an eth2
interface, while the ib0 interface was nowhere to be found.

This was very unexpected. I was also under the impression that mlx4_en
and mlx4_ib could be used in parallel, but even though mlx4_ib was loaded
it did not work (the /sys/class/net/ib0 entry was not created).

Booting into the stock Wheezy 3.2 kernel (which we also run on older
machines with ConnectX-2 HBAs) resulted in the expected behavior: an IB
interface and no Ethernet interface.

I'm also not seeing this on several other machines we use for Ceph with the
current Jessie kernel, but to be fair they use slightly different (QDR,
not FDR) ConnectX-3 HBAs.

After doing a fake-install (blacklisting didn't work) like this:
---
echo "install mlx4_en /bin/true" > /etc/modprobe.d/mlx4_en.conf 
depmod -a
update-initramfs -u
---
and rebooting I have IB running on 3.16.0-0.bpo.4-amd64 again as well.
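
A minimal sketch of how the result can be checked after the reboot,
assuming the IPoIB interface is named ib0 as above. The function names
check_ipoib and list_mlx4_modules are hypothetical helpers made up for
this illustration, not part of any package:

```shell
#!/bin/sh
# Hypothetical helper: report whether an IPoIB interface exists.
# $1 = directory to look in (defaults to /sys/class/net)
# $2 = interface name (defaults to ib0)
check_ipoib() {
    net_dir="${1:-/sys/class/net}"
    iface="${2:-ib0}"
    if [ -e "$net_dir/$iface" ]; then
        echo "$iface present"
        return 0
    fi
    echo "$iface missing"
    return 1
}

# Hypothetical helper: print the names of all loaded mlx4_* modules.
# $1 = module list to read (defaults to /proc/modules)
list_mlx4_modules() {
    modules_file="${1:-/proc/modules}"
    # The first field of each /proc/modules line is the module name.
    awk '$1 ~ /^mlx4_/ { print $1 }' "$modules_file"
}
```

With the workaround in place, one would expect check_ipoib to report
ib0 as present and list_mlx4_modules to show mlx4_core and mlx4_ib but
not mlx4_en.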

Given that the previous version works as expected and that Jessie is
doing the "right" thing as well, I'd consider this a critical bug.

Had I rebooted the older production cluster with 500,000 users on it into
this kernel, the results would not have been pretty.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/