
IPoIB interface won't come up, but everything else seems ok



Hi all,

We have a Mellanox MT27500 Family [ConnectX-3] adapter installed in our computer.  I followed the wiki https://wiki.debian.org/RDMA, and RDMA at least seems to be working fine.  As you can see below, the required modules are loaded, the port is active, information about the other clients on the network can be retrieved, and `ibping` works.

We also use IPoIB, which is up and running on the other clients.  For some reason, however, I can't bring up that interface on this computer (see the output after the `ibping` command below).  Running the `ip addr add` command assigns the address to the interface, but the interface remains DOWN, and `ifup ibs3` reports "unknown interface".

I also don't understand why `ib0` was renamed to `ibs3` and `ib1` to `ibs3d1`.  Furthermore, lshw reports this network interface as DISABLED.
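On the DOWN/DISABLED part: both `state DOWN` in the `ip link` output and DISABLED in lshw appear to reflect the same thing, namely that the flag list `<BROADCAST,MULTICAST>` contains no `UP`, so the interface is administratively down. A small POSIX sh check like the one below can make that explicit; `link_is_up` is a hypothetical helper for illustration, not part of iproute2.

```shell
# Hypothetical helper: succeed if a line from `ip link` / `ip -o link`
# lists UP among its <...> flags, i.e. the interface is administratively up.
link_is_up() {
    flags=${1#*<}          # strip everything up to and including '<'
    flags=${flags%%>*}     # strip everything from '>' onwards
    case ",$flags," in
        *,UP,*) return 0 ;;   # exact UP flag; LOWER_UP alone does not match
        *) return 1 ;;
    esac
}

# Example on this machine (needs iproute2):
# link_is_up "$(ip -o link show dev ibs3)" && echo up || echo "admin down"
```

Fed the `ibs3` line from the `ip link` output in this report, the check fails, matching the DOWN state.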

Oh my gosh, you all... OK, after writing this whole report, I tried deleting the address assigned with the `ip` command and defining the interface in /etc/network/interfaces instead.  Now `ifup ibs3` no longer complains about an unknown interface, the interface is reported as UP, and pinging another computer works.
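For reference, a minimal ifupdown stanza of the kind described above might look like the following. The exact stanza used here isn't shown, so this is an assumption; the address is the one from the `ip addr add` command below. `ifup` only acts on interfaces declared in /etc/network/interfaces (or files it sources), which would explain the earlier "unknown interface" error.

```
# /etc/network/interfaces (sketch; assumes ifupdown manages ibs3)
auto ibs3
iface ibs3 inet static
    address 10.10.11.203/24
```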

So, I guess the wiki's statements following the `ip addr add` command are incorrect? ("The IP address should now respond to pings.  If there are other hosts configured with IPoIB, each interface's addresses should also be pingable.")

Maybe there is something missing from the `ip addr add` command?
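One likely gap, offered as a sketch rather than a confirmed fix: `ip addr add` only assigns an address and leaves the interface's administrative state alone, which matches the `<BROADCAST,MULTICAST>` flags (no `UP`) and `state DOWN` in the output below. Bringing the link up with `ip link set` is presumably the missing step. The function name `ipoib_up` is hypothetical; the interface and address are the ones from this report.

```shell
# Hypothetical helper: assign an IPoIB address and bring the link up (run as root).
# Usage: ipoib_up IFACE ADDR/PREFIX
ipoib_up() {
    iface=$1
    addr=$2
    # Assigns the address only; on its own this leaves the link DOWN.
    # Fails with "File exists" if the address is already present, in which
    # case only the next command is needed.
    ip addr add "$addr" dev "$iface" || return 1
    # Sets the administrative UP flag, presumably the step that was missing.
    ip link set dev "$iface" up
}
```

On this machine that would be `ipoib_up ibs3 10.10.11.203/24`, followed by `ping 10.10.11.100` against another IPoIB host.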
Best,
Chandler


# lsmod | grep '\(^ib\|^rdma\)'
ib_umad                36864  0
ib_ipoib              147456  0
rdma_ucm               32768  0
rdma_cm               131072  2 rpcrdma,rdma_ucm
ib_cm                 135168  2 rdma_cm,ib_ipoib
ib_uverbs             167936  2 mlx4_ib,rdma_ucm
ib_core               413696  9 rdma_cm,ib_ipoib,rpcrdma,mlx4_ib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,ib_cm

# ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.40.7000
	Hardware version: 1
	Node GUID: 0xe41d2d03006f8510
	System image GUID: 0xe41d2d03006f8513
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40 (FDR10)
		Base lid: 4
		LMC: 0
		SM lid: 13
		Capability mask: 0x02514868
		Port GUID: 0xe41d2d03006f8511
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x02514868
		Port GUID: 0xe41d2d03006f8512
		Link layer: InfiniBand

# iblinkinfo
[prints info about other clients]
Switch: 0x0002c903008995b0 SwitchX -  Mellanox Technologies:
[prints all the clients connected to each port on the switch]
           8   35[  ] ==( 4X  10.0 Gbps (FDR10) Active/  LinkUp)==>       4    1[  ] "Xba mlx4_0" ( )
[...]
CA: Xba mlx4_0:
      0xe41d2d03006f8511      4    1[  ] ==( 4X  10.0 Gbps (FDR10) Active/  LinkUp)==>       8   35[  ] "SwitchX -  Mellanox Technologies" ( )
#

On another computer on the network:
# ibhosts
Ca	: 0xe41d2d03006f8510 ports 2 "Xba mlx4_0"
[...]

# ibping -G 0xe41d2d03006f8511
Pong from Xba.(none) (Lid 4): time 0.107 ms
Pong from Xba.(none) (Lid 4): time 0.125 ms
Pong from Xba.(none) (Lid 4): time 0.117 ms
Pong from Xba.(none) (Lid 4): time 0.104 ms
Pong from Xba.(none) (Lid 4): time 0.109 ms
Pong from Xba.(none) (Lid 4): time 0.108 ms
^C
--- Xba.(none) (Lid 4) ibping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5098 ms
rtt min/avg/max = 0.104/0.111/0.125 ms

Back on this computer:
# ip addr
[...]
4: ibs3: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0
5: ibs3d1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0d1

# ip link
[...]
4: ibs3: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0
5: ibs3d1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0d1

# ip addr add 10.10.11.203/24 dev ibs3
# ip addr
[...]
4: ibs3: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0
    inet 10.10.11.203/24 scope global ibs3
       valid_lft forever preferred_lft forever
5: ibs3d1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0d1

# ifup ibs3
ifup: unknown interface ibs3

# dmesg -T|grep mlx4
[Sat Jan  7 22:12:31 2023] mlx4_core: Mellanox ConnectX core driver v4.0-0
[Sat Jan  7 22:12:31 2023] mlx4_core: Initializing 0000:01:00.0
[Sat Jan  7 22:12:38 2023] mlx4_core 0000:01:00.0: DMFS high rate steer mode is: disabled performance optimized steering
[Sat Jan  7 22:12:38 2023] mlx4_core 0000:01:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[Sat Jan  7 22:12:38 2023] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[Sat Jan  7 22:12:38 2023] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
[Sat Jan  7 22:12:38 2023] <mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
[Thu Jan 12 03:42:35 2023] mlx4_core 0000:01:00.0 ibs3: renamed from ib0
[Thu Jan 12 03:42:35 2023] mlx4_core 0000:01:00.0 ibs3d1: renamed from ib1

# lshw -class network
  *-network DISABLED
       description: interface
       product: MT27500 Family [ConnectX-3]
       vendor: Mellanox Technologies
       physical id: 0
       bus info: pci@0000:01:00.0
       logical name: ibs3
       version: 00
       serial: 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:00:00:00:00:00:00
       width: 64 bits
       clock: 33MHz
       capabilities: pm vpd msix pciexpress bus_master cap_list rom physical
       configuration: autonegotiation=on broadcast=yes driver=ib_ipoib driverversion=5.10.0-20-amd64 duplex=full firmware=2.40.7000 ip=10.10.11.203 latency=0 link=no multicast=yes
       resources: irq:24 memory:fb100000-fb1fffff memory:fa800000-faffffff memory:fb000000-fb0fffff
[...]

# ip addr del 10.10.11.203/24 dev ibs3

# ping 10.10.11.100
PING 10.10.11.100 (10.10.11.100) 56(84) bytes of data.
64 bytes from 10.10.11.100: icmp_seq=1 ttl=64 time=1.50 ms
64 bytes from 10.10.11.100: icmp_seq=2 ttl=64 time=0.184 ms
64 bytes from 10.10.11.100: icmp_seq=3 ttl=64 time=0.178 ms
^C
--- 10.10.11.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2006ms
rtt min/avg/max/mdev = 0.178/0.619/1.495/0.619 ms

# lshw -class network
  *-network
       description: interface
       product: MT27500 Family [ConnectX-3]
[...]
#

