
Can't get NIC Bonding with active-backup working



Hi all,

I have tried to get NIC bonding working as described at wiki.debian.org/Bonding.

Each NIC is connected to a different switch, for redundancy rather than
bandwidth (to insulate against a switch failure). I'm using
active-backup mode for HA failover.

Output from cat /etc/network/interfaces:

auto bond0
iface bond0 inet static
    address 192.168.166.164
    netmask 255.255.255.240
    network 192.168.166.160
    gateway 192.168.166.161
    slaves eth0 eth1
    bond_mode active-backup
    bond_miimon 100
    bond_downdelay 200
    bond_updelay 200
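A quick way to rule out an addressing mistake in that stanza is to check it with Python's standard ipaddress module. This is only a sketch: check_static_config is a name made up for illustration, and the values are simply copied from the config above.

```python
import ipaddress

# Values copied from the bond0 stanza in /etc/network/interfaces above.
ADDRESS = "192.168.166.164"
NETMASK = "255.255.255.240"
NETWORK = "192.168.166.160"
GATEWAY = "192.168.166.161"


def check_static_config(address: str, netmask: str, network: str, gateway: str) -> dict:
    """Derive the subnet from address/netmask and check the other fields against it.

    check_static_config is a hypothetical helper, not part of any Debian tool.
    """
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    net = iface.network
    gw = ipaddress.ip_address(gateway)
    return {
        "prefix": net.prefixlen,
        "network_matches": str(net.network_address) == network,
        "gateway_in_subnet": gw in net,
        "broadcast": str(net.broadcast_address),
        "usable_hosts": net.num_addresses - 2,  # minus network and broadcast
    }


if __name__ == "__main__":
    for key, value in check_static_config(ADDRESS, NETMASK, NETWORK, GATEWAY).items():
        print(f"{key}: {value}")
```

With these values it reports a /28 (network 192.168.166.160, broadcast 192.168.166.175) with the gateway inside the subnet, so the addressing itself looks self-consistent.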

Output from cat /proc/net/bonding/bond0:

Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: b8:ab:6f:92:eb:c3

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: b8:ab:6f:92:eb:c4
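To compare the bond's state before and after pulling a cable, the file can be captured at each step and split into bond-level and per-slave fields. A minimal sketch in Python, assuming the "Key: value" layout shown above; parse_bonding is a hypothetical helper name, and it falls back to the pasted sample when /proc/net/bonding/bond0 is not present.

```python
from pathlib import Path

# Sample copied from the /proc/net/bonding/bond0 output above.
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: b8:ab:6f:92:eb:c3

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: b8:ab:6f:92:eb:c4
"""


def parse_bonding(text: str) -> dict:
    """Split the status file into bond-level fields and per-slave fields.

    Assumes the layout shown above: "Key: value" lines, with each slave's
    section starting at a "Slave Interface:" line.
    """
    bond: dict = {}
    slaves: dict = {}
    current = None  # name of the slave section being read, if any
    for line in text.splitlines():
        if ": " not in line:
            continue  # blank lines separate sections
        # Split on the first ": " only; MAC addresses contain ":" but not ": ".
        key, value = line.split(": ", 1)
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            slaves[current] = {}
        elif current is None:
            bond[key] = value
        else:
            slaves[current][key] = value
    return {"bond": bond, "slaves": slaves}


if __name__ == "__main__":
    path = Path("/proc/net/bonding/bond0")
    text = path.read_text() if path.exists() else SAMPLE
    status = parse_bonding(text)
    print("active:", status["bond"].get("Currently Active Slave"))
    for name, info in status["slaves"].items():
        print(name, info.get("MII Status"), "failures:", info.get("Link Failure Count"))
```

Running it before and after the failover should show whether Currently Active Slave and each slave's MII Status and Link Failure Count change as expected.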

If I then pull a cable (or run ifconfig eth0 down), I get the following
in syslog:

Jan 23 11:21:50 host-1 kernel: [55852.565975] bonding: bond0: link status down for active interface eth0, disabling it in 200 ms.
Jan 23 11:21:51 host-1 kernel: [55852.761549] bonding: bond0: link status definitely down for interface eth0, disabling it
Jan 23 11:21:51 host-1 kernel: [55852.761555] bonding: bond0: making interface eth1 the new active one.
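The kernel timestamps in these messages can be turned into a rough failover timeline. A minimal sketch, assuming each message sits on a single syslog line as the kernel writes it; bonding_events and BOND_RE are hypothetical names.

```python
import re

# The three kernel messages from above. Assumes one message per line,
# as in the actual syslog file (mail clients tend to wrap them).
LOG = """\
Jan 23 11:21:50 host-1 kernel: [55852.565975] bonding: bond0: link status down for active interface eth0, disabling it in 200 ms.
Jan 23 11:21:51 host-1 kernel: [55852.761549] bonding: bond0: link status definitely down for interface eth0, disabling it
Jan 23 11:21:51 host-1 kernel: [55852.761555] bonding: bond0: making interface eth1 the new active one.
"""

# Captures the kernel timestamp (seconds since boot), the bond name and the message text.
BOND_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+bonding:\s+(\w+):\s+(.*)$")


def bonding_events(text: str) -> list:
    """Return (seconds_since_boot, bond, message) tuples for bonding lines."""
    events = []
    for line in text.splitlines():
        m = BOND_RE.search(line)
        if m:
            events.append((float(m.group(1)), m.group(2), m.group(3)))
    return events


if __name__ == "__main__":
    events = bonding_events(LOG)
    start = events[0][0]
    for ts, bond, msg in events:
        # Offsets relative to the first bonding message, in milliseconds.
        print(f"+{(ts - start) * 1000:7.1f} ms {bond}: {msg}")
```

For the three lines above, the gap between the first two messages is about 196 ms, in line with bond_downdelay 200, and eth1 is made active a few microseconds after that.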

All looks good... but pinging from host-1 gives "Destination Host
Unreachable" (with the ICMP errors coming from bond0's own IP address),
and my remote SSH session dies. Good job I have KVM access :)

So it's not working. This setup seems so simple that I can't see where
anything could be wrong, so I'm starting to suspect the switches. Maybe
they are being too clever? Then again, maybe I've done something
wrong.

What can I do to find out what's going on? I'm using Squeeze (current
point release) and kernel 2.6.32-5-amd64.

