[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1004139: ifenslave: Hotplugged interfaces sometimes not added to bond-master due to ifquery race condition



Package: ifenslave
Version: 2.13
Severity: normal

Hi,

I have a simple bonding configuration with two physical Ethernet
interfaces, both defined with the `allow-hotplug` option.  I use
allow-hotplug because the interfaces are on USB.  And since I'm
using allow-hotplug, I chose the style of configuration where I use
the `bond-master` configuration option under each slave configuration
stanza, rather than `bond-slaves` in the bonding master interface
stanza.

This configuration is similar to the one in
examples/two_hotplug_ethernet.  I'm attaching a copy of my
/etc/network/interfaces file.

The problem is that sometimes after a reboot, I find that one of my
interfaces (wlx0013efd01275: an external, wireless USB interface) is
not a member of the bond:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enxb827eb9e4634: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
    link/ether b8:27:eb:9e:46:34 brd ff:ff:ff:ff:ff:ff
3: wlx0013efd01275: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:13:ef:d0:12:75 brd ff:ff:ff:ff:ff:ff
    inet6 2002:ce3f:e590:2:213:efff:fed0:1275/64 scope global dynamic mngtmpaddr 
       valid_lft 86179sec preferred_lft 14179sec
    inet6 2002:ce3f:e590:1:213:efff:fed0:1275/64 scope global dynamic mngtmpaddr 
       valid_lft 86179sec preferred_lft 14179sec
    inet6 fe80::213:efff:fed0:1275/64 scope link 
       valid_lft forever preferred_lft forever
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:27:eb:9e:46:34 brd ff:ff:ff:ff:ff:ff
    inet 192.168.66.19/21 brd 192.168.71.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 2002:ce3f:e590:1:1::19/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::ba27:ebff:fe9e:4634/64 scope link 
       valid_lft forever preferred_lft forever
$ cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v5.10.0-10-rpi

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: enxb827eb9e4634 (primary_reselect always)
Currently Active Slave: enxb827eb9e4634
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enxb827eb9e4634
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b8:27:eb:9e:46:34
Slave queue ID: 0
$ /sbin/ifquery --state
lo=lo
bond0=bond0
wlx0013efd01275=wlx0013efd01275
enxb827eb9e4634=enxb827eb9e4634

In this situation, this is logged to syslog (via systemd):

sh[299]: Failed to enslave wlx0013efd01275 to bond0.

I think I understand the cause, and propose a workaround or a bug fix
(depending on how you look at it).

When both devices are present at boot, two ifup@<iface>.service units
(one for each interface) are started simultaneously.  It seems like
the ifenslave ifupdown scripts are meant to handle this, but in
/etc/network/if-pre-up.d/ifenslave, in the definition of setup_slave_device(),
there is:

    # Ensure the master is up or being configured
    export IFENSLAVE_ENV_NAME="IFUPDOWN_$IF_BOND_MASTER"
    IFUPDOWN_IF_BOND_MASTER="$(printenv "$IFENSLAVE_ENV_NAME")"
    unset IFENSLAVE_ENV_NAME
    if [ -z "$IFUPDOWN_IF_BOND_MASTER" ] ; then
            ifquery --state "$IF_BOND_MASTER" >/dev/null 2>&1 || ifup "$IF_BOND_MASTER"
    fi

I've added loads of debugging and done many reboot cycles to find out
that when the problem occurs (or at least in one case), both
simultaneously-running processes get to the `ifquery` line.  One of
the processes executes ifquery and gets a non-zero return code,
leading it to run `ifup "$IF_BOND_MASTER"`.  After ifup starts, the
other script process executes ifquery and gets a zero return code.

In this case, the bond interface hasn't come up yet, but the command
to bring it up has started, which I think is why ifquery is returning
zero here.  I can reproduce this behavior of ifquery with a dummy
interface:

iface dummy0 inet manual
        pre-up modprobe dummy
        pre-up sleep 5
        up ip link add dummy0 type dummy
        down ip link del dummy0

$ q() { sudo ifquery --state dummy0; echo "  => $?"; }
$ q
  => 1
$ sudo ifup dummy0 & sleep 1; q; ip link show dummy0; ps $!
[1] 7956
dummy0=dummy0
  => 0
Device "dummy0" does not exist.
  PID TTY      STAT   TIME COMMAND
 7956 pts/1    S      0:00 sudo ifup dummy0

This shows that ifquery does return 0, even while ifup is still
working.

>From my reading of ifquery(8), this is possibly a bug ("successful
code is returned if all of interfaces given as arguments are up").
Either ifquery is supposed to return failure in this case (meaning
ifquery has a bug, since the interface hasn't finished coming up yet),
or it is valid for it to return successfully since an ifup process has
been _started_ for this interface (in which case the ifenslave script
has a bug--but ifquery(8) could probably be improved).  Regardless of
whether ifquery has a bug here, the ifenslave script could be
simplified in a way that works around the issue anyway:

    # [omitted context]
    if [ -z "$IFUPDOWN_IF_BOND_MASTER" ] ; then
            ifup "$IF_BOND_MASTER"
    fi

Now, if ifup gets started concurrently for the same bond master
interface, the second process blocks (due to ifupdown locking) until
the first process completes.  Both processes running
if-pre-up.d/ifenslave would now continue after bond0 has finished its
configuration.

ifup is idempotent, so there's no harm in running it every time.  In
fact, with how the script is written now, there's a race condition
where ifquery could return failure, and in-between the return and the
ifup, the interface could come up.  If that happens, ifup gets run on
a bond interface in the up state.  Because of the idempotence of ifup,
there's no harm in this hypothetical race condition, but the script
already relies on it, so I'd suggest removing the ifquery condition
altogether.

The only downside I see to running `ifup` every time is that I now get
extra messages in my bootup scenario:

ifup[283]: ifup: waiting for lock on /run/network/ifstate.bond0
sh[311]: ifup: interface bond0 already configured

But if I understand correctly, those are informational, and not
indicating an error.

When I remove that ifquery condition, I no longer encounter the
problem, even after letting this machine reboot in a loop all night.
I'm attaching my modified version of the script.

Do you think this fix could be accepted?  If you think it'd be more
appropriate for me to file a bug against ifupdown regarding ifquery,
please let me know.

Thanks,
-Jean-Paul

-- System Information:
Debian Release: 11.2
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: armel (armv6l)

Kernel: Linux 5.10.0-10-rpi
Kernel taint flags: TAINT_CRAP, TAINT_UNSIGNED_MODULE
Locale: LANG=C, LC_CTYPE=C (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages ifenslave depends on:
ii  ifupdown  0.8.36
ii  iproute2  5.10.0-4

Versions of packages ifenslave recommends:
ii  net-tools  1.60+git20181103.0eebece-1

ifenslave suggests no packages.

-- no debconf information
# interfaces(5) file used by ifup(8) and ifdown(8)

auto bond0
iface bond0 inet static
	# No bond-slaves; let them come up on their own via hotplugging.
	bond-slaves none
	bond-mode active-backup
	bond-primary enxb827eb9e4634
	bond-miimon 100
	address 192.168.66.19/21
	gateway 192.168.64.1
	dns-nameserver 192.168.66.12
	dns-search thoughtcrime.us
iface bond0 inet6 static
	address 2002:ce3f:e590:1:1::19/64
	# DAD fails because there will be no slaves at the time of
	# this bond interface coming up.  DAD failure will result
	# in ifup failure.  Just skip DAD.
	dad-attempts 0

allow-hotplug enxb827eb9e4634
iface enxb827eb9e4634 inet manual
	bond-master bond0
	bond-mode active-backup
	bond-primary enxb827eb9e4634
	bond-miimon 100

allow-hotplug wlx0013efd01275
iface wlx0013efd01275 inet manual
	wpa-ssid [redacted]
	wpa-psk [redacted]
	bond-master bond0
	bond-mode active-backup
	bond-primary enxb827eb9e4634
	bond-miimon 100
#!/bin/sh

[ "$VERBOSITY" = 1 ] && set -x

[ "$ADDRFAM" = meta ] && exit 0

add_master()
{
	# Return if $IFACE is already a bonding interface.
	[ -f "/sys/class/net/$IFACE/bonding/slaves" ] && return

	ip link add dev "$IFACE" type bond
}

sysfs_change_down()
{
	# Called with :
	# $1 = basename of the file in bonding/ to write to.
	# $2 = value to write. Won't write if $2 is empty.
	if [ -n "$2" ] ; then
		# If the value we plan to write is different from the current one...
		if ! grep -sq "\\<$2\\>" "/sys/class/net/$BOND_MASTER/bonding/$1" ; then
			# ...and the master is up...
			if ip link show "$BOND_MASTER" | grep -sq '[<,]UP[,>]' ; then
				# ...bring the master down.
				ip link set dev "$BOND_MASTER" down
			fi
		fi
		sysfs "$1" "$2"
	fi
}

sysfs()
{
	# Called with :
	# $1 = basename of the file in bonding/ to write to.
	# $2 = value to write. Won't write if $2 is empty.
	if [ -n "$2" ] ; then
		echo "$2" > "/sys/class/net/$BOND_MASTER/bonding/$1"
		return $?
	fi
	return 0
}

sysfs_add()
{
	#??Called with :
	# $1 = target filename.
	# $2 = values to write.
	for value in $2; do
		# Do not add $2 to $1 if already present.
		if ! grep -sq "\\<$value\\>" "/sys/class/net/$BOND_MASTER/bonding/$1"
		then
		    sysfs "$1" "+$value"
		fi
	done
}

# early_setup_master is the place where we do master setup that need to be done before enslavement.
early_setup_master()
{
	# Warning: the order in which we write into the sysfs files is important.
	# Double check in drivers/net/bonding/bond_sysfs.c in the Linux kernel source tree
	# before changing anything here.

	# fail_over_mac must be set before enslavement of any slaves.
	sysfs fail_over_mac "$IF_BOND_FAIL_OVER_MAC"
}

enslave_slaves()
{
	case "$BOND_SLAVES" in
		none)
			BOND_SLAVES=""
			;;
		all)
			BOND_SLAVES=$(sed -ne 's/ *\(eth[^:]*\):.*/\1/p' /proc/net/dev)
			;;
	esac

	[ "$VERBOSITY" = 1 ] && v=-v
	for slave in $BOND_SLAVES ; do
		export IFENSLAVE_ENV_NAME="IFUPDOWN_$slave"
		IFUPDOWN_IFACE="$(printenv "$IFENSLAVE_ENV_NAME")"
		unset IFENSLAVE_ENV_NAME
		if ( ( ifquery --state "$slave" >/dev/null 2>&1 ) && ( ifquery "$slave" | grep -q bond-master ) ) || [ -n "$IFUPDOWN_IFACE" ] ; then
			# Skipping interface that's already up or being configured and has bonding configuration
			continue
		else
			# Ensure $slave is down.
			ip link set "$slave" down 2>/dev/null
			if ! sysfs_add slaves "$slave" 2>/dev/null ; then
				echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
			else
				# Bring up slave if it is the target of an allow-bondX stanza.
				# This is useful to bring up slaves that need extra setup.
				ifup $v --allow "$BOND_MASTER" "$slave"
			fi
		fi
	done
}

setup_master()
{
	# Warning: the order in which we write into the sysfs files is important.
	# Double check in drivers/net/bonding/bond_sysfs.c in the Linux kernel source tree
	# before changing anything here.

	# use_carrier can be set anytime.
	sysfs use_carrier "$IF_BOND_USE_CARRIER"
	# num_grat_arp can be set anytime.
	sysfs num_grat_arp "$IF_BOND_NUM_GRAT_ARP"
	# num_unsol_na can be set anytime.
	sysfs num_unsol_na "$IF_BOND_NUM_UNSOL_NA"

	# arp_ip_target must be set before arp_interval.
	sysfs_add arp_ip_target "$IF_BOND_ARP_IP_TARGET"
	sysfs arp_interval "$IF_BOND_ARP_INTERVAL"

	# miimon must be set before updelay and downdelay.
	sysfs miimon "$IF_BOND_MIIMON"
	sysfs downdelay "$IF_BOND_DOWNDELAY"
	sysfs updelay "$IF_BOND_UPDELAY"

	# Changing ad_select requires $BOND_MASTER to be down.
	sysfs_change_down ad_select "$IF_BOND_AD_SELECT"

	# Changing mode requires $BOND_MASTER to be down.
	# Mode should be set after miimon or arp_interval, to avoid a warning in syslog.
	sysfs_change_down mode "$IF_BOND_MODE"

	# Requires $BOND_MASTER to be down and mode to be configured
	sysfs_change_down xmit_hash_policy "$IF_BOND_XMIT_HASH_POLICY"
	sysfs_change_down tlb_dynamic_lb "$IF_BOND_TLB_DYNAMIC_LB"

	# packets_per_slave allowed for mode balance-rr only.
	sysfs packets_per_slave "$IF_BOND_PACKETS_PER_SLAVE"

	# arp_validate must be after mode (because mode must be active-backup).
	sysfs arp_validate "$IF_BOND_ARP_VALIDATE"

	# lacp_rate must be set after mode (because mode must be 802.3ad).
	# Changing lacp_rate requires $BOND_MASTER to be down.
	sysfs_change_down lacp_rate "$IF_BOND_LACP_RATE"

	# queue_id must be set after enslavement.
	for iface_queue_id in $IF_BOND_QUEUE_ID ; do
		sysfs iface_queue_id "$iface_queue_id"
	done

	# active_slave must be set after mode and after enslavement.
	# The slave must be up and the underlying link must be up too.
	# FIXME: We should have a way to write an empty string to active_slave, to set the active_slave to none.
	if [ -n "$IF_BOND_ACTIVE_SLAVE" ] ; then
		if [ "$IF_BOND_ACTIVE_SLAVE" = "none" ] ; then
			sysfs active_slave ""
		else
			# Need to force interface up before. Bonding will refuse to activate a down interface.
			if ifquery -l "$IF_BOND_ACTIVE_SLAVE" >/dev/null 2>&1 ; then
				ifup "$IF_BOND_ACTIVE_SLAVE"
			else
				ip link set "$IF_BOND_ACTIVE_SLAVE" up
			fi
			sysfs active_slave "$IF_BOND_ACTIVE_SLAVE"
		fi
	fi
}

setup_primary() {
	# primary must be set after mode (because only supported in some modes) and after enslavement.
	# The first slave in bond-primary found in current slaves becomes the primary.
	# If no slave in bond-primary is found, then the primary does not change.
	for slave in $IF_BOND_PRIMARY ; do
		slaves="/sys/class/net/$BOND_MASTER/bonding/slaves"
		if grep -sq "\\<$slave\\>" "$slaves" ; then
			sysfs primary "$slave"
			break
		fi
	done

	# primary_reselect should be set after mode (it is only supported in some modes), after enslavement
	# and after primary. This is currently (2.6.35-rc1) not enforced by the bonding driver, but it is
	# probably safer to do it in that order.
	sysfs primary_reselect "$IF_BOND_PRIMARY_RESELECT"
}

setup_master_device() {
	add_master
	early_setup_master
	setup_master
	enslave_slaves
	setup_primary
}

setup_slave_device() {
	# Require the bond master to have an iface stanza
	if ! ifquery -l "$IF_BOND_MASTER" >/dev/null 2>&1 ; then
		echo "No iface stanza found for master $IF_BOND_MASTER" >&2
		exit 1
	fi

	# Ensure the master is up or being configured
	export IFENSLAVE_ENV_NAME="IFUPDOWN_$IF_BOND_MASTER"
	IFUPDOWN_IF_BOND_MASTER="$(printenv "$IFENSLAVE_ENV_NAME")"
	unset IFENSLAVE_ENV_NAME
	if [ -z "$IFUPDOWN_IF_BOND_MASTER" ] ; then
		ifup "$IF_BOND_MASTER"
	fi

	# Enslave it to the master
	ip link set "$1" down 2>/dev/null
	if ! sysfs_add slaves "$1" 2>/dev/null ; then
		echo "Failed to enslave $1 to $BOND_MASTER." >&2
	fi

	setup_primary
}

# Option slaves deprecated, replaced by bond-slaves, but still supported for backward compatibility.
IF_BOND_SLAVES=${IF_BOND_SLAVES:-$IF_SLAVES}

if [ -n "$IF_BOND_MASTER" ] ; then
	# FIXME: use function arguments instead of this variable
	BOND_MASTER="$IF_BOND_MASTER"
	setup_slave_device "$IFACE"
elif [ -n "$IF_BOND_SLAVES" ] || [ -n "$IF_BOND_MODE" ] ; then
	# FIXME: use function arguments instead of these variables
	BOND_MASTER="$IFACE"
	BOND_SLAVES="$IF_BOND_SLAVES"
	setup_master_device "$IFACE"
fi

exit 0

Reply to: