
Bug#1069048: live-boot fails to DHCP on all NICs with link up



Package: live-boot
Version: 1:20230131
Severity: important
Tags: patch


Hi,

live-boot currently makes up to 5 passes looking for network
interfaces whose carrier link is up. As soon as a pass finds at least
one such interface, the script returns, leaving no time for the other
NICs to bring their link up in a later pass.

This only works if:
- one is lucky, or
- only the interfaces with DHCP have an actual Ethernet link.

When more than one interface has its link up but only one of them is
connected to a DHCP server, the boot can fail, depending on which
card gets its link first.

The attached patch changes the behavior: it makes sure that all cards
whose link is up are reported in /conf/param.conf before the script
returns, so that live-boot will try to get an IP address from
every card with link up. Each card still has a 15-second timeout
(by default) to get its IP address from DHCP.
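
To illustrate the idea, here is a minimal sketch of collecting every
interface with carrier instead of stopping at the first one. It is not
the patch itself: the function name collect_linked_ifaces and its
directory argument are hypothetical, and it does a single pass, whereas
the patch repeats the scan 5 times with a 1 second pause and skips
interfaces it has already found.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: collect_linked_ifaces SYSFS_NET_DIR IFACE...
# Prints, space separated, every listed interface whose carrier file
# contains "1". The directory is a parameter so that the function can
# be tried against a fake tree instead of /sys/class/net.
collect_linked_ifaces () {
	net_dir=$1
	shift
	found=""
	for iface in "$@"
	do
		# A missing or unreadable carrier file counts as "no link".
		carrier=$(cat "$net_dir/$iface/carrier" 2>/dev/null)
		if [ "$carrier" = 1 ]
		then
			# Append with a separating space only when needed.
			found="${found:+$found }$iface"
		fi
	done
	printf '%s\n' "$found"
}
```

On a real system this would be called as
collect_linked_ifaces /sys/class/net eth0 eth1 (interface names are
examples), and the result is what ends up in the DEVICE='...' line of
/conf/param.conf.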

We've tested this patch in production, on a setup where live-boot was
failing: our 25 Gbit/s cards were detected first but were not
connected to a DHCP server, while the 1 Gbit/s cards that were supposed
to carry the network boot were never tried by live-boot. This patch
fixed the problem for us.

Please merge this patch if you think it is correct. I would also
like to have it fixed in Stable if possible (once I have approval
from the team).

Cheers,

Thomas Goirand (zigo)

P.S.: To test it, the easiest way is to build a Debian live image
the normal way, then unpack the ramdisk with cpio, with something
like this:
zstdcat <path-to-initrd> | cpio -idmv

Then recompress like this:
find . | cpio --create --format='newc' | zstd > <path-to-initrd>

On an older version of Debian, replacing zstdcat with zcat and
zstd with "gzip -9" also works.
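
The two commands above can be wrapped into small helpers, sketched
below under some assumptions: the function names unpack_initrd and
repack_initrd are hypothetical, the initrd is assumed to be a single
compressed cpio archive, and the (de)compressor is passed as trailing
arguments so that the same code covers both the zstd and the gzip case.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Usage: unpack_initrd INITRD DESTDIR [DECOMPRESSOR [ARGS...]]
# Decompresses INITRD to stdout and extracts it into DESTDIR.
# The decompressor defaults to zstdcat.
unpack_initrd () {
	initrd=$1
	dest=$2
	shift 2
	[ $# -gt 0 ] || set -- zstdcat
	mkdir -p "$dest" || return 1
	# -i extract, -d create directories, -m keep modification times
	"$@" "$initrd" | (cd "$dest" && cpio -idm 2>/dev/null)
}

# Usage: repack_initrd SRCDIR OUTPUT [COMPRESSOR [ARGS...]]
# Archives SRCDIR in "newc" format and compresses it into OUTPUT.
# The compressor defaults to zstd. OUTPUT should live outside SRCDIR.
repack_initrd () {
	src=$1
	out=$2
	shift 2
	[ $# -gt 0 ] || set -- zstd
	(cd "$src" && find . | cpio -o -H newc 2>/dev/null) | "$@" > "$out"
}
```

For example, with gzip: unpack_initrd initrd.img ./tree gzip -dc,
edit the script under ./tree, then
repack_initrd ./tree initrd.img.new gzip -9. Repacking is best done
as root so that file ownership inside the archive stays root:root.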
From 899aa9e8625570137fc57c4ed675bcb090119ace Mon Sep 17 00:00:00 2001
From: Thomas Goirand <zigo@debian.org>
Date: Mon, 15 Apr 2024 15:40:46 +0200
Subject: [PATCH] Do DHCP on multiple interfaces

The current behavior of live-boot is to search 5 times for network
interfaces with the carrier link up. If there is more than one
interface, but only one is found during a run, then it currently
gives up searching for other interfaces and exits.

This works if one is lucky, or if only the interfaces with DHCP
have an actual Ethernet link. For cases where there is more than
one interface with the link up, but only one is connected to a
DHCP server, it is possible that it will fail (depending on which
card gets the link first).

This patch changes the behavior: it makes sure that all cards
with a link that is up are reported in /conf/param.conf before
exiting, so that live-boot will try to get an IP address from
all cards with link up. Each card continues to have a 15-second
timeout (by default) to get the IP address from DHCP.
---
 components/9990-select-eth-device.sh | 68 ++++++++++++++++------------
 1 file changed, 39 insertions(+), 29 deletions(-)

diff --git a/components/9990-select-eth-device.sh b/components/9990-select-eth-device.sh
index b660a3d..719a234 100755
--- a/components/9990-select-eth-device.sh
+++ b/components/9990-select-eth-device.sh
@@ -93,46 +93,56 @@ Select_eth_device ()
 	fi
 
 	found_eth_dev=""
-	while true
+	echo -n "Looking for a connected Ethernet interface."
+
+	for interface in $l_interfaces
 	do
-		echo -n "Looking for a connected Ethernet interface ..."
+		# ATTR{carrier} is not set if this is not done
+		echo -n " $interface ?"
+		ipconfig -c none -d $interface -t 1 >/dev/null 2>&1
+		sleep 1
+	done
+
+	echo ''
 
+	for step in 1 2 3 4 5
+	do
 		for interface in $l_interfaces
 		do
-			# ATTR{carrier} is not set if this is not done
-			echo -n " $interface ?"
-			ipconfig -c none -d $interface -t 1 >/dev/null 2>&1
-			sleep 1
-		done
-
-		echo ''
+			# Skip the interface if it's already found.
+			IN_IT=no
+			for DEV in $found_eth_dev ; do
+				if [ "${DEV}" = "$interface" ] ; then
+					IN_IT=yes
+				fi
+			done
 
-		for step in 1 2 3 4 5
-		do
-			for interface in $l_interfaces
-			do
+			if [ "${IN_IT}" = "no" ] ; then
 				ip link set $interface up
 				carrier=$(cat /sys/class/net/$interface/carrier \
 					2>/dev/null)
 				# link detected
-
-				case "${carrier}" in
-					1)
-						echo "Connected $interface found"
-						# inform initrd's init script :
+				if [ "${carrier}" = 1 ] ; then
+					echo "Connected $interface found"
+					# inform initrd's init script :
+					if [ -z "${found_eth_dev}" ] ; then
+						found_eth_dev="$interface"
+					else
 						found_eth_dev="$found_eth_dev $interface"
-						found_eth_dev="$(echo $found_eth_dev | sed -e "s/^[[:space:]]*//g")"
-						;;
-				esac
-			done
-			if [ -n "$found_eth_dev" ]
-			then
-				echo "DEVICE='$found_eth_dev'" >> /conf/param.conf
-				return
-			else
-				# wait a bit
-				sleep 1
+					fi
+				fi
 			fi
 		done
+		# wait a bit
+		sleep 1
 	done
+	if [ -n "$found_eth_dev" ]
+	then
+		echo "Done searching for connected Ethernet interface."
+		echo "Writing DEVICE='$found_eth_dev' in /conf/param.conf."
+		echo "DEVICE='$found_eth_dev'" >> /conf/param.conf
+	else
+		echo "Could not find an interface that is up: giving-up..."
+	fi
+	return
 }
-- 
2.39.2

