Bug#991613: DHCPv6 problem in our image: needs "-D LL" when spawning dhclient
Package: cloud.debian.org
Severity: serious
After spawning a VM, it takes a long time to get networking (output from
the console):
cloud-init[281]: Cloud-init v. 20.2 running 'init-local' at Wed, 28 Jul 2021 07:49:23 +0000. Up 2.98 seconds.
Started Initial cloud-init job (pre-networking).
Reached target Network (Pre).
Starting Raise network interfaces...
A start job is running for Raise network interfaces (6s / 5min 1s)
A start job is running for Raise network interfaces (7s / 5min 1s)
A start job is running for Raise network interfaces (7s / 5min 1s)
[...]
A start job is running for Raise network interfaces (5min 1s / 5min 1s)
Failed to start Raise network interfaces.
Running "systemctl status networking.service" shows:
Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Wed 2021-07-28 07:54:23 UTC; 52min ago
This is specific to the Debian image. For comparison, here is the same scenario with Ubuntu 21.04.
Ubuntu:
- Initial boot:
2021-07-28T11:58:50.836457+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6
2021-07-28T11:58:50.836724+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREPLY(tap67fa8c3f-8d) <redacted>::3ba 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6 host-<redacted>--3ba
- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:63:54:8c,tag:dhcpv6,host-<redacted>--3ba.dc3-a.pub1.infomaniak.cloud.,[<redacted>::3ba]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627559930 3042863103 <redacted>::3ba /host-<redacted>--3ba 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6
Then we run "openstack server rebuild" and get the same result.
Debian:
- Initial boot:
2021-07-28T11:59:15.838131+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T11:59:15.838369+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPADVERTISE(tap67fa8c3f-8d) <redacted>::143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da host-<redacted>--143
2021-07-28T11:59:16.795826+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREQUEST(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T11:59:16.796177+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREPLY(tap67fa8c3f-8d) <redacted>::143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da host-<redacted>--143
- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:f1:a9:da,tag:dhcpv6,host-<redacted>--143.dc3-a.pub1.infomaniak.cloud.,[<redacted>::143]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627481056 1056025050 <redacted>::143 host-2001-1600-10-100--143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
Then I run the same "openstack server rebuild" and get:
- Initial boot:
2021-07-28T12:26:38.804683+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da
2021-07-28T12:26:38.805023+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPADVERTISE(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da no addresses available
- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:f1:a9:da,tag:dhcpv6,host-<redacted>--143.dc3-a.pub1.infomaniak.cloud.,[<redacted>::143]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627481056 1056025050 2001:1600:10:100::143 host-<redacted>--143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
We see here that DHCPv6 fails because the DUID sent by the rebuilt VM isn't
the same as the one sent on the initial build:
2021-07-28T11:59:15.838131+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T12:26:38.804683+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da
The server has kept the initial DHCPv6 lease bound to the first DUID, so it
refuses the request carrying the new one ("no addresses available").
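A simplified illustration of that mismatch (not dnsmasq's actual allocation logic), assuming the leases-file layout seen above, where the client DUID is the last whitespace-separated field of the lease line. `lease_matches` is a hypothetical helper, not part of dnsmasq or Neutron; it only compares the DUID bound to the lease with the DUID from the SOLICIT:

```shell
#!/usr/bin/env bash
# Hypothetical helper: is the DUID in a SOLICIT the one bound to an existing lease?
# Assumes the lease line layout shown above, with the client DUID as the last field.
lease_matches() {
    local lease_line=$1 solicit_duid=$2
    local lease_duid
    lease_duid=$(awk '{ print $NF }' <<< "$lease_line")
    [ "$lease_duid" = "$solicit_duid" ]
}

# Lease line taken from the Neutron leases file quoted above.
lease='1627481056 1056025050 2001:1600:10:100::143 host-<redacted>--143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da'

# DUIDs from the initial SOLICIT and from the SOLICIT after the rebuild.
for duid in 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da; do
    if lease_matches "$lease" "$duid"; then
        echo "$duid: matches the existing lease"
    else
        echo "$duid: does not match the DUID bound to the lease"
    fi
done
```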
The startup logs show that the image creates a new DUID each time:
Jul 28 11:59:17 debianv6 sh[376]: Created duid "\000\001\000\001(\224\003\021\372\026>\361\251\332".
...
Jul 28 12:33:06 debianv6 sh[376]: Created duid "\000\001\000\001(\224\011{\372\026>\361\251\332".
To convert this octal-escaped form into the colon-separated hex used by dnsmasq, we can do:
$ printf "\000\001\000\001(\224\011{\372\026>\361\251\332" | hexdump -e '14/1 "%02x " "\n"' | sed 's/ /:/g'
00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da
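To see which part of the DUID changes, the hex form can be split into its fields. This is a sketch following the DUID layouts in RFC 8415 (type 1 = DUID-LLT, 2 = DUID-EN, 3 = DUID-LL); `decode_duid` is a hypothetical helper written for this report, and it assumes well-formed input:

```shell
#!/usr/bin/env bash
# Decode a colon-separated hex DUID (as logged by dnsmasq) into its fields.
# DUID type codes (RFC 8415): 1 = DUID-LLT, 2 = DUID-EN, 3 = DUID-LL.
decode_duid() {
    local IFS=:
    local -a b=($1)          # split the DUID into octets on ':'
    local type=$(( 0x${b[0]}${b[1]} ))
    case $type in
        1)
            # 2-byte hardware type, 4-byte time, then the link-layer address.
            # DUID-LLT time counts seconds since 2000-01-01 00:00:00 UTC,
            # which is 946684800 in Unix time.
            local secs=$(( 0x${b[4]}${b[5]}${b[6]}${b[7]} ))
            echo "LLT hwtype=$(( 0x${b[2]}${b[3]} )) unixtime=$(( secs + 946684800 )) lladdr=${b[*]:8}"
            ;;
        2)
            # 4-byte enterprise number, then a vendor-defined identifier.
            echo "EN enterprise=$(( 0x${b[2]}${b[3]}${b[4]}${b[5]} )) id=${b[*]:6}"
            ;;
        3)
            # 2-byte hardware type, then the link-layer address.
            echo "LL hwtype=$(( 0x${b[2]}${b[3]} )) lladdr=${b[*]:4}"
            ;;
        *)
            echo "unknown DUID type $type"
            ;;
    esac
}

decode_duid 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da   # Debian, initial boot
decode_duid 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da   # Debian, after rebuild
decode_duid 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6   # Ubuntu
```

Both Debian DUIDs carry the same MAC (fa:16:3e:f1:a9:da); only the timestamp differs (Unix time 1627473553 vs 1627475195, i.e. 2021-07-28 11:59:13 and 12:26:35 UTC, a few seconds before each DHCPSOLICIT). The Ubuntu DUID is a DUID-EN with enterprise number 43793, which, if I'm not mistaken, is the number systemd uses, consistent with Ubuntu not using dhclient.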
So, to fix this problem, we need to fix the Debian image. The dhclient
documentation describes an interesting option:
-D LL or LLT
Override the default when selecting the type of DUID to use.
By default, DHCPv6 dhclient creates an identifier based on
the link-layer address (DUID-LL) if it is running in
stateless mode (with -S, not requesting an address), or it
creates an identifier based on the link-layer address plus a
timestamp (DUID-LLT) if it is running in stateful mode
(without -S, requesting an address). When DHCPv4 is
configured to use a DUID using -i option the default is to
use a DUID-LLT. -D overrides these default, with a value of
either LL or LLT.
So it looks like the Debian image is using a DUID-LLT (the link-layer MAC
address plus a timestamp), and the timestamp is exactly what differs between
the two builds. We need it to use a DUID-LL (the link-layer MAC address only),
so that a VM rebuilt with "openstack server rebuild" keeps its IPv6 connectivity.
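For comparison, a DUID-LL contains only the type code, the hardware type and the MAC address, so for this instance it would be identical on every rebuild (the MAC stays fa:16:3e:f1:a9:da in the logs above). A minimal sketch of its layout, assuming an Ethernet interface; `mac_to_duid_ll` is a hypothetical helper, not something dhclient or ifupdown provides:

```shell
#!/usr/bin/env bash
# Build a DUID-LL (RFC 8415 type 3) for an Ethernet interface from its MAC.
# Layout: 2-byte type (00:03), 2-byte hardware type (00:01 = Ethernet),
# then the link-layer address. There is no timestamp, so it depends only on the MAC.
mac_to_duid_ll() {
    local mac=${1,,}                           # normalise to lowercase hex
    local re='^([0-9a-f]{2}:){5}[0-9a-f]{2}$'  # six colon-separated octets
    if [[ ! $mac =~ $re ]]; then
        echo "not an Ethernet MAC: $1" >&2
        return 1
    fi
    echo "00:03:00:01:${mac}"
}

mac_to_duid_ll fa:16:3e:f1:a9:da
```

On a running VM the MAC can be taken from sysfs, e.g. `mac_to_duid_ll "$(cat /sys/class/net/ens3/address)"`.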
Note that Ubuntu isn't using dhclient, which is probably why it's not
affected.
cloud-init populates /etc/network/interfaces.d/50-cloud-init this way:
root@zigo-test-server:/etc# cat /etc/network/interfaces.d/50-cloud-init
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
auto lo
iface lo inet loopback
dns-nameservers 83.166.143.51 83.166.143.52 2001:1600:0:aaaa::53:5 2001:1600:0:aaaa::53:6
auto ens3
iface ens3 inet dhcp
accept_ra 1
mtu 1500
# control-alias ens3
iface ens3 inet6 dhcp
post-up route add -A inet6 default gw 2001:1600:10:100::1 || true
pre-down route del -A inet6 default gw 2001:1600:10:100::1 || true
In ifupdown, https://bugs.debian.org/799257 suggested using the -D LLT
option; that option was then dropped in version 0.8.2 because of
https://bugs.debian.org/806964.
So here, we probably need to get ifupdown to pass the -D LL option
explicitly, but I'm not sure how to do this... Does ifupdown even have
an option for forcing that? It doesn't seem to be the case. :/
Any help or comment would be welcome.
Cheers,
Thomas Goirand (zigo)