
Bug#991613: DHCPv6 problem in our image: needs "-D LL" when spawning dhclient



Package: cloud.debian.org
Severity: serious


After spawning a VM, bringing up the network takes a long time and then
fails on a timeout (output from the console):

cloud-init[281]: Cloud-init v. 20.2 running 'init-local' at Wed, 28 Jul 2021 07:49:23 +0000. Up 2.98 seconds.
Started Initial cloud-init job (pre-networking).
Reached target Network (Pre).
Starting Raise network interfaces...
A start job is running for Raise network interfaces (6s / 5min 1s)
A start job is running for Raise network interfaces (7s / 5min 1s)
A start job is running for Raise network interfaces (7s / 5min 1s)
[...]
A start job is running for Raise network interfaces (5min 1s / 5min 1s)
Failed to start Raise network interfaces.

Running "systemctl status networking.service" shows:

   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Wed 2021-07-28 07:54:23 UTC; 52min ago

This problem is specific to the Debian image: we compared it with Ubuntu 21.04.

Ubuntu:
- Initial boot:
2021-07-28T11:58:50.836457+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6
2021-07-28T11:58:50.836724+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREPLY(tap67fa8c3f-8d) <redacted>::3ba 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6 host-<redacted>--3ba

- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:63:54:8c,tag:dhcpv6,host-<redacted>--3ba.dc3-a.pub1.infomaniak.cloud.,[<redacted>::3ba]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627559930 3042863103 <redacted>::3ba /host-<redacted>--3ba 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6

Then we run "openstack server rebuild" and get the same result.

Debian:
- Initial boot:
2021-07-28T11:59:15.838131+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T11:59:15.838369+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPADVERTISE(tap67fa8c3f-8d) <redacted>::143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da host-<redacted>--143
2021-07-28T11:59:16.795826+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREQUEST(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T11:59:16.796177+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREPLY(tap67fa8c3f-8d) <redacted>::143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da host-<redacted>--143

- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:f1:a9:da,tag:dhcpv6,host-<redacted>--143.dc3-a.pub1.infomaniak.cloud.,[<redacted>::143]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627481056 1056025050 <redacted>::143 host-2001-1600-10-100--143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da

Then I run the same "openstack server rebuild" and get:

- Initial boot:
2021-07-28T12:26:38.804683+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da
2021-07-28T12:26:38.805023+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPADVERTISE(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da no addresses available

- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:f1:a9:da,tag:dhcpv6,host-<redacted>--143.dc3-a.pub1.infomaniak.cloud.,[<redacted>::143]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627481056 1056025050 2001:1600:10:100::143 host-<redacted>--143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da

We can see here that DHCPv6 fails because the DUID sent by the rebuilt VM
isn't the same as the one sent at the initial build:

2021-07-28T11:59:15.838131+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T12:26:38.804683+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da

The server still holds the DHCPv6 lease for ::143 under the first DUID, so
it refuses to hand that address to the new DUID ("no addresses available").
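The mismatch can be checked mechanically. Below is a minimal Python sketch that pulls the DUID out of the DHCPSOLICIT log line sent after the rebuild and compares it with the DUID stored in the lease. It assumes that the client DUID is the last whitespace-separated field of a dnsmasq DHCPv6 lease line, as in the leases file shown above; the helper names `lease_duid` and `solicit_duid` are hypothetical.

```python
import re

# Lease line kept by the server after the rebuild (copied from the leases file above).
LEASE_LINE = ("1627481056 1056025050 2001:1600:10:100::143 "
              "host-<redacted>--143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da")

# dnsmasq log line for the SOLICIT sent after "openstack server rebuild".
SOLICIT_LINE = ("2021-07-28T12:26:38.804683+00:00 pub1-network-3 "
                "dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) "
                "00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da")

# The DUID follows "DHCPSOLICIT(<interface>) " as colon-separated hex bytes.
DUID_RE = re.compile(r"DHCPSOLICIT\([^)]*\) ((?:[0-9a-f]{2}:)*[0-9a-f]{2})")


def lease_duid(line: str) -> str:
    """Return the client DUID of a lease line (assumed to be the last field)."""
    return line.split()[-1]


def solicit_duid(line: str) -> str:
    """Extract the client DUID from a dnsmasq DHCPSOLICIT log line."""
    match = DUID_RE.search(line)
    if match is None:
        raise ValueError("not a DHCPSOLICIT line")
    return match.group(1)


if __name__ == "__main__":
    old, new = lease_duid(LEASE_LINE), solicit_duid(SOLICIT_LINE)
    print("lease DUID:  ", old)
    print("SOLICIT DUID:", new)
    print("match:", old == new)  # False: the lease belongs to the old DUID
```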

We can see in the startup logs that the image creates a new DUID each time:

Jul 28 11:59:17 debianv6 sh[376]: Created duid "\000\001\000\001(\224\003\021\372\026>\361\251\332".
...
Jul 28 12:33:06 debianv6 sh[376]: Created duid "\000\001\000\001(\224\011{\372\026>\361\251\332".

To convert it to the colon-separated hex notation used by dnsmasq, we can do:

$ printf "\000\001\000\001(\224\011{\372\026>\361\251\332" | hexdump -e '14/1 "%02x " "\n"' | sed 's/ /:/g'
00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da
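The same conversion can be sketched in Python, which is handy when processing many log lines at once. This is an illustration only: `duid_from_log` is a made-up helper, and it relies on Python's `unicode_escape` codec understanding the same three-digit octal escapes that dhclient prints.

```python
def duid_from_log(escaped: str) -> str:
    """Convert a DUID as printed by dhclient (C-style octal escapes mixed with
    printable characters) into colon-separated hex bytes."""
    # unicode_escape turns each \ooo escape into one code point below 256,
    # and latin-1 maps every such code point back to exactly one byte.
    raw = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")
    return ":".join(f"{byte:02x}" for byte in raw)


if __name__ == "__main__":
    # Raw strings keep the backslashes exactly as they appear in the log.
    print(duid_from_log(r"\000\001\000\001(\224\003\021\372\026>\361\251\332"))
    # -> 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
    print(duid_from_log(r"\000\001\000\001(\224\011{\372\026>\361\251\332"))
    # -> 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da
```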

So, to fix this problem, we need to fix the Debian image. The dhclient
manual page documents an interesting option:

 -D LL or LLT
              Override the default when selecting the type of DUID to use.
              By default, DHCPv6 dhclient creates an identifier based on
              the link-layer address (DUID-LL) if it is running in
              stateless mode (with -S, not  requesting  an address), or it
              creates an identifier based on the link-layer address plus a
              timestamp (DUID-LLT) if it is running in stateful mode
              (without -S, requesting an address).  When DHCPv4 is
              configured to use a DUID using -i option the default is to
              use a DUID-LLT.  -D overrides these default, with a value of
              either LL or LLT.
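To make the difference between the two DUID types concrete, here is a minimal Python sketch of both layouts as described in RFC 8415. DUID-LLT is type 1, hardware type, a 32-bit time in seconds since midnight UTC on 1 January 2000, then the link-layer address. DUID-LL is type 3, hardware type, then the link-layer address. It assumes an Ethernet hardware type of 1. This is not dhclient's code, and the function names are made up for illustration. Fed the instance's MAC and the two time values seen in the dnsmasq logs, it reproduces both observed DUID-LLTs, while the DUID-LL depends on the MAC alone.

```python
import struct

HWTYPE_ETHERNET = 1       # IANA hardware type for Ethernet
DUID_LLT, DUID_LL = 1, 3  # DUID type codes from RFC 8415


def mac_bytes(mac: str) -> bytes:
    """Parse "aa:bb:cc:dd:ee:ff" into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def duid_llt(mac: str, seconds_since_2000: int) -> bytes:
    # type (2 bytes) | hardware type (2) | time (4) | link-layer address
    return struct.pack("!HHI", DUID_LLT, HWTYPE_ETHERNET,
                       seconds_since_2000 & 0xFFFFFFFF) + mac_bytes(mac)


def duid_ll(mac: str) -> bytes:
    # type (2 bytes) | hardware type (2) | link-layer address, no time field
    return struct.pack("!HH", DUID_LL, HWTYPE_ETHERNET) + mac_bytes(mac)


def hexcolon(data: bytes) -> str:
    return ":".join(f"{b:02x}" for b in data)


if __name__ == "__main__":
    mac = "fa:16:3e:f1:a9:da"  # the instance's MAC, identical in both DUIDs
    print(hexcolon(duid_llt(mac, 0x28940311)))  # first boot
    print(hexcolon(duid_llt(mac, 0x2894097B)))  # after the rebuild
    print(hexcolon(duid_ll(mac)))               # the same on every boot
```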

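So it looks like dhclient in the Debian image uses a DUID-LLT (link-layer
address plus a timestamp), which is its default when requesting an address,
and the timestamp is what is problematic here: the MAC address part
(fa:16:3e:f1:a9:da) is the same in both DUIDs, only the time field changes.
We need dhclient to use a DUID-LL (link-layer address only) instead, so that
a "server rebuild" results in a VM with IPv6 connectivity.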
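The timestamp explanation can be checked against the logs. The minimal Python sketch below assumes the RFC 8415 DUID-LLT layout, where bytes 4 to 7 hold the creation time in seconds since midnight UTC on 1 January 2000, and decodes that field in both DUIDs. `llt_time` is a hypothetical helper, not part of any tool. The decoded times are 11:59:13 and 12:26:35 UTC on 2021-07-28, each a few seconds before the matching DHCPSOLICIT, so the second DUID was freshly created at boot after the rebuild.

```python
from datetime import datetime, timedelta, timezone

# DUID-LLT time is counted from midnight UTC, 1 January 2000 (RFC 8415).
DUID_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def llt_time(duid_hex: str) -> datetime:
    """Return the creation time stored in a DUID-LLT given as aa:bb:... hex."""
    raw = bytes.fromhex(duid_hex.replace(":", ""))
    duid_type = int.from_bytes(raw[0:2], "big")
    if duid_type != 1:
        raise ValueError(f"not a DUID-LLT (type {duid_type})")
    # Bytes 2-3 are the hardware type; bytes 4-7 are the time field.
    seconds = int.from_bytes(raw[4:8], "big")
    return DUID_EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    for duid in ("00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da",   # first boot
                 "00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da"):  # after rebuild
        print(duid, "->", llt_time(duid).isoformat())
```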

Note that Ubuntu isn't using dhclient, which is probably why it's not
affected: its DUID (00:02:00:00:ab:11:...) is of type 2 (DUID-EN), which
contains no timestamp.

cloud-init populates /etc/network/interfaces.d/50-cloud-init like this:

root@zigo-test-server:/etc# cat /etc/network/interfaces.d/50-cloud-init 
# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.  To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
auto lo
iface lo inet loopback
    dns-nameservers 83.166.143.51 83.166.143.52 2001:1600:0:aaaa::53:5 2001:1600:0:aaaa::53:6

auto ens3
iface ens3 inet dhcp
    accept_ra 1
    mtu 1500

# control-alias ens3
iface ens3 inet6 dhcp
    post-up route add -A inet6 default gw 2001:1600:10:100::1 || true
    pre-down route del -A inet6 default gw 2001:1600:10:100::1 || true


In ifupdown, in https://bugs.debian.org/799257 someone suggested using the
-D LLT option; it then went away in version 0.8.2 because of
https://bugs.debian.org/806964.

So here, we probably need to get ifupdown to use the -D LL option
explicitly, but I'm not sure how to do this... Does ifupdown even have
an option to force that? It doesn't seem so. :/

Any help or comment would be welcome.

Cheers,

Thomas Goirand (zigo)
