
Bug#964596: cloud.debian.org: Debian 10 EC2: IPv4 address suddenly flushed



Package: cloud.debian.org
Severity: major
User: cloud.debian.org@packages.debian.org
Usertags: aws

Problem:
After a dist-upgrade from Debian 9 to Debian 10, production systems in AWS lose all network connectivity 1h after each boot.
One can't ssh in to investigate, and no remote console exists in AWS.
Fortunately, you *can* restart the EC2 instance, which obtains a new DHCP lease and gives you another hour of access before it is cut again.

How to reproduce:

Install a Debian 9 machine using the official Debian 9 AMI.

During the hardening of the machine, disable IPv6 completely:
# cat /etc/sysctl.d/disable_ipv6.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.eth0.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
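
To confirm that these settings actually took effect (after `sysctl --system` or a reboot), the per-interface `disable_ipv6` flags can be read back from `/proc`. This is a minimal sketch: the function name `ipv6_fully_disabled` is hypothetical, and the directory is an optional argument (defaulting to `/proc/sys/net/ipv6/conf`) only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if every interface directory under
# CONF_DIR has disable_ipv6 = 1, 1 if any does not, 2 if no flags exist
# at all (e.g. the ipv6 sysctl tree is absent), which is inconclusive.
# CONF_DIR defaults to the live kernel view and can be overridden.
ipv6_fully_disabled() {
    conf_dir=${1:-/proc/sys/net/ipv6/conf}
    found=0
    for flag in "$conf_dir"/*/disable_ipv6; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$flag" ] || continue
        found=1
        if [ "$(cat "$flag")" != "1" ]; then
            echo "IPv6 still enabled: $flag" >&2
            return 1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no disable_ipv6 flags found under $conf_dir" >&2
        return 2
    fi
    return 0
}
```

Usage: `ipv6_fully_disabled && echo "IPv6 is off on all interfaces"`. The `all` and `default` entries are checked along with the real interfaces, since they sit in the same directory.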

This hardened Debian 9 server worked perfectly for a year.

Now perform a dist-upgrade to Debian 10.

Everything looks good. No errors during the upgrade.
After the final reboot, the server comes online as it should.

BUT...
After 1 hour we suddenly lose all access to the server.

Resetting the EC2 instance brings access back, only for it to be lost again 1h later.

(Unfortunately, neither dhclient nor the cloud-init scripts logged any error to syslog, so it was hard to figure out what was wrong.)

It turns out that the IPv6 hardening causes problems for dhclient/ifup.

I believe the problem lies in this part of /sbin/dhclient-script:
        if [ -n "$old_ip_address" ] &&
           [ "$old_ip_address" != "$new_ip_address" ]; then
            # leased IP has changed => flush it
            ip -4 addr flush dev ${interface} label ${interface}
        fi
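
The flush in this excerpt is guarded by two conditions. Isolating them in a tiny function makes it easy to check, for a given pair of addresses, whether the branch would fire. This only restates the excerpt's logic as an illustration; `would_flush_ipv4` is a hypothetical name and not part of dhclient-script.

```shell
#!/bin/sh
# Hypothetical restatement of the guard in the dhclient-script excerpt:
# returns 0 ("would flush") only when an old address exists and differs
# from the newly leased one.
would_flush_ipv4() {
    old_ip_address=$1
    new_ip_address=$2
    [ -n "$old_ip_address" ] &&
        [ "$old_ip_address" != "$new_ip_address" ]
}

# Dry run: only prints the command the excerpt would execute; nothing is flushed.
interface=eth0
if would_flush_ipv4 "10.75.75.75" ""; then
    echo "would run: ip -4 addr flush dev ${interface} label ${interface}"
fi
```

Taken in isolation, the guard also fires when `new_ip_address` is empty, because an empty string differs from any old address.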

My guess is that when dhclient fails to configure an IPv6 address, the code above flushes the IPv4 address currently configured on the machine, so it loses all network connectivity.



My current workaround is to *not* apply the IPv6 hardening above; with that, the server works fine.




My /etc/network/interfaces configuration:
# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
allow-hotplug eth0
iface eth0 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth1 inet dhcp
allow-hotplug eth1
iface eth1 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth2 inet dhcp
allow-hotplug eth2
iface eth2 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth3 inet dhcp
allow-hotplug eth3
iface eth3 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth4 inet dhcp
allow-hotplug eth4
iface eth4 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth5 inet dhcp
allow-hotplug eth5
iface eth5 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth6 inet dhcp
allow-hotplug eth6
iface eth6 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth7 inet dhcp
allow-hotplug eth7
iface eth7 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth8 inet dhcp
allow-hotplug eth8
iface eth8 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper


Log:
Jul 8 10:13:36 foobar ifup[363]: RTNETLINK answers: File exists
Jul 8 10:13:36 foobar ifup[363]: invoke-rc.d: could not determine current runlevel
Jul 8 10:13:36 foobar dhclient[571]: bound to 10.75.75.75 -- renewal in 1491 seconds.
Jul 8 10:13:36 foobar ifup[363]: bound to 10.75.75.75 -- renewal in 1491 seconds.
Jul 8 10:13:36 foobar ifup[363]: Could not get a link-local address
Jul 8 10:13:36 foobar ifup[363]: ifup: failed to bring up eth0
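
To line the outage up against the DHCP timeline, the leased address and renewal interval can be pulled out of the `bound to` lines. A minimal sketch using `sed`; the function name `parse_bound_line` is hypothetical, and the input is assumed to match the log format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and, for every
# "bound to ADDR -- renewal in N seconds." line, print "ADDR N".
# All other lines are ignored.
parse_bound_line() {
    sed -n 's/.*bound to \([0-9.]*\) -- renewal in \([0-9]*\) seconds\..*/\1 \2/p'
}
```

Usage: `grep dhclient /var/log/syslog | parse_bound_line`. For the lines above, a renewal would be due 1491 s (24 min 51 s) after 10:13:36, i.e. around 10:38:27, which gives a reference point when checking whether any renewal was ever logged before the connection dropped.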

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 06:ce:43:75:75:75 brd ff:ff:ff:ff:ff:ff
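
The output above shows the symptom directly: `lo` still has an `inet` line, but `eth0` has none. Below is a minimal sketch of a check that reads `ip addr` output on stdin and reports whether a named interface has an IPv4 address. `has_ipv4` is a hypothetical name, and the check assumes the default multi-line `ip addr` format shown above.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the interface named in $1 has at least
# one "inet" (IPv4) line in the `ip addr` output read from stdin.
has_ipv4() {
    awk -v ifname="$1" '
        # Header lines look like "2: eth0: <...>" (or "eth0@if5:").
        /^[0-9]+: / {
            name = $2
            sub(/[:@].*/, "", name)
            in_block = (name == ifname)
            next
        }
        # Inside the requested block, "inet" (not "inet6") means IPv4.
        in_block && $1 == "inet" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

Since nothing was logged in this case, something like `ip addr show | has_ipv4 eth0 || logger -t ipv4-check "eth0 has no IPv4 address"` could be run periodically (e.g. from cron) to leave a syslog trace of when the address disappears.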


Additional findings:
The contents of /etc/network/ on this 9-->10 dist-upgraded machine differ from those on a machine installed directly from the Debian 10 AMI:
dist-upgraded:/etc/network> ls
if-down.d/  if-post-down.d/  if-pre-up.d/  if-up.d/  interfaces  interfaces.d/

pure deb10:/etc/network> ls
cloud-ifupdown-helper*     if-down.d/       if-pre-up.d/  interfaces
cloud-interfaces-template  if-post-down.d/  if-up.d/      interfaces.d/

This makes me think that the cloud-init package for Debian 10 does something wrong.
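
The difference between the two /etc/network listings above can also be computed mechanically, which helps when comparing more than a couple of machines. A minimal sketch, assuming both directory trees are available locally (for example, after copying one machine's /etc/network over); `only_in_second` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the top-level entries that exist in
# directory $2 but not in directory $1 (names only, sorted).
only_in_second() {
    a_list=$(mktemp) || return 1
    b_list=$(mktemp) || { rm -f "$a_list"; return 1; }
    ls -A "$1" | sort > "$a_list"
    ls -A "$2" | sort > "$b_list"
    # comm -13 suppresses lines unique to file 1 and lines common to both.
    comm -13 "$a_list" "$b_list"
    rm -f "$a_list" "$b_list"
}
```

Usage: `only_in_second /path/to/upgraded/etc/network /etc/network` (the first path is a placeholder) would list `cloud-ifupdown-helper` and `cloud-interfaces-template` for the two listings above.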


Somewhat related bug: #846583

/Martin
