
Bug#964596: marked as done (cloud.debian.org: Debian 10 EC2: IPv4 address suddenly flushed)



Your message dated Mon, 3 Aug 2020 09:09:16 -0700
with message-id <20200803160916.GC2714@doom.morgul.net>
and subject line fix released with 10.5 cloud images
has caused the Debian Bug report #964596,
regarding cloud.debian.org: Debian 10 EC2: IPv4 address suddenly flushed
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
964596: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964596
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: cloud.debian.org
Severity: major
User: cloud.debian.org@packages.debian.org
Usertags: aws

Problem:
Production systems in AWS lose all network connectivity after one hour, once a dist-upgrade from Debian 9 to Debian 10 has been performed.
You can't SSH in to investigate, and AWS offers no remote console.
Fortunately, you *can* restart the EC2 instance, which generates a new DHCP lease and gives you another hour of access before it is cut off again.
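
For reference, that restart can also be triggered from the AWS CLI (the instance ID below is just a placeholder):
$ aws ec2 reboot-instances --instance-ids i-0123456789abcdef0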

How to reproduce:

Install a Debian 9 machine using the official Debian 9 AMI.

During the hardening of the machine, disable IPv6 completely:
# cat /etc/sysctl.d/disable_ipv6.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.eth0.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
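
To verify that the hardening has taken effect after a reboot, something like this should do (standard sysctl/ip commands, eth0 as above):
# sysctl net.ipv6.conf.eth0.disable_ipv6
net.ipv6.conf.eth0.disable_ipv6 = 1
# ip -6 addr show dev eth0    # prints nothing when IPv6 is fully disabled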

This hardened Debian 9 server works perfectly for a year.

Now perform a dist-upgrade to Debian 10.

Everything looks good. No errors during the upgrade.
After the final reboot, the server comes online as it should.

BUT...
After 1 hour we suddenly lose all access to the server.

A reset of the EC2 instance brings access back, only for it to be lost again an hour later.

(Unfortunately, neither dhclient nor the cloud-init scripts logged any error to syslog, so it was quite hard to figure out what was wrong.)

It turns out that the IPv6 hardening is what causes problems for dhclient/ifup.

I believe the problem lies in /sbin/dhclient-script:
        if [ -n "$old_ip_address" ] &&
           [ "$old_ip_address" != "$new_ip_address" ]; then
            # leased IP has changed => flush it
            ip -4 addr flush dev ${interface} label ${interface}
        fi

My guess is that when dhclient fails to set an IPv6 address, the above code ends up flushing the IPv4 address currently configured on the machine, making it lose all network connectivity.
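
To illustrate the idea (this is only a sketch of the kind of guard I have in mind, not necessarily how the real fix should look): only flush when dhclient actually supplied a replacement IPv4 address:
        if [ -n "$old_ip_address" ] && [ -n "$new_ip_address" ] &&
           [ "$old_ip_address" != "$new_ip_address" ]; then
            # leased IPv4 address has changed => flush the stale one
            ip -4 addr flush dev ${interface} label ${interface}
        fi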



My current workaround is to *not* apply the above IPv6 hardening; the server then works fine.
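
In practice the workaround just means removing the sysctl file shown above and rebooting (adjust the path if yours differs):
# rm /etc/sysctl.d/disable_ipv6.conf
# reboot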




My /etc/network/interfaces configuration:
# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
allow-hotplug eth0
iface eth0 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth1 inet dhcp
allow-hotplug eth1
iface eth1 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth2 inet dhcp
allow-hotplug eth2
iface eth2 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth3 inet dhcp
allow-hotplug eth3
iface eth3 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth4 inet dhcp
allow-hotplug eth4
iface eth4 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth5 inet dhcp
allow-hotplug eth5
iface eth5 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth6 inet dhcp
allow-hotplug eth6
iface eth6 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth7 inet dhcp
allow-hotplug eth7
iface eth7 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper
iface eth8 inet dhcp
allow-hotplug eth8
iface eth8 inet6 manual
  up /usr/local/sbin/inet6-ifup-helper
  down /usr/local/sbin/inet6-ifup-helper


Log:
Jul 8 10:13:36 foobar ifup[363]: RTNETLINK answers: File exists
Jul 8 10:13:36 foobar ifup[363]: invoke-rc.d: could not determine current runlevel
Jul 8 10:13:36 foobar dhclient[571]: bound to 10.75.75.75 -- renewal in 1491 seconds.
Jul 8 10:13:36 foobar ifup[363]: bound to 10.75.75.75 -- renewal in 1491 seconds.
Jul 8 10:13:36 foobar ifup[363]: Could not get a link-local address
Jul 8 10:13:36 foobar ifup[363]: ifup: failed to bring up eth0

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 06:ce:43:75:75:75 brd ff:ff:ff:ff:ff:ff


Additional findings:
If I compare the contents of /etc/network/ on this 9-->10 dist-upgraded machine with a machine installed directly from the Debian 10 AMI, they differ:
dist-upgraded:/etc/network> ls
if-down.d/  if-post-down.d/  if-pre-up.d/  if-up.d/  interfaces  interfaces.d/

pure deb10:/etc/network> ls
cloud-ifupdown-helper*     if-down.d/       if-pre-up.d/  interfaces
cloud-interfaces-template  if-post-down.d/  if-up.d/      interfaces.d/

This makes me think that the cloud-init package for Debian 10 does something wrong.


Somewhat related bug: #846583

/Martin

--- End Message ---
--- Begin Message ---
We have just published images for Debian 10.5, which include the fix for
this issue.  Details are at
https://wiki.debian.org/Cloud/AmazonEC2Image/Buster as usual.

--- End Message ---
