
Bug#1023563: linux-image-5.10.0-19-amd64: Ephemeral ports are reused too quickly, even when net.ipv4.tcp_tw_reuse = 0



Package: linux-image-5.10.0-19-amd64
Version: 5.10.149-2
Severity: important

Dear Maintainer,

Starting with linux-image-5.10.0-15-amd64 (5.10.120-1), the kernel seems
to reuse ephemeral TCP ports too quickly, even when
net.ipv4.tcp_tw_reuse is set to 0.

linux-image-5.10.0-14-amd64 (5.10.113-1) and all earlier versions did
not show that behaviour.

The behaviour is the same for IPv4 and IPv6.

* What led up to the situation?

I have a couple of moderately busy web servers that open TCP
sessions (~15-20 new connections per second) to a dedicated port on a backend server.
The connections are short-lived and are closed by the backend server
after about 1 second on average.
This setup has been working for many years through many Debian releases
and kernel versions.

On July 2, 2022 I updated the systems (apt update), which upgraded the
Linux kernel image from 5.10.0-14 to 5.10.0-15.

Shortly afterwards I noticed an increasing number of connection errors
(timeouts) reported by the web servers.

Further analysis (mostly with tcpdump) showed that the web servers
had started reusing ephemeral TCP ports as soon as 30 seconds after their
last use. At that point (30 seconds) the backend server (which also runs Debian)
still had the corresponding sockets in the TIME_WAIT state and replied to the
new SYN packet with an ACK instead of a SYN-ACK (which is normal
behaviour, since the connection was still in TIME_WAIT). The web server did
not expect the ACK and discarded it, occasionally resending the SYN,
until a timeout occurred.
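The kind of check done here with tcpdump can be automated. A minimal sketch, assuming the SYNs sent to the backend have already been extracted from a capture as (timestamp, source port) pairs, keeping only the first SYN of each connection attempt (retransmitted SYNs would otherwise be flagged). The function name and input format are hypothetical, and the 60 second threshold assumes the backend's TIME_WAIT period is Linux's fixed 60 seconds.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: the backend keeps closed connections in TIME_WAIT for 60 s,
# Linux's fixed TIME_WAIT length.
TIME_WAIT_SECONDS = 60.0


def find_early_reuse(
    syns: Iterable[Tuple[float, int]],
    min_gap: float = TIME_WAIT_SECONDS,
) -> List[Tuple[int, float, float]]:
    """Return (port, previous_syn_time, new_syn_time) for every source port
    that is reused less than min_gap seconds after its previous SYN.

    syns: (timestamp_in_seconds, source_port) pairs for the first SYN of
    each connection attempt to one backend address and port.
    """
    last_seen: Dict[int, float] = {}
    early: List[Tuple[int, float, float]] = []
    for ts, port in sorted(syns):  # process in time order
        prev = last_seen.get(port)
        if prev is not None and ts - prev < min_gap:
            early.append((port, prev, ts))
        last_seen[port] = ts
    return early
```

It measures from the previous SYN rather than from the end of the previous connection, so with connections lasting about a second it can miss reuses that fall just over the threshold.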

The choice of ephemeral source ports appeared quite erratic. For some
seconds they were chosen in ascending order as expected, then
seemed to jump back to some lower position, proceed in ascending order
from there again, then jump back up to the higher position where they
had left off before, and so on.
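For comparison, RFC 6056 (section 3.3.4) describes a "Double-Hash Port Selection" scheme, the general family of algorithm the Linux connect() port allocator is modelled on. The sketch below is a simplified model of the RFC's algorithm, not the kernel's code: the keyed hash, the table size of 256 and the port-suitability callback are illustrative assumptions, and the addresses in the usage lines are documentation placeholders.

```python
import hashlib
import random
from typing import Callable

# Simplified model of RFC 6056, section 3.3.4 ("Double-Hash Port Selection").
# This is NOT the Linux kernel's code: the keyed hashes, the table size and
# the port-suitability check are illustrative stand-ins.
TABLE_LENGTH = 256  # assumed table size; the RFC's example uses 10


def keyed_hash(key: bytes, *parts: object) -> int:
    """32-bit keyed hash of the connection identifiers (plays the role of F/G)."""
    data = key + "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


class DoubleHashAllocator:
    def __init__(self, low: int, high: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.low, self.high = low, high
        self.key1 = rng.getrandbits(128).to_bytes(16, "big")  # secret for F
        self.key2 = rng.getrandbits(128).to_bytes(16, "big")  # secret for G
        # Per-bucket counters, randomly initialised once ("at boot").
        self.table = [rng.randrange(65536) for _ in range(TABLE_LENGTH)]

    def select(self, local_ip: str, remote_ip: str, remote_port: int,
               suitable: Callable[[int], bool] = lambda port: True) -> int:
        num_ephemeral = self.high - self.low + 1
        # F: per-destination starting offset; G: which counter to use.
        offset = keyed_hash(self.key1, local_ip, remote_ip, remote_port)
        index = keyed_hash(self.key2, local_ip, remote_ip, remote_port) % TABLE_LENGTH
        for _ in range(num_ephemeral):
            port = self.low + (offset + self.table[index]) % num_ephemeral
            self.table[index] += 1  # the bucket's counter advances on every try
            if suitable(port):
                return port
        raise RuntimeError("no suitable ephemeral port")


# Hypothetical usage: five connections from one web server to one backend port.
alloc = DoubleHashAllocator(32768, 60999, seed=1)
ports = [alloc.select("192.0.2.10", "192.0.2.20", 8080) for _ in range(5)]
```

In this model, back-to-back connections to one destination receive consecutive ports (wrapping at the top of the range), and any other destination whose identifiers land in the same table slot advances the same counter, which would show up as gaps in that destination's sequence.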

* What exactly did you do (or not do) that was effective (or
  ineffective)?

I first widened the ephemeral port range by setting
net.ipv4.ip_local_port_range = 1024 60999 (from the default 32768 60999).
This alleviated the situation (the timeouts became less
frequent), but did not solve the problem.
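The expected effect of the port range can be estimated with simple arithmetic. A back-of-the-envelope sketch, assuming strictly sequential allocation and that all 15-20 connections per second go to the one backend port; the helper name is illustrative, and the 60 second TIME_WAIT figure is Linux's fixed value.

```python
# Idealised model: every new connection to the backend takes the next port
# in the range, so a port comes round again only after the whole range
# has been used once. Assumes all connections go to the one backend port.
TIME_WAIT_SECONDS = 60  # Linux's fixed TIME_WAIT length


def reuse_interval(low: int, high: int, conn_per_sec: float) -> float:
    """Seconds between two uses of the same source port."""
    return (high - low + 1) / conn_per_sec


for low, high in [(32768, 60999), (1024, 60999)]:
    for rate in (15, 20):
        t = reuse_interval(low, high, rate)
        print(f"{low}-{high} at {rate} conn/s: reuse after ~{t:.0f} s "
              f"({t / TIME_WAIT_SECONDS:.0f}x TIME_WAIT)")
```

Under these assumptions a port would be reused only after roughly 1400 to 1900 seconds with the default range and 3000 to 4000 seconds with the widened one, far beyond the 60 second TIME_WAIT period. Conversely, a reuse after 30 seconds at 15-20 connections per second means only about 450 to 600 other connections were made in between.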

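I then set net.ipv4.tcp_tw_reuse = 0 (from the default 2), which did not
change anything. This is expected here: the backend server closes the
connections, so the TIME_WAIT sockets are on the backend, whereas
tcp_tw_reuse only controls whether the web servers may reuse their own
TIME_WAIT sockets for new outgoing connections.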
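To confirm which values are actually in effect on a running host, the settings can be read back from procfs, where each sysctl name maps to a path under /proc/sys. A minimal sketch; the helper names are illustrative, and the root directory is a parameter only so the functions can be exercised outside a live system.

```python
from pathlib import Path
from typing import Dict, Tuple


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl such as 'net.ipv4.tcp_tw_reuse' through procfs."""
    # Dots become path separators. Names with dotted components (e.g. VLAN
    # interface names) would need escaping; not an issue for these keys.
    return (Path(root) / name.replace(".", "/")).read_text().strip()


def port_range(text: str) -> Tuple[int, int]:
    """Parse 'low<whitespace>high' as found in ip_local_port_range."""
    low, high = map(int, text.split())
    return low, high


def current_settings(root: str = "/proc/sys") -> Dict[str, object]:
    """Collect the two settings discussed in this report."""
    return {
        "ip_local_port_range": port_range(
            read_sysctl("net.ipv4.ip_local_port_range", root)),
        "tcp_tw_reuse": int(read_sysctl("net.ipv4.tcp_tw_reuse", root)),
    }
```

After the changes described above, current_settings() on the web servers would be expected to report a port range of (1024, 60999) and a tcp_tw_reuse value of 0.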

* What was the outcome of this action?

None of the measures I took solved the problem.

So I downgraded the kernel to 5.10.0-14, and the problem immediately
went away. The web servers now cycle through the available ~60000
ephemeral ports and come around to reusing them long after the socket
on the backend server has been closed.


I am opening this bug here because I am not knowledgeable enough about
the Debian kernel patches to decide whether or not this issue is already
present in the upstream vanilla kernel.

Thank you for looking into this.

Best regards

Markus Wernig

-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-14-amd64 (SMP w/4 CPU threads)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-5.10.0-14-amd64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.140
ii  kmod                                    28-1
ii  linux-base                              4.6

Versions of packages linux-image-5.10.0-14-amd64 recommends:
ii  apparmor             2.13.6-10
ii  firmware-linux-free  20200122-1

Versions of packages linux-image-5.10.0-14-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  grub-pc                 2.06-3~deb11u2
pn  linux-doc-5.10          <none>

