
Bug#1108860: [regression] Wireguard fragmentation fails with VXLAN since 8930424777e4 ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") causing network timeouts



Hi,

On Wed, Jul 16, 2025 at 08:44:55AM -0400, Aaron Conole wrote:
> Guillaume Nault <gnault@redhat.com> writes:
> 
> > On Mon, Jul 14, 2025 at 09:57:52PM +0200, Salvatore Bonaccorso wrote:
> >> Hi,
> >> 
> >> Charles Bordet reported the following issue (full context in
> >> https://bugs.debian.org/1108860)
> >> 
> >> > Dear Maintainer,
> >> > 
> >> > What led up to the situation?
> >> > We run a production environment using Debian 12 VMs, with a network
> >> > topology involving VXLAN tunnels encapsulated inside Wireguard
> >> > interfaces. This setup has worked reliably for over a year, with MTU set
> >> > to 1500 on all interfaces except the Wireguard interface (set to 1420).
> >> > Wireguard kernel fragmentation allowed this configuration to function
> >> > without issues, even though the effective path MTU is lower than 1500.
> >> > 
> >> > What exactly did you do (or not do) that was effective (or ineffective)?
> >> > We performed a routine system upgrade, updating all packages including the
> >> > kernel. After the upgrade, we observed severe network issues (timeouts,
> >> > very slow HTTP/HTTPS, and apt update failures) on all VMs behind the
> >> > router. SSH and small-packet traffic continued to work.
> >> > 
> >> > To diagnose, we:
> >> > 
> >> > * Restored a backup (with the previous kernel): the problem disappeared.
> >> > * Repeated the upgrade, confirming the issue reappeared.
> >> > * Systematically tested each kernel version from 6.1.124-1 up to
> >> > 6.1.140-1. The problem first appears with kernel 6.1.135-1; all earlier
> >> > versions work as expected.
> >> > * Kernel version from the backports (6.12.32-1) did not resolve the
> >> > problem.
> >> > 
> >> > What was the outcome of this action?
> >> > 
> >> > * With kernel 6.1.135-1 or later, network timeouts occur for
> >> > large-packet protocols (HTTP, apt, etc.), while SSH and small-packet
> >> > protocols work.
> >> > * With kernel 6.1.133-1 or earlier, everything works as expected.
> >> > 
> >> > What outcome did you expect instead?
> >> > We expected the network to function as before, with Wireguard handling
> >> > fragmentation transparently and no application-level timeouts,
> >> > regardless of the kernel version.
> >> 
> >> While triaging the issue we found that the commit 8930424777e4
> >> ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") introduces
> >> the issue, and Charles confirmed that the issue is present as well in
> >> 6.12.35 and 6.15.4 (other versions could potentially still be
> >> affected, but we wanted to check that it is not a 6.1.y-specific
> >> regression).
> >> 
> >> Reverting the commit fixes Charles' issue.
> >> 
> >> Does that ring a bell?
> >
> > It doesn't ring a bell. Do you have more details on the setup that has
> > the problem? Or, ideally, a self-contained reproducer?
> 
> +1 - I tested this patch with an OVS setup using vxlan and geneve
> tunnels.  A reproducer or more details would help.

Charles, any news here? Did you find a way to provide a
self-contained reproducer for your issue?

Does the issue still reproduce for you on the most current version of
each of the affected stable series?
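
For anyone trying to build a reproducer, the MTU arithmetic behind the
reported setup can be sketched roughly as below. This is only an
illustration, not taken from the report: it assumes the VXLAN underlay
is IPv4 and uses the usual header sizes. The function names are made up
for the example.

```python
# Rough MTU arithmetic for VXLAN carried over a Wireguard interface.
# Assumption: IPv4 underlay for VXLAN; header sizes in bytes.
INNER_ETHERNET = 14  # Ethernet header of the encapsulated frame
VXLAN_HEADER = 8
UDP_HEADER = 8
IPV4_HEADER = 20
VXLAN_OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + IPV4_HEADER


def vxlan_packet_size(inner_mtu: int) -> int:
    """Size of the outer IPv4 packet carrying one full-size inner packet."""
    return inner_mtu + VXLAN_OVERHEAD


def needs_fragmentation(inner_mtu: int, underlay_mtu: int) -> bool:
    """True if the encapsulated packet does not fit the underlay MTU."""
    return vxlan_packet_size(inner_mtu) > underlay_mtu


def max_inner_mtu(underlay_mtu: int) -> int:
    """Largest inner MTU that avoids fragmentation on the underlay."""
    return underlay_mtu - VXLAN_OVERHEAD


if __name__ == "__main__":
    # Values from the report: 1500 on the VXLAN side, 1420 on Wireguard.
    print("outer packet size:", vxlan_packet_size(1500))
    print("needs fragmentation:", needs_fragmentation(1500, 1420))
    print("largest inner MTU without fragmentation:", max_inner_mtu(1420))
```

Under these assumptions a full-size packet from the 1500-MTU side does
not fit into the 1420-MTU Wireguard interface, which would match the
symptom that only large-packet traffic is affected.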

Regards,
Salvatore

