
Bug#754722: Security upgrade causes severe network packet loss



Package: linux-image-3.2.0-4-amd64
Version: 3.2.60-1+deb7u1
Severity: important


After installing security upgrade 3.2.60-1+deb7u1 on my machines, a
pool of machines providing virtual desktops to end users became pretty
much unusable. To explain the situation, here's a little picture:

                   /- VM0
            /- N0 --- VM1
  Srv ---- G       \- ...
            \
             \- N1    ...

            .....

"N0", "N1", ..  a bunch of diskless machines running virtual machines based
                on QEMU/KVM; network connectivity for the VMs via bridges

"G"             coordinates the pool and connects it to the rest of the
                network under a single IP address (SNAT)

"Srv" stands for any of a bunch of other servers (mostly Samba)

Most machines are running the 3.2.60-1+deb7u1 kernel without any
visible problems; only on the machine named "G" in my picture did it have
devastating effects (for the moment, I have downgraded to the previous
kernel, 3.2.57-3+deb7u2, with which everything works fine).

VMs running Windows XP (unfortunately, this means almost all of the
virtual desktops involved) no longer get any usable network connection
to the rest of the network. Smaller packets seem to be totally
unaffected, but anything involving larger packets is unusable: reading
a 50k file from a Samba share takes ~ 1 minute (the effects when writing
seem to be less extreme). It seems that VMs running Linux are affected,
too, but to a much lesser degree.

I am no networking expert, but as far as I can figure out from tracing
the traffic, there seems to be some problem with fragmentation involved.
It seems that machine "G" with kernel 3.2.60-1+deb7u1 drops not all, but
most packets using the full MTU of 1500 bytes (and sends ICMP messages
recommending an MTU of 1500 bytes). I am pretty sure that no bigger packet
gets on the wire. The network interfaces use their default settings, which
also include "tcp-segmentation-offload", so packets captured on a given
machine may be bigger. Why the effects on Windows XP are so extreme, while
other systems are still mostly working, puzzles me ...
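
A rough sketch of how the full-MTU drop could be probed without a packet
trace, assuming plain IPv4 without options and the iputils "ping" (where
"-M do" sets Don't Fragment and "-s" sets the ICMP payload size). The host
name "srv.example" and the list of MTU values are hypothetical placeholders,
not taken from my setup; the script only prints the commands to run:

```python
# Sketch only: assumes IPv4 without options (20-byte header) and
# ICMP echo (8-byte header). Host name and MTU list are placeholders.
IPV4_HEADER = 20
ICMP_HEADER = 8


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(host, mtu, count=5):
    """Build an iputils ping command line with Don't Fragment set."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload), host]


def probe_plan(host, mtus=(1500, 1400, 576)):
    """Return one ping command line per MTU value to try."""
    return [" ".join(df_ping_command(host, mtu)) for mtu in mtus]


if __name__ == "__main__":
    for line in probe_plan("srv.example"):
        print(line)
```

If the 1472-byte payload (a 1500-byte IP packet) is mostly lost through "G"
while the smaller sizes pass, that would match what I see in the traces.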

All involved machines run Debian wheezy amd64 (up to date except the
kernel on machine "G"), all have Intel network interfaces using the
"e1000e" driver, usually 2 of them connected via bonding. Almost all
network connections involve 802.1q VLAN interfaces.

Unfortunately, I can't come up with an easier setup to reproduce the
problem. Maybe somebody else has similar problems, too? I can't
experiment too heavily, because the systems are in production use, but
at least one thing is clear and 100% reproducible: all my problems are
caused by some change between kernel 3.2.57-3+deb7u2 and
3.2.60-1+deb7u1 (and probably not the fix for the ptrace bug that was the
major reason for the upgrade). I am a little worried that whatever breaks
my VDI pool may be included in all future kernel versions ...

