Bug#375149: Linux kernel IPv6 : random TCP connection failure
When a large number of IPv6 TCP connections are initiated from the Linux
machine at a high rate, some of them get stalled in SYN_SENT state and
eventually time out after tcp_syn_retries (about 3 minutes).
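The "about 3 minutes" figure can be checked with a little arithmetic. This is
only a sketch under two assumptions that are not stated in this report: an
initial SYN retransmission timeout of 3 seconds that doubles after each
unanswered SYN, and tcp_syn_retries left at 5. The function name is made up
for illustration.

```shell
# Seconds until connect() gives up when no SYN is ever answered.
# Assumptions (not from this report): the retransmission timeout starts at
# the value given as $1 and doubles after every unanswered SYN.
syn_give_up_seconds() {
    sg_rto=$1        # initial retransmission timeout, in seconds
    sg_retries=$2    # value of tcp_syn_retries
    sg_total=0
    sg_n=0
    # One wait after the initial SYN, plus one wait after each retry.
    while [ "$sg_n" -le "$sg_retries" ]; do
        sg_total=$((sg_total + sg_rto))
        sg_rto=$((sg_rto * 2))
        sg_n=$((sg_n + 1))
    done
    echo "$sg_total"
}

syn_give_up_seconds 3 5    # prints 189 (3+6+12+24+48+96)
```

189 seconds is a little over 3 minutes, which matches the delay observed
before a stalled connection fails.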
The remote server does NOT seem to see the connection at all (no
SYN_RECV report with netstat).
This behaviour was first noticed with LDAP queries. Further
investigation showed the same problem with SMTP requests, but NOT
with HTTP (maybe related to the short-lived TIME_WAIT state of HTTP
connections).
The failure rate is about 1 to 2 in 5000 on a busy machine (for example one
hosting a web server); failures are harder to trigger on a quiet one.
How to reproduce:
- have a dual-stack LDAP or SMTP server ready, on an IPv6-enabled network
(let's call it myserver)
- on the Linux client to be tested, launch a loop of quick TCP
connections to myserver:
--> example 1 : loop of 5000 anonymous LDAP searches from a bash shell :
$ i=0; while [ $i -lt 5000 ] ; do ldapsearch -H ldap://myserver -x -b dc=mydomain,dc=myroot '(uid=someuid)' > /dev/null ; i=$((i+1)) ; [ $((i%100)) -eq 0 ] && echo $i ; done
--> example 2 : loop of 5000 SMTP connections from a bash shell (uses the
echoping package) :
$ i=0; while [ $i -lt 5000 ] ; do echoping -6 -S myserver >/dev/null ; i=$((i+1)) ; [ $((i%100)) -eq 0 ] && echo $i ; done
Both examples should print the query number every hundred connections.
If a connection gets stalled, the query count hangs, and a netstat
command (in another shell) should display the stalled connection in
SYN_SENT state:
tcp6 0 0 myclient.mydomain:51930 myserver.mydomain:ldap TIME_WAIT
(.. a bunch of other closing connections in TIME_WAIT ..)
tcp6 0 1 myclient.mydomain:51940 myserver.mydomain:ldap SYN_SENT
The number of TIME_WAIT connections in our case is a few hundred,
so the tcp_max_tw_buckets value should not be an issue.
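The TIME_WAIT count can be compared against the limit with a small helper.
This is a sketch, not part of the original test procedure: count_state is a
hypothetical name, and it assumes netstat output in which the connection state
is the last whitespace-separated field, as in the sample above.

```shell
# Count connections in a given TCP state.
# Reads "netstat -tn"-style output on stdin; assumes the state is the
# last field of each line starting with "tcp" (tcp or tcp6).
count_state() {
    awk -v state="$1" '$1 ~ /^tcp/ && $NF == state { n++ } END { print n + 0 }'
}

# Possible use on the client while the loop is running:
#   netstat -tn | count_state TIME_WAIT
#   netstat -tn | count_state SYN_SENT
#   cat /proc/sys/net/ipv4/tcp_max_tw_buckets
```

If the first number stays far below the value read from
/proc/sys/net/ipv4/tcp_max_tw_buckets, the TIME_WAIT bucket limit is unlikely
to be involved.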
The same experiments have NOT shown any stalled connections when using
IPv4 under the same conditions (either by explicitly specifying the IPv4
address of myserver, or by means of the "-4" option of echoping).
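To compare IPv6 and IPv4 runs without waiting about 3 minutes for every stalled
connection, the loop can be wrapped in a time limit. The following is only a
sketch: run_probes is a hypothetical helper, it assumes the timeout command
from GNU coreutils is installed (it may not be on older systems), and the
10-second limit in the usage lines is an arbitrary choice.

```shell
# Run a command repeatedly under a time limit and report how many runs
# failed or were cut off by the limit.
# Usage: run_probes COUNT LIMIT_SECONDS COMMAND [ARGS...]
run_probes() {
    rp_count=$1
    rp_limit=$2
    shift 2
    rp_failed=0
    rp_i=0
    while [ "$rp_i" -lt "$rp_count" ]; do
        # timeout exits non-zero if the command fails or is still
        # running when the limit expires.
        if ! timeout "$rp_limit" "$@" >/dev/null 2>&1; then
            rp_failed=$((rp_failed + 1))
        fi
        rp_i=$((rp_i + 1))
    done
    echo "$rp_failed of $rp_count runs failed or timed out"
}

# Possible use, with the echoping options from the examples above:
#   run_probes 5000 10 echoping -6 -S myserver
#   run_probes 5000 10 echoping -4 -S myserver
```

A non-zero count for the "-6" run together with a zero count for the "-4" run
would be consistent with the behaviour described here. Note that the count also
includes runs that fail for unrelated reasons, so a netstat check is still
needed to confirm a SYN_SENT stall.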
We are using Debian GNU/Linux 3.1, libc6 2.3.2.ds1-22sarge3, and a
compiled linux-source-2.6.16 (2.6.16-2) kernel with the stock
2.6.16-1-686-smp (or amd64-k8) unmodified config file.
The same results were obtained using several physical Debian client
machines with similar config and different ethernet adapters (e1000 and
tg3), against several LDAP or SMTP servers, and with various ethernet
Also noted on a Mandriva Linux 2006.0 client with a 2.6.12-18mdk kernel.
So this sounds like a general bug in the Linux 2.6 IPv6 TCP stack.