Bug#375149: Linux kernel IPv6 : random TCP connection failure

To: Benoit Branciard <Benoit.Branciard@univ-paris1.fr>
Cc: 375149@bugs.debian.org, jmm@debian.org
Subject: Bug#375149: Linux kernel IPv6 : random TCP connection failure
From: Moritz Muehlenhoff <jmm@inutil.org>
Date: Fri, 28 Nov 2008 00:10:50 +0100
Message-id: <20081127231050.GA18337@galadriel.inutil.org>
Reply-to: Moritz Muehlenhoff <jmm@inutil.org>, 375149@bugs.debian.org
In-reply-to: <449C3E1C.8020206@univ-paris1.fr>
References: <449C3E1C.8020206@univ-paris1.fr>

On Fri, Jun 23, 2006 at 09:16:44PM +0200, Benoit Branciard wrote:
> Package: linux-source-2.6.16
> Version: 2.6.16-2
>
> When a great number of IPv6 TCP connections are initiated from the Linux
> machine at a high rate, some of them get stalled in SYN_SENT state and
> eventually time out after tcp_syn_retries (about 3 minutes).
>
>
> The remote server does NOT seem to see the connection at all (no
> SYN_RECV report with netstat).
>
> This behaviour was noticed initially using LDAP queries. Further
> investigations reported the same problem with SMTP requests, but NOT
> with HTTP (maybe related to the short-lived TIME_WAIT state of HTTP
> connections?).
> The failure rate is about 1-2 in 5000 on a busy machine (for example one
> hosting a web server), and harder to obtain on a quiet one.
>
> How to reproduce :
>
> - have a dual-stack LDAP or SMTP server ready, on an IPv6-enabled network
> (let's call it myserver)
>
> - on the Linux client to be tested, launch a loop of quick TCP
> connections to myserver :
>
> --> example 1 : loop of 5000 anonymous LDAP searches from a bash shell :
>
> $ i=0; while [ $i -lt 5000 ] ; do ldapsearch -H ldap://myserver -x -b
> dc=mydomain,dc=myroot '(uid=someuid)' > /dev/null ; i=$((i+1)) ; [
> $((i%100)) -eq 0 ] && echo $i ; done
>
> --> example 2 : loop of 5000 SMTP connections from a bash shell (uses the
> echoping package) :
>
> $ i=0; while [ $i -lt 5000 ] ; do echoping -6 -S myserver >/dev/null ;
> i=$((i+1)) ; [ $((i%100)) -eq 0 ] && echo $i ; done
>
> Both examples should print the query number every hundred connections.
> If a connection gets stalled, the query count hangs, and a netstat
> command (in another shell) should display the SYN_SENT stalled connection :
>
> tcp6       0      0 myclient.mydomain:51930 myserver.mydomain:ldap TIME_WAIT
> (.. a bunch of other TIME_WAIT closing connections ..)
> tcp6       0      1 myclient.mydomain:51940 myserver.mydomain:ldap SYN_SENT
>
> The number of TIME_WAIT connections in our case is about a few hundreds,
> so the tcp_max_tw_buckets value should not be an issue.
>
> The same experiments have NOT shown any stalling connections when using
> IPv4 in the same conditions (either by explicitly specifying the IPv4
> address of myserver, or by means of the "-4" option of echoping).
>
>
> We are using Debian GNU/Linux 3.1, libc6 2.3.2.ds1-22sarge3, and a
> compiled linux-source-2.6.16 (2.6.16-2) kernel with the stock
> 2.6.16-1-686-smp (or amd64-k8) unmodified config file.
>
> The same results have been obtained using several physical Debian client
> machines with similar config and different ethernet adapters (e1000 and
> tg3), against several LDAP or SMTP servers, and with various ethernet
> switches.
>
> Also noted on a Mandriva Linux 2006.0 client with 2.6.12-18mdk kernel
> and glibc-2.3.5-5mdk.
>
> So this sounds like a general bug in the Linux 2.6 IPv6 TCP stack.

Does this error still occur with more recent kernel versions?
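
In case it helps with retesting, here is a rough sketch of the quoted
procedure as two shell helpers. The function names run_loop and
count_syn_sent and the 10 second limit are illustrative assumptions, not
part of the original report; timeout is the GNU coreutils command. The
first helper runs a command repeatedly and counts failed or timed-out
attempts instead of hanging for the full SYN retry period; the second
counts SYN_SENT entries in netstat-style output, assuming the state is
the last field as in the listing quoted above.

```shell
#!/bin/sh

# run_loop COUNT CMD [ARGS...]
# Run CMD COUNT times, each attempt limited to 10 seconds (assumed value),
# and print how many attempts failed or timed out.
run_loop() {
    total=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$total" ]; do
        # timeout exits non-zero if CMD fails or exceeds the limit
        timeout 10 "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# count_syn_sent
# Read netstat-style lines on stdin and print the number whose last
# field is SYN_SENT.
count_syn_sent() {
    awk '$NF == "SYN_SENT" { n++ } END { print n + 0 }'
}
```

For example, "run_loop 5000 echoping -6 -S myserver" would repeat the SMTP
test from the report, and "netstat -tn | count_syn_sent" in another shell
would show whether any connection is currently stuck.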

Cheers,
        Moritz
