Bug#375149: marked as done (Linux kernel IPv6 : random TCP connection failure)

To: Moritz Muehlenhoff <jmm@inutil.org>
Subject: Bug#375149: marked as done (Linux kernel IPv6 : random TCP connection failure)
From: owner@bugs.debian.org (Debian Bug Tracking System)
Date: Thu, 18 Dec 2008 20:30:04 +0000
Message-id: <handler.375149.D375149.122963203731240.ackdone@bugs.debian.org>
References: <20081218202709.GA3675@galadriel.inutil.org> <449C3E1C.8020206@univ-paris1.fr>

Your message dated Thu, 18 Dec 2008 21:27:09 +0100
with message-id <20081218202709.GA3675@galadriel.inutil.org>
and subject line Fixed
has caused the Debian Bug report #375149,
regarding Linux kernel IPv6 : random TCP connection failure
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
375149: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=375149
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems

--- Begin Message ---

To: submit@bugs.debian.org
Subject: Linux kernel IPv6 : random TCP connection failure
From: Benoit Branciard <Benoit.Branciard@univ-paris1.fr>
Date: Fri, 23 Jun 2006 21:16:44 +0200
Message-id: <449C3E1C.8020206@univ-paris1.fr>

Package: linux-source-2.6.16
Version: 2.6.16-2

When a large number of IPv6 TCP connections are initiated from the Linux
machine at a high rate, some of them stall in SYN_SENT state and
eventually time out after tcp_syn_retries (about 3 minutes).
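
The "about 3 minutes" figure can be checked with a little arithmetic. The sketch below assumes values that are not stated in this report: an initial retransmission timeout of 3 seconds that doubles on every retry, and the default tcp_syn_retries of 5. The function name syn_timeout_seconds is hypothetical.

```shell
# Assumed values, not taken from the report: 3 s initial retransmission
# timeout, doubling each time, and tcp_syn_retries=5.
syn_timeout_seconds() {
    rto=$1; retries=$2
    total=0; attempt=0
    # One wait after the initial SYN, plus one after each retry.
    while [ "$attempt" -le "$retries" ]; do
        total=$((total + rto))
        rto=$((rto * 2))
        attempt=$((attempt + 1))
    done
    echo "$total"
}

syn_timeout_seconds 3 5
```

Under those assumptions the total is 3+6+12+24+48+96 seconds, which matches the observed delay of roughly three minutes.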


The remote server does NOT seem to see the connection at all (no
SYN_RECV report with netstat).

This behaviour was first noticed with LDAP queries. Further
investigation showed the same problem with SMTP requests, but NOT
with HTTP (perhaps because of the short-lived TIME_WAIT state of HTTP
connections?).
The failure rate is about 1 to 2 in 5000 on a busy machine (for example
one hosting a web server); failures are harder to trigger on a quiet one.

How to reproduce:

- have a dual-stack LDAP or SMTP server ready on an IPv6-enabled network
(let's call it myserver)

- on the Linux client to be tested, launch a loop of quick TCP
connections to myserver:

--> example 1: loop of 5000 anonymous LDAP searches from a bash shell:

$ i=0; while [ $i -lt 5000 ] ; do \
    ldapsearch -H ldap://myserver -x -b dc=mydomain,dc=myroot '(uid=someuid)' > /dev/null ; \
    i=$((i+1)) ; [ $((i%100)) -eq 0 ] && echo $i ; done

--> example 2: loop of 5000 SMTP connections from a bash shell (uses the
echoping package):

$ i=0; while [ $i -lt 5000 ] ; do \
    echoping -6 -S myserver >/dev/null ; \
    i=$((i+1)) ; [ $((i%100)) -eq 0 ] && echo $i ; done

Both examples should print the query number every hundred connections.
If a connection stalls, the query count hangs, and a netstat command
(in another shell) should show the stalled connection in SYN_SENT state:

tcp6       0      0 myclient.mydomain:51930 myserver.mydomain:ldap TIME_WAIT
(.. a bunch of other TIME_WAIT closing connections ..)
tcp6       0      1 myclient.mydomain:51940 myserver.mydomain:ldap SYN_SENT

The number of TIME_WAIT connections in our case is a few hundred,
so the tcp_max_tw_buckets value should not be an issue.
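
The netstat check above can be summarised per connection state, which makes both the stalled SYN_SENT entry and the TIME_WAIT count easy to spot. This is only a sketch: tcp6_state_counts is a hypothetical helper, and it assumes netstat output in the format shown above, with the state in the last column (that is, without the -p option).

```shell
# Hypothetical helper: count tcp6 connections per state.
# Reads netstat-style output on stdin, so it also works on saved output.
tcp6_state_counts() {
    awk '$1 == "tcp6" { count[$NF]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live system:
#   netstat -t | tcp6_state_counts
# The TIME_WAIT count can then be compared with the limit in
#   /proc/sys/net/ipv4/tcp_max_tw_buckets
```

Any non-zero SYN_SENT count that persists for more than a few seconds while the loop is hung points at a stalled connection.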

The same experiments have NOT shown any stalled connections when using
IPv4 under the same conditions (either by explicitly specifying the IPv4
address of myserver, or by means of the "-4" option of echoping).
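
The IPv6 and IPv4 runs can be compared without the loop hanging on a stalled connection by bounding each attempt and counting the failures. A minimal sketch, assuming the coreutils timeout command is available; count_failures is a hypothetical helper name, and the 10-second limit is an arbitrary choice.

```shell
# Hypothetical harness: run a command N times and print how many runs
# exited with a non-zero status.
count_failures() {
    n=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Usage (timeout kills an attempt that is still stuck after 10 seconds,
# which makes it count as a failure):
#   count_failures 5000 timeout 10 echoping -6 -S myserver
#   count_failures 5000 timeout 10 echoping -4 -S myserver
```

A stalled connection then shows up as a counted failure after 10 seconds instead of blocking the loop for the full SYN timeout.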


We are using Debian GNU/Linux 3.1, libc6 2.3.2.ds1-22sarge3, and a
compiled linux-source-2.6.16 (2.6.16-2) kernel with the stock
2.6.16-1-686-smp (or amd64-k8) unmodified config file.

The same results were obtained using several physical Debian client
machines with similar configurations and different ethernet adapters
(e1000 and tg3), against several LDAP or SMTP servers, and with various
ethernet switches.

Also noted on a Mandriva Linux 2006.0 client with 2.6.12-18mdk kernel
and glibc-2.3.5-5mdk.

So this sounds like a general bug in the Linux 2.6 IPv6 TCP stack.

--- End Message ---

--- Begin Message ---

To: 375149-done@bugs.debian.org
Subject: Fixed
From: Moritz Muehlenhoff <jmm@inutil.org>
Date: Thu, 18 Dec 2008 21:27:09 +0100
Message-id: <20081218202709.GA3675@galadriel.inutil.org>

Version: 2.6.24-1

Benoit confirmed in off-bug communication that this issue is fixed in
2.6.24 from Etch-n-Half.

Cheers,
        Moritz
--- End Message ---
