Re: Idle TCP connections freeze

To: debian-user@lists.debian.org
Subject: Re: Idle TCP connections freeze
From: Nikolaus Rath <Nikolaus@rath.org>
Date: Mon, 17 Dec 2012 20:46:30 -0800
Message-id: <87ip80c6hl.fsf@vostro.rath.org>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <50CC5226.8040101@plouf.fr.eu.org> (Pascal Hambourg's message of "Sat, 15 Dec 2012 11:34:14 +0100")
References: <87sj77ubem.fsf@vostro.rath.org> <50CC5226.8040101@plouf.fr.eu.org>

Pascal Hambourg <pascal@plouf.fr.eu.org> writes:
> Hello,
>
> Nikolaus Rath wrote:
>> 
>> I'm having trouble with an internet connection that seems to randomly
>> "freeze" arbitrary tcp connections when they have not been used for a
>> while. The connections stay established, but no data is coming through.
>
> How long is "a while", at a minimum?

I wrote a small test program. The cutoff seems to be exactly 302 seconds
of idle time: a connection that has been idle for 301 seconds still works.
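
A minimal sketch of such a probe is below (the actual test program is
not shown here). It assumes that an echo service is reachable on the
remote end; the function name, the "ping" payload and the reply timeout
are all made up for illustration.

```python
import socket
import time


def survives_idle(host, port, idle_seconds, reply_timeout=10.0):
    """Return True if a TCP connection still carries data after idling.

    Assumes the peer echoes back whatever it receives (hypothetical setup).
    """
    with socket.create_connection((host, port), timeout=reply_timeout) as sock:
        time.sleep(idle_seconds)          # leave the connection completely idle
        try:
            sock.sendall(b"ping\n")       # first traffic after the idle period
            return sock.recv(16) != b""   # an echo peer sends the bytes back
        except OSError:                   # includes timeouts: no reply arrived
            return False
```

Running it once per candidate idle time (for example 301 and 302
seconds), each time on a fresh connection, narrows down the cutoff.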

>> When this happens, netstat still shows the connection status as
>> `ESTABLISHED` on both the local computer:
>> 
>>     Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name Timer
>>     tcp        0     53 192.168.0.10:41129      173.255.235.238:143     ESTABLISHED 8219/gnutls-cli  on (79.31/13/0)
>> 
>> ..and the remote server:
>> 
>>     Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name Timer
>>     tcp        0      0 173.255.235.238:143     68.5.174.98:41129       ESTABLISHED 5303/imapd       off (0.00/0/0)
>
> It appears that the client has a private address and the server has a
> public address. So I guess that there is a NAT device between them, and
> its stateful NAT engine may be the cause of the problem, by deleting
> connections from its translation table after a delay of inactivity.
>
>> When I look at a packet capture of this connection on the client side,
>> there is a long (expected) period of inactivity that seems to trigger
>> the problem, then the local end tries to transmit some data again but
>> never receives an ACK. Instead, 15 TCP Retransmissions go out, with
>> intervals increasing from 0.3 seconds to 120 seconds. No activity is
>> captured after that.
>
> Can you do a packet capture on the server side as well?

Yes, just tried it. The server does not receive anything at all when the
client starts retransmitting. I guess that is consistent with the NAT
explanation?
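
As a rough cross-check of the client-side capture, the 15
retransmissions with intervals growing from 0.3 to 120 seconds look like
exponential backoff with a cap. The sketch below assumes that each
interval simply doubles until it reaches 120 seconds; the capture only
gives the first and last interval, so the doubling is an assumption.

```python
def backoff_intervals(first=0.3, cap=120.0, count=15):
    """Retransmission intervals that double each time, up to a cap."""
    intervals = []
    interval = first
    for _ in range(count):
        intervals.append(min(interval, cap))
        interval *= 2
    return intervals


intervals = backoff_intervals()
total_seconds = sum(intervals)
print([round(i, 1) for i in intervals])
print(f"total wait: about {total_seconds / 60:.1f} minutes")
```

Under that assumption the intervals add up to roughly a quarter of an
hour, which is how long the application hangs before the connection is
finally given up.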


>> Does anyone have a suggestion of how I could debug this further to find
>> out where the problem lies and how to fix it?
>> 
>> Also, is there some way to globally reduce the timeout on client
>> and/or server to reduce the time before the local application aborts?
>
> The Linux kernel supports system-wide TCP keepalive. However, the
> application must enable it on a per-socket basis, and the minimum
> recommended value of 2 hours (which is the default in Linux) is quite
> high; the inactivity timeout in your NAT device may be shorter. The best
> workaround for this is to generate traffic with some kind of
> application-level keepalive, either defined in the application protocol,
> as in SSH, or by periodically sending dummy commands or data.

Yes, I guess your NAT theory makes sense. If I use ssh with
"ServerAliveInterval", or force libkeepalive use with LD_PRELOAD, the
connections survive beyond 302 seconds.
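
For applications where the source can be changed, the per-socket
keepalive Pascal describes can be sketched as follows. The idle time of
240 seconds, the 30 second probe interval and the probe count of 4 are
illustrative values chosen only to stay below the 302 second cutoff; the
function name is made up.

```python
import socket


def enable_keepalive(sock, idle=240, interval=30, probes=4):
    """Turn on TCP keepalive with an idle time below the NAT timeout."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* options are not available on every platform;
    # where they are missing, only the system-wide defaults apply.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

The function would be called on a socket right after creating it, before
or after connecting. With these values the kernel sends a first probe
after 240 idle seconds, so the NAT entry should be refreshed before it
expires.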

Unfortunately, this isn't a good solution, because I have non-Linux
devices on the same network that suffer from the same problem.

Is there a way to figure out at which device the NAT timeout happens? I
have a Cisco DPC3825 cable modem that does NAT. But it has just 4
Ethernet connections and WLAN, so I have a hard time believing that it
would need to force a 5 min timeout. The web administration page also
doesn't mention any timeouts (which may of course mean nothing). Is it
possible that there's a second NAT at work behind the modem?
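
One rough check for a second NAT, assuming the modem's status page shows
its WAN address, is to compare that address with the one the server sees
(68.5.174.98 in the netstat output above). The sketch below is only a
heuristic; the function name and the example WAN addresses are made up.

```python
import ipaddress

# Shared address space commonly used for carrier-grade NAT.
SHARED_ADDRESS_SPACE = ipaddress.ip_network("100.64.0.0/10")


def second_nat_suspected(modem_wan_address, address_seen_by_server):
    """Return True if another NAT probably sits behind the modem."""
    wan = ipaddress.ip_address(modem_wan_address)
    seen = ipaddress.ip_address(address_seen_by_server)
    # A private or shared WAN address cannot be what the server sees.
    if wan.is_private or wan in SHARED_ADDRESS_SPACE:
        return True
    # Otherwise a second translation shows up as a different address.
    return wan != seen


print(second_nat_suspected("10.0.0.5", "68.5.174.98"))     # hypothetical WAN address
print(second_nat_suspected("68.5.174.98", "68.5.174.98"))  # WAN matches what the server sees
```

This only indicates whether a second NAT exists; it does not show which
device is dropping the idle connections.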


Thanks,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C
