
Idle TCP connections freeze



Hi,

I'm having trouble with an internet connection that seems to randomly
"freeze" arbitrary TCP connections after they have been idle for a
while. The connections stay established, but no data gets through.

When this happens, netstat still shows the connection status as
`ESTABLISHED` on both the local computer:

    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name Timer
    tcp        0     53 192.168.0.10:41129      173.255.235.238:143     ESTABLISHED 8219/gnutls-cli  on (79.31/13/0)

..and the remote server:

    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name Timer
    tcp        0      0 173.255.235.238:143     68.5.174.98:41129       ESTABLISHED 5303/imapd       off (0.00/0/0)

However, it seems that no data at all is transferred. If I run strace
on the local and remote process, both just show a repeating sequence of
select calls (with different fds of course), e.g.

    select(6, [0 5], NULL, NULL, {0, 50000}) = 0 (Timeout)
    select(6, [0 5], NULL, NULL, {0, 50000}) = 0 (Timeout)
    select(6, [0 5], NULL, NULL, {0, 50000}) = 0 (Timeout)

The internet connection overall does not seem affected; I can still
establish new connections to the same service on the same server without
any problems. However, the affected local applications seem to be
unaware of the problem and just hang.

About 10 minutes after the attempted transmission on the local end, the
connection on the remote end disappears from the netstat output (I
wasn't able to catch any intermediate state), but it still stays
`ESTABLISHED` on the local end.

Finally, after some more minutes, the local application aborts with a
timeout and disappears from the local netstat output as well.

When I look at a packet capture of this connection on the client side,
there is a long (expected) period of inactivity that seems to trigger
the problem, then the local end tries to transmit some data again but
never receives an ACK. Instead, 15 TCP Retransmissions go out, with
intervals increasing from 0.3 seconds to 120 seconds. No activity is
captured after that.
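
As a rough sanity check on those numbers, here is a minimal sketch of the
retransmission schedule. It assumes the wait doubles after every
retransmission and is capped at 120 seconds; the doubling and the function
name `retransmit_waits` are my assumptions, only the 0.3 s, 120 s and 15
come from the capture.

```python
def retransmit_waits(first_wait=0.3, max_wait=120.0, count=15):
    """Return the assumed wait (in seconds) before each retransmission.

    Assumption: the wait doubles each time and is capped at max_wait.
    """
    return [min(first_wait * 2 ** i, max_wait) for i in range(count)]


waits = retransmit_waits()
total = sum(waits)

# Print the individual waits and how long the whole sequence takes.
print(", ".join(f"{w:g}" for w in waits))
print(f"total: {total:.1f} s (about {total / 60:.1f} min)")
```

If that model is roughly right, the 15 retransmissions span several
minutes in total, which would fit the delay I see before the local
application finally gives up.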

Does anyone have a suggestion of how I could debug this further to find
out where the problem lies and how to fix it?

Also, is there some way to globally reduce the timeout on client
and/or server to shorten the time before the local application aborts?
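
For reference, the global knobs I would guess are relevant on Linux are
the sysctls below: `tcp_retries2` limits retransmissions on an established
connection, and the keepalive settings only apply to sockets that enable
`SO_KEEPALIVE`. This is only a sketch for printing the current values,
assuming a Linux system with /proc mounted; the helper name
`read_tcp_sysctls` is made up, and I don't know yet whether changing any
of these is the right fix.

```python
from pathlib import Path

# Assumption: Linux, with the usual sysctl tree under /proc/sys.
SYSCTL_DIR = Path("/proc/sys/net/ipv4")
NAMES = [
    "tcp_retries2",          # retransmission limit for established connections
    "tcp_keepalive_time",    # idle seconds before the first keepalive probe
    "tcp_keepalive_intvl",   # seconds between keepalive probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
]


def read_tcp_sysctls(base=SYSCTL_DIR, names=NAMES):
    """Return {name: int value}, or None where a value cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


for name, value in read_tcp_sysctls().items():
    print(f"net.ipv4.{name} = {value}")
```

Running this on both the client and the server would at least show
whether the two ends use different settings.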


Best,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

