Bug#590935: NFS client cannot access a share when the TCP connection status is TIME_WAIT
Package: nfs-common
Version: 1:1.1.2-6lenny2
After 5 minutes of inactivity on an NFS share, the TCP connection between the client port (779 in the transcript below) and the NFS server port (2049) switches from "ESTABLISHED" to "TIME_WAIT", which is perfectly normal. With the default timeout for the TIME_WAIT state, the connection then remains in this state for one minute (60 seconds, i.e. twice the MSL). If the same NFS share is accessed again during this minute, the access fails with an Input/output error. Once the minute has elapsed, the connection is re-established normally, using the same client port number (779 in the transcript below). Below is a transcript:
########BEGINNING OF TRANSCRIPT ###############
# netstat -na | grep 10.0.0.1 ; date
Fri Jul 30 10:22:18 CEST 2010
# mount -t nfs
10.0.0.1:/export/test on /share/test type nfs (rw,intr,rsize=8192,wsize=8192,addr=10.0.0.1)
# netstat -na | grep 10.0.0.1
# date; time ls /share/test/ ; date
Fri Jul 30 10:22:55 CEST 2010
testfile
real 0m0.003s
user 0m0.000s
sys 0m0.000s
Fri Jul 30 10:22:55 CEST 2010
# netstat -na | grep 10.0.0.1 ; date
tcp 0 0 10.0.0.2:779 10.0.0.1:2049 ESTABLISHED
Fri Jul 30 10:23:58 CEST 2010
# netstat -na | grep 10.0.0.1 ; date
tcp 0 0 10.0.0.2:779 10.0.0.1:2049 TIME_WAIT
Fri Jul 30 10:28:08 CEST 2010
# date; time ls /share/test/ ; date
Fri Jul 30 10:28:16 CEST 2010
ls: cannot access /share/test/: Input/output error
real 0m0.186s
user 0m0.000s
sys 0m0.056s
Fri Jul 30 10:28:16 CEST 2010
# netstat -na | grep 10.0.0.1 ; date
tcp 0 0 10.0.0.2:779 10.0.0.1:2049 TIME_WAIT
Fri Jul 30 10:28:23 CEST 2010
# netstat -na | grep 10.0.0.1 ; date
Fri Jul 30 10:29:15 CEST 2010
# date; time ls /share/test/ ; date
Fri Jul 30 10:29:19 CEST 2010
testfile
real 0m0.003s
user 0m0.000s
sys 0m0.000s
Fri Jul 30 10:29:19 CEST 2010
# netstat -na | grep 10.0.0.1 ; date
tcp 0 0 10.0.0.2:779 10.0.0.1:2049 ESTABLISHED
Fri Jul 30 10:29:22 CEST 2010
#
############END OF TRANSCRIPT ###############
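To make the check in the transcript easier to repeat, here is a minimal sketch of a helper that extracts the state of the connections to the NFS server from `netstat -na` output. The function name `nfs_connection_states` and its parameters are hypothetical, and it assumes the six-column TCP layout shown in the transcript above.

```python
def nfs_connection_states(netstat_output, server_ip, server_port=2049):
    """Return (local_endpoint, state) pairs for TCP connections to the NFS server.

    Assumes rows shaped like the transcript:
    proto recv-q send-q local-address remote-address state
    """
    remote = f"{server_ip}:{server_port}"
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Keep only TCP rows whose remote endpoint is the NFS server.
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[4] == remote:
            results.append((fields[3], fields[5]))
    return results


if __name__ == "__main__":
    sample = "tcp 0 0 10.0.0.2:779 10.0.0.1:2049 TIME_WAIT"
    print(nfs_connection_states(sample, "10.0.0.1"))
    # prints [('10.0.0.2:779', 'TIME_WAIT')]
```

Feeding it the output of `netstat -na` just before each `ls` would show whether the failing access coincides with a TIME_WAIT entry on the client port.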
I am using Debian Lenny 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux.
It should be noted that on other systems and versions (latest updates of Red Hat 5.5, Ubuntu 10.04, Debian Squeeze/Sid), the behavior is slightly different: when the connection is re-established during the "TIME_WAIT minute", another port number (the client port number minus one) is used and the NFS share can be accessed without error.
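The difference between the two behaviors can be modelled with a small sketch. This is purely an illustration of what I observe from the outside, not the kernel's actual RPC code; the function name `pick_source_port` and the lower bound `lowest=512` are assumptions made for the example.

```python
def pick_source_port(preferred, time_wait_ports, lowest=512):
    """Return the highest port <= preferred that is not in TIME_WAIT.

    Illustrative model only: 'lowest' is an arbitrary assumed bound,
    not a value taken from any real kernel configuration.
    Returns None when every candidate port is still in TIME_WAIT.
    """
    for port in range(preferred, lowest - 1, -1):
        if port not in time_wait_ports:
            return port
    return None


if __name__ == "__main__":
    # Port 779 is still in TIME_WAIT, so the next lower port is chosen.
    print(pick_source_port(779, {779}))  # prints 778
    # Nothing is in TIME_WAIT, so the preferred port is reused.
    print(pick_source_port(779, set()))  # prints 779
```

On Lenny, the client appears to insist on the same port (779) while it is still in TIME_WAIT, which would correspond to skipping this fallback.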
Sincerely,
Jean-Francois C. Weber
Linux System Engineer
Phone: +33 1 70 44 04 17
jeanfrancois.weber@sfr.com
6 rue Nieuport
78140 Velizy-Villacoublay, France
www.sfrbusinessteam.fr