Bug#620299: nfs-kernel-server: nfs stop working after some time - rpc.mountd and rpciod in D state
Package: nfs-kernel-server
Version: 1:1.2.2-4
Severity: important
Tags: squeeze
Hi,
I've got a strange behavior with my nfs server since I use nfsv4.
I'm using nfsv3 and nfsv4 with libpam-ldap, libnss-ldap, and unscd.
The server, after some times (it can be few days or some weeks), stop responding, and the clients cannot mount shares anymore.
Once the server has hung, the first rpciod process is in D state, as the rpc.mountd process.
I can kill -9 the rpc.mountd process and restart it, but it get back in D state as soon as a mount request occurs.
I'm unable to get nfs working again, unless rebooting the server.
Here is my configuration. 'nas' and 'dev' are two nfs servers, mounted on each other.
The problems occurs on the server 'dev'.
root@dev:/# cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
cgroup /cgroup cgroup defaults 0 0
/dev/sda3 /boot ext2 defaults 0 2
/dev/mapper/dev-swap none swap sw 0 0
/dev/mapper/dev-root / ext4 errors=remount-ro 0 1
/dev/mapper/dev-home /home ext4 defaults 0 2
/dev/mapper/dev-tmp /tmp ext4 defaults 0 2
/dev/mapper/dev-usr /usr ext4 defaults 0 2
/dev/mapper/dev-var /var ext4 defaults 0 2
/dev/mapper/dev-log /var/log ext4 defaults 0 2
/dev/mapper/dev-dns /var/dns-dev ext4 defaults 0 2
/dev/mapper/dev-dns /dns-dev ext4 defaults 0 2
nas:/dns-nas /dns-nas nfs4 _netdev,rw,soft,tcp 0 0
nas:/sav /var/sav nfs4 _netdev,rw,soft 0 0
root@dev:/# cat /etc/exports
# nfsv3
/dns-dev 192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
# nfsv4
/var/ 192.168.151.0/24(fsid=0,rw,sync,crossmnt,no_subtree_check)
/var/dns-dev 192.168.151.0/24(rw,sync,crossmnt,no_subtree_check,no_root_squash)
root@dev:/# showmount -a
rpc mount dump: RPC: Timed out
root@dev:/# showmount -e
rpc mount dump: RPC: Timed out
root@dev:/# rpcinfo -p
program no_version protocole no_port
100000 2 tcp 111 portmapper
100024 1 udp 47486 status
100024 1 tcp 33830 status
100000 2 udp 111 portmapper
100021 1 udp 49373 nlockmgr
100021 3 udp 49373 nlockmgr
100021 4 udp 49373 nlockmgr
100021 1 tcp 36200 nlockmgr
100021 3 tcp 36200 nlockmgr
100021 4 tcp 36200 nlockmgr
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100005 1 udp 41885 mountd
100005 1 tcp 54023 mountd
100005 2 udp 41885 mountd
100005 2 tcp 54023 mountd
100005 3 udp 41885 mountd
100005 3 tcp 54023 mountd
root@dev:/# cat /proc/fs/nfsd/exports
# Version 1.1
# Path Client(Flags) # IPs
root@nas:/# cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
/dev/hda2 none swap sw 0 0
/dev/hda1 / ext3 errors=remount-ro 0 1
/dev/mapper/nas-tmp /tmp ext3 defaults 0 2
/dev/mapper/nas-usr /usr ext3 defaults 0 2
/dev/mapper/nas-var /var ext3 defaults 0 2
/dev/mapper/nas-cache /var/cache ext3 defaults 0 2
/dev/mapper/nas-log /var/log ext3 defaults 0 2
/dev/mapper/nas-dns /dns ext4 defaults 0 2
/dev/mapper/nas-dns /dns-nas ext4 defaults 0 2
/dev/mapper/nas-dns /var/dns-nas ext4 defaults 0 2
/dev/mapper/nas-sav /var/sav ext4 defaults 0 2
dev:/dns-dev /dns-dev nfs4 _netdev,rw,soft 0 0
root@nas:/# cat /etc/exports
# nfsv3
/dns 192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
/dns-nas 192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
/dns/net/internethic/ 192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
# /var/sav/ 192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
# nfsv4
/var/ 192.168.151.0/24(fsid=0,rw,sync,crossmnt,no_subtree_check)
/var/dns-nas 192.168.151.0/24(rw,sync,crossmnt,no_subtree_check)
/var/sav 192.168.151.0/24(rw,sync,crossmnt,no_subtree_check)
root@nas:/# rpcinfo -p
program no_version protocole no_port
100000 2 tcp 111 portmapper
391002 2 tcp 946 sgi_fam
100000 2 udp 111 portmapper
100021 1 udp 32768 nlockmgr
100021 3 udp 32768 nlockmgr
100021 4 udp 32768 nlockmgr
100021 1 tcp 32768 nlockmgr
100021 3 tcp 32768 nlockmgr
100021 4 tcp 32768 nlockmgr
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100005 1 udp 48417 mountd
100005 1 tcp 54133 mountd
100005 2 udp 48417 mountd
100005 2 tcp 54133 mountd
100005 3 udp 48417 mountd
100005 3 tcp 54133 mountd
100024 1 udp 38083 status
100024 1 tcp 58070 status
nas:/home/aya# cat /proc/fs/nfs/exports
# Version 1.1
# Path Client(Flags) # IPs
/var 192.168.151.0/24(rw,root_squash,sync,wdelay,crossmnt,no_subtree_check,fsid=0,uuid=2f751ba1:c12849cd:8273c407:429f825e)
/dns 192.168.151.0/24(rw,no_root_squash,sync,wdelay,uuid=fe5c0bb9:cafd4765:887a7f94:0a6045bc)
Before rpc.mountd hangs, I get the following :
root@nas:/# showmount -e dev
Export list for dev:
/var/dns-dev 192.168.151.0/24
/var 192.168.151.0/24
/dns-dev 192.168.151.0/24
After rpc.mount hangs, I get :
root@nas:/# showmount -e dev
rpc mount export: RPC: Timed out
In the same time, rpc.mountd says, before hanging :
root@dev:/home/aya# rpc.mountd -F -d all
mountd: from_local: updating local if addr list
mountd: from_local: checked 17 local if addrs; incoming address not found
mountd: check_default: access by 192.168.151.10 ALLOWED
mountd: export request from 192.168.151.10.
and while hanging (because of a mount request) :
mountd: auth_unix_ip: inbuf 'nfsd 192.168.151.7'
mountd: auth_unix_ip: client 0xe28e90 '192.168.151.0/24'
mountd: nfsd_fh: inbuf '192.168.151.0/24 1 \x00000000'
I get a strace of the rcp.mountd before hanging :
root@dev:/# strace -e trace=read,write -s4096 -p`pidof rpc.mountd`
Process 32261 attached - interrupt to quit
read(0, "\200\0\0DL\330\341\273\0\0\0\0\0\0\0\2\0\1\206\245\0\0\0\3\0\0\0\5\0\0\0\1\0\0\0\34M\224` \0\0\0\3nas\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\
0\0", 4000) = 72
write(2, "mountd: ", 8) = 8
write(2, "from_local: updating local if addr list", 39) = 39
write(2, "\n", 1) = 1
write(2, "mountd: ", 8) = 8
write(2, "from_local: checked 17 local if addrs; incoming address not found", 65) = 65
write(2, "\n", 1) = 1
read(9, "# /etc/hosts.allow: list of hosts that are allowed to access the system.\n# See the manual pages hosts_access(5) and hosts_options(5).\n#\n# Example: ALL: LOCAL @some_netgroup\n# ALL: .foobar.edu EXCEPT terminalserver.foobar.edu\n#\n# If you're going to protect the portmapper use the name \"portmap\" for the\n# daemon name. Remember that you can only use the keyword \"ALL\" and IP\n# addresses (NOT host or domain names) for the portmapper, as well as for\n# rpc.mountd (the NFS mount daemon). See portmap(8) and rpc.mountd(8)\n# for further information.\n#\n\n", 4096) = 580
read(9, "", 4096) = 0
read(9, "# /etc/hosts.deny: list of hosts that are _not_ allowed to access the system.\n# See the manual pages hosts_access(5) and hosts_options(5).\n#\n# Example: ALL: some.host.name, .some.domain\n# ALL EXCEPT in.fingerd: other.host.name, .other.domain\n#\n# If you're going to protect the portmapper use the name \"portmap\" for the\n# daemon name. Remember that you can only use the keyword \"ALL\" and IP\n# addresses (NOT host or domain names) for the portmapper, as well as for\n# rpc.mountd (the NFS mount daemon). See portmap(8) and rpc.mountd(8)\n# for further information.\n#\n# The PARANOID wildcard matches any host whose name does not match its\n# address.\n#\n# You may wish to enable this to ensure any programs that don't\n# validate looked up hostnames still leave understandable logs. In past\n# versions of Debian this has been the default.\n# ALL: PARANOID\n\n", 4096) = 880
read(9, "", 4096) = 0
write(2, "mountd: ", 8) = 8
write(2, "check_default: access by 192.168.151.10 ALLOWED", 47) = 47
write(2, "\n", 1) = 1
write(2, "mountd: ", 8) = 8
write(2, "export request from 192.168.151.10.", 35) = 35
write(2, "\n", 1) = 1
write(0, "\200\0\0\240L\330\341\273\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\f/var/dns-dev\0\0\0\1\0\0\0\020192.168.151.0/24\0\0\0\0\0\0\0\1\0\0\0\4/var\0\0\0\1\0\0\0\020192.168.151.0/24\0\0\0\0\0\0\0\1\0\0\0\10/dns-dev\0\0\0\1\0\0\0\020192.168.151.0/24\0\0\0\0\0\0\0\0", 164) = 164
read(0, "", 4000) = 0
and while hanging :
read(3, "nfsd 192.168.151.7\n", 2048) = 19
write(2, "mountd: ", 8) = 8
write(2, "auth_unix_ip: inbuf 'nfsd 192.168.151.7'", 40) = 40
write(2, "\n", 1) = 1
write(3, "nfsd 192.168.151.7 1301571202 192.168.151.0/24 \n", 48) = 48
write(2, "mountd: ", 8) = 8
write(2, "auth_unix_ip: client 0xe28e90 '192.168.151.0/24'", 48) = 48
write(2, "\n", 1) = 1
read(5, "192.168.151.0/24 1 \\x00000000\n", 2048) = 30
write(2, "mountd: ", 8) = 8
write(2, "nfsd_fh: inbuf '192.168.151.0/24 1 \\x00000000'", 46) = 46
write(2, "\n", 1) = 1
read(0, "/dev/mapper/dev-root / ext4 rw,errors=remount-ro 0 0\ntmpfs /lib/init/rw tmpfs rw,nosuid,mode=0755 0 0\nproc /proc proc rw,noexec,nosuid,nodev 0 0\nsysfs /sys sysfs rw,noexec,nosuid,nodev 0 0\nudev /dev tmpfs rw,mode=0755 0 0\ntmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0\ndevpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=620 0 0\ncgroup /cgroup cgroup rw 0 0\n/dev/sda3 /boot ext2 rw 0 0\n/dev/mapper/dev-home /home ext4 rw 0 0\n/dev/mapper/dev-tmp /tmp ext4 rw 0 0\n/dev/mapper/dev-usr /usr ext4 rw 0 0\n/dev/mapper/dev-var /var ext4 rw 0 0\n/dev/mapper/dev-log /var/log ext4 rw 0 0\n/dev/mapper/dev-dns /var/dns-dev ext4 rw 0 0\n/dev/mapper/dev-dns /dns-dev ext4 rw 0 0\nnas:/sav /var/sav nfs4 rw,soft,addr=192.168.151.10,clientaddr=192.168.151.9 0 0\nnfsd /proc/fs/nfsd nfsd rw 0 0\nrpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0\n", 4096) = 820
After the hang, I can see rpciod and rpc.mountd in D state :
root@dev:/# ps aux |awk '$8 ~ /D/'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1672 0.0 0.0 0 0 ? D Mar09 0:09 [rpciod/0]
root 32261 0.0 0.0 14628 1176 pts/17 D+ 13:02 0:00 rpc.mountd -F -d all
Here is the debug in my syslog, during the mount request from host 192.168.151.7 :
root@dev:/# echo 32767 > /proc/sys/sunrpc/rpc_debug
root@dev:/# echo 32767 > /proc/sys/sunrpc/nfs_debug
root@dev:/# echo 32767 > /proc/sys/sunrpc/nfsd_debug
root@dev:/# tail -f /var/log/syslog
Mar 31 13:34:20 dev kernel: [1941071.365643] RPC: killing all tasks for client ffff8803efe08a00
Mar 31 13:34:21 dev kernel: [1941072.363938] RPC: killing all tasks for client ffff8803efe08a00
Mar 31 13:34:22 dev kernel: [1941073.362216] RPC: killing all tasks for client ffff8803efe08a00
Mar 31 13:34:22 dev kernel: [1941073.781870] svc: socket ffff88000dc46e80 TCP data ready (svsk ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781877] svc: transport ffff88042e724000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.781896] svc: socket ffff88000dc46e80 TCP (connected) state change 8 (svsk ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781900] svc: transport ffff88042e724000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.781902] svc: socket ffff88000dc46e80 TCP data ready (svsk ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781905] svc: transport ffff88042e724000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.781920] svc_recv: found XPT_CLOSE
Mar 31 13:34:22 dev kernel: [1941073.781923] svc: svc_delete_xprt(ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781925] svc: svc_tcp_sock_detach(ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781927] svc: svc_sock_detach(ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781937] svc: transport ffff88042e724000 is dead, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.781940] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782184] svc: socket ffff88042e00d480 TCP (listen) state change 10
Mar 31 13:34:22 dev kernel: [1941073.782188] svc: transport ffff8803a2ece000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.782202] svc: socket ffff88000dc41a00 TCP (listen) state change 1
Mar 31 13:34:22 dev kernel: [1941073.782220] svc: socket ffff88000dc41a00 TCP (listen) state change 1
Mar 31 13:34:22 dev kernel: [1941073.782226] svc: socket ffff88000dc41a00 TCP (listen) state change 1
Mar 31 13:34:22 dev kernel: [1941073.782242] svc: tcp_accept ffff8803a2ece000 sock ffff88042ee298c0
Mar 31 13:34:22 dev kernel: [1941073.782248] nfsd: connect from 192.168.151.7, port=811
Mar 31 13:34:22 dev kernel: [1941073.782250] svc: svc_setup_socket ffff88042bd24d00
Mar 31 13:34:22 dev kernel: [1941073.782253] setting up TCP socket for reading
Mar 31 13:34:22 dev kernel: [1941073.782256] svc: socket ffff88042e723000(inet ffff88000dc41a00), write_space busy=1
Mar 31 13:34:22 dev kernel: [1941073.782259] svc: svc_setup_socket created ffff88042e723000 (inet ffff88000dc41a00)
Mar 31 13:34:22 dev kernel: [1941073.782262] svc: transport ffff88042e723000 served by daemon ffff88040fbc6000
Mar 31 13:34:22 dev kernel: [1941073.782267] svc: transport ffff8803a2ece000 served by daemon ffff88040fbc4000
Mar 31 13:34:22 dev kernel: [1941073.782273] svc: transport ffff8803a2ece000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782276] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782281] svc: server ffff88040fbc6000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.782286] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.782289] svc: socket ffff88042e723000(inet ffff88000dc41a00), write_space busy=1
Mar 31 13:34:22 dev kernel: [1941073.782293] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782300] svc: socket ffff88042e723000 recvfrom(ffff88042e72328c, 0) = 4
Mar 31 13:34:22 dev kernel: [1941073.782303] svc: TCP record, 100 bytes
Mar 31 13:34:22 dev kernel: [1941073.782312] svc: socket ffff88000dc41a00 TCP data ready (svsk ffff88042e723000)
Mar 31 13:34:22 dev kernel: [1941073.782316] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782322] svc: socket ffff88042e723000 recvfrom(ffff880351f9c008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.782325] svc: tcp_accept ffff8803a2ece000 sock ffff88042ee298c0
Mar 31 13:34:22 dev kernel: [1941073.782332] svc: socket ffff88042e723000 recvfrom(ffff880351f9c064, 3996) = 92
Mar 31 13:34:22 dev kernel: [1941073.782336] svc: server ffff88040fbc4000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782339] svc: TCP complete record (100 bytes)
Mar 31 13:34:22 dev kernel: [1941073.782343] svc: transport ffff88042e723000 served by daemon ffff88040fbc4000
Mar 31 13:34:22 dev kernel: [1941073.782353] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782356] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.782360] svc: server ffff88040fbc4000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.782366] RPC: Want update, refage=1800, age=1860
Mar 31 13:34:22 dev kernel: [1941073.782370] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.782375] svc: socket ffff88042e723000 recvfrom(ffff88042e72328c, 0) = 4
Mar 31 13:34:22 dev kernel: [1941073.782378] svc: TCP record, 100 bytes
Mar 31 13:34:22 dev kernel: [1941073.782383] svc: socket ffff88042e723000 recvfrom(ffff88041ea23008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.782389] svc: socket ffff88042e723000 recvfrom(ffff88041ea23064, 3996) = 92
Mar 31 13:34:22 dev kernel: [1941073.782392] svc: svc_process dropit
Mar 31 13:34:22 dev kernel: [1941073.782394] svc: xprt ffff88042e723000 dropped request
Mar 31 13:34:22 dev kernel: [1941073.782397] svc: TCP complete record (100 bytes)
Mar 31 13:34:22 dev kernel: [1941073.782399] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782403] svc: server ffff88040fbc6000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782407] svc: transnee
Mar 31 13:34:22 dev kernel: 4[917.869 v:scsc_reff802720)<>1403723]sc evrff801a00,po ,tasotff802730,iue5<>1403723]s:tprc ff84e200dt on0coe0<>1403724]sc oktff802730 evrmff802738,0
Mar 31 13:34:22 dev kernel: 4[917.865 v:TPrcr,10bts<>1403724]sc oktff802730 evrmff80ba50,48)=8<>1403725]rvstcnee
Mar 31 13:34:22 dev kernel: 4[917.864 eii acld<>1407.825]rvstcneld<>1407.25]rvstcnee
Mar 31 13:34:22 dev kernel: 4[917.869 eii anee
Mar 31 13:34:22 dev kernel: 4[917.861 eii cnee
Mar 31 13:34:22 dev kernel: [1941073.782663] svc: svc_sock_free(ffff88042e727c00)
Mar 31 13:34:22 dev kernel: [1941073.782675] svc: socket ffff88042e723000 recvfrom(ffff8802b0ae5064, 3996) = 92
Mar 31 13:34:22 dev kernel: [1941073.782678] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782680] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782681] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782683] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782685] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782687] svc: TCP complete record (100 bytes)
Mar 31 13:34:22 dev kernel: [1941073.782690] svc: transport ffff88042e723000 served by daemon ffff88040fbc4000
Mar 31 13:34:22 dev kernel: [1941073.782693] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782694] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782697] svc: svc_sock_free(ffff88042e720800)
Mar 31 13:34:22 dev kernel: [1941073.782702] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782705] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.782707] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782709] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782711] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782713] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782715] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782717] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.782720] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.782722] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782725] nfsv4 compound op #1/1: 30 (OP_RENEW)
Mar 31 13:34:22 dev kernel: [1941073.782727] process_renew(4d9458d1/0000073f): starting
Mar 31 13:34:22 dev kernel: [1941073.782730] nfsd4_renew: clientid not found!
Mar 31 13:34:22 dev kernel: [1941073.782733] nfsv4 compound op ffff88025b2e0078 opcnt 1 #1: 30: status 10011
Mar 31 13:34:22 dev kernel: [1941073.782735] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782737] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782739] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782741] nfsv4 compound returned 10011
Mar 31 13:34:22 dev kernel: [1941073.782744] svc: svc_sock_free(ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.782755] svc: socket ffff88042e723000 sendto([ffff8803aa615000 48... ], 48) = 48 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [1941073.782758] revisit queued
Mar 31 13:34:22 dev kernel: [1941073.782760] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782762] revisit queued
Mar 31 13:34:22 dev kernel: [1941073.782765] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782767] revisit queued
Mar 31 13:34:22 dev kernel: [1941073.782769] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782773] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782777] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782786] svc: server ffff88040fbc4000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.782790] svc: transport ffff88042e723000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.782796] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782817] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.782821] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782823] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.782825] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.782828] nfsv4 compound op #1/1: 30 (OP_RENEW)
Mar 31 13:34:22 dev kernel: [1941073.782831] process_renew(4d9458d1/0000073f): starting
Mar 31 13:34:22 dev kernel: [1941073.782832] nfsd4_renew: clientid not found!
Mar 31 13:34:22 dev kernel: [1941073.782835] nfsv4 compound op ffff8802afe5f078 opcnt 1 #1: 30: status 10011
Mar 31 13:34:22 dev kernel: [1941073.782838] nfsv4 compound returned 10011
Mar 31 13:34:22 dev kernel: [1941073.782857] svc: socket ffff88042e723000 sendto([ffff88041ea23000 48... ], 48) = 48 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [1941073.782860] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782863] svc: server ffff88040fbc4000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782871] svc: server ffff880418a00000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.782874] svc: transport ffff88042e723000 served by daemon ffff88040fbc4000
Mar 31 13:34:22 dev kernel: [1941073.782878] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782880] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.782885] svc: server ffff88040fbc4000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.782889] svc: transport ffff88042e723000 served by daemon ffff88040fbc6000
Mar 31 13:34:22 dev kernel: [1941073.782895] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782897] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782900] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.782902] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: <4n eund101<>1403737]sc o e=0
Mar 31 13:34:22 dev kernel: 4[917.803 v:scatetct 1
Mar 31 13:34:22 dev kernel: 4[917.808 v:tasotff802730 uy o nuud<>1403738]sc aln ipthr<>1403738]ns_ipth es4po
Mar 31 13:34:22 dev kernel: 4[917.807 v:sce ff84e200snt(ff80ffd0 8. ,4)=4 ad 9.6.5.,pr=1)<>1403739]ns4cmon (o=900)<>1403730]ns4rnw letdntfud
Mar 31 13:34:22 dev kernel: 4[917.818 fv opudo ff82f508oct1#:3:sau 01
Mar 31 13:34:22 dev kernel: 4[917.812 v:sre ff848000 ol0 rnpr ff84e200 ns=
Mar 31 13:34:22 dev kernel: 4[917.816 v:tprc ff84e200dt on0coe0<>1403732]ns4cmon eund101<>1403732]sc oktff802730 evrmff802738,0
Mar 31 13:34:22 dev kernel: 4[917.816 v:TPrcr,10bts<>1403733]sc oktff802730 evrmff807280,48)=8<>1403734]sc oktff802730 edo[ff8489004..] 8 8(dr1218117 ot81
Mar 31 13:34:22 dev kernel: 4[917.815 v:tasotff802730 uy o nuud<>1403735]sc oktff802730 evrmff807286,39)=9
Mar 31 13:34:22 dev kernel: 4[917.813 v:TPcmlt eod(0 ye)<>1403735]sc rnpr ff84e200sre ydeo ff84fc00<>1403735]sc evrff800b40 atn o aa(o=900)<>1403736]sc o e=0
Mar 31 13:34:22 dev kernel: 4[917.818 v:scatetct 1
Mar 31 13:34:22 dev kernel: 4[917.812 v:tasotff802730 uy o nuud<>1403737]sc aln dispatcher
Mar 31 13:34:22 dev kernel: [1941073.783177] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.783181] nfsv4 compound op #1/1: 30 (OP_RENEW)
Mar 31 13:34:22 dev kernel: [1941073.783183] process_renew(4d9458d1/0000073f): starting
Mar 31 13:34:22 dev kernel: [1941073.783195] nfsd4_renew: clientid not found!
Mar 31 13:34:22 dev kernel: [1941073.783198] nfsv4 compound op ffff88025b2e0078 opcnt 1 #1: 30: status 10011
Mar 31 13:34:22 dev kernel: [1941073.783201] nfsv4 compound returned 10011
Mar 31 13:34:22 dev kernel: [1941073.783211] svc: socket ffff88042e723000 sendto([ffff8804132e7000 48... ], 48) = 48 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [1941073.783215] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.783217] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.783225] svc: server ffff88040fbc6000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.783228] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.783232] svc: socket ffff88042e723000 recvfrom(ffff88042e72328c, 0) = 4
Mar 31 13:34:22 dev kernel: [1941073.783234] svc: TCP record, 100 bytes
Mar 31 13:34:22 dev kernel: [1941073.783237] svc: socket ffff88042e723000 recvfrom(ffff880351f9c008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.783241] svc: socket ffff88042e723000 recvfrom(ffff880351f9c064, 3996) = 92
Mar 31 13:34:22 dev kernel: [1941073.783244] svc: TCP complete record (100 bytes)
Mar 31 13:34:22 dev kernel: [1941073.783246] svc: transport ffff88042e723000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.783251] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.783253] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.783258] svc: server ffff880418a00000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.783261] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.783265] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.783270] svc: socket ffff88042e723000 recvfrom(ffff88042e723288, 4) = -11
Mar 31 13:34:22 dev kernel: [1941073.783273] RPC: TCP recv_record got EAGAIN
Mar 31 13:34:22 dev kernel: [1941073.783275] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.783278] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.783281] svc: got len=-11
Mar 31 13:34:22 dev kernel: [1941073.783283] nfsv4 compound op #1/1: 30 (OP_RENEW)
Mar 31 13:34:22 dev kernel: [1941073.783286] process_renew(4d9458d1/0000073f): starting
Mar 31 13:34:22 dev kernel: [1941073.783288] nfsd4_renew: clientid not found!
Mar 31 13:34:22 dev kernel: [1941073.783292] nfsv4 compound op ffff880377c22078 opcnt 1 #1: 30: status 10011
Mar 31 13:34:22 dev kernel: [1941073.783295] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.783299] nfsv4 compound returned 10011
Mar 31 13:34:22 dev kernel: [1941073.783314] svc: socket ffff88042e723000 sendto([ffff88041ea23000 48... ], 48) = 48 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [194ffff880351f9c008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.792890] svc: socket ffff88042e723000 recvfrom(ffff880351f9c0bc, 3908) = 180
Mar 31 13:34:22 dev kernel: [1941073.792893] svc: TCP complete record (188 bytes)
Mar 31 13:34:22 dev kernel: [1941073.792896] svc: transport ffff88042e723000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.792900] svc: got len=188
Mar 31 13:34:22 dev kernel: [1941073.792903] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.792909] svc: server ffff880418a00000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.792912] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.792914] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.792917] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.792920] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.792923] nfsv4 compound op #1/1: 35 (OP_SETCLIENTID)
Mar 31 13:34:22 dev kernel: [1941073.792927] NFSD: nfs4_make_rec_clidname for 192.168.151.6/192.168.151.9 tcp UNIX 0
Mar 31 13:34:22 dev kernel: [1941073.792931] svc: socket ffff88042e723000 recvfrom(ffff88042e723288, 4) = -11
Mar 31 13:34:22 dev kernel: [1941073.792934] RPC: TCP recv_record got EAGAIN
Mar 31 13:34:22 dev kernel: [1941073.792936] svc: got len=-11
Mar 31 13:34:22 dev kernel: [1941073.792938] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.792952] nfsv4 compound op ffff880377c22078 opcnt 1 #1: 35: status 0
Mar 31 13:34:22 dev kernel: [1941073.792956] nfsv4 compound returned 0
Mar 31 13:34:22 dev kernel: [1941073.792971] svc: socket ffff88042e723000 sendto([ffff8802b0ae5000 64... ], 64) = 64 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [1941073.792979] svc: server ffff88040fbc6000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.793055] svc: socket ffff88000dc41a00 TCP data ready (svsk ffff88042e723000)
Mar 31 13:34:22 dev kernel: [1941073.793058] svc: transport ffff88042e723000 served by daemon ffff88040fbc6000
Mar 31 13:34:22 dev kernel: [1941073.793067] svc: server ffff88040fbc6000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.793070] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.793075] svc: socket ffff88042e723000 recvfrom(ffff88042e72328c, 0) = 4
Mar 31 13:34:22 dev kernel: [1941073.793077] svc: TCP record, 128 bytes
Mar 31 13:34:22 dev kernel: [1941073.793080] svc: socket ffff88042e723000 recvfrom(ffff880351f9c008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.793084] svc: socket ffff88042e723000 recvfrom(ffff880351f9c080, 3968) = 120
Mar 31 13:34:22 dev kernel: [1941073.793087] svc: TCP complete record (128 bytes)
Mar 31 13:34:22 dev kernel: [1941073.793089] svc: transport ffff88042e723000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.793093] svc: got len=128
Mar 31 13:34:22 dev kernel: [1941073.793096] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.793100] svc: server ffff880418a00000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.793103] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.793106] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.793108] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.793112] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.793115] nfsv4 compound op #1/3: 36 (OP_SETCLIENTID_CONFIRM)
Mar 31 13:34:22 dev kernel: [1941073.793119] NFSD: move_to_confirm nfs4_client ffff88042e725800
Mar 31 13:34:22 dev kernel: [1941073.793123] svc: socket ffff88042e723000 recvfrom(ffff88042e723288, 4) = -11
Mar 31 13:34:22 dev kernel: [1941073.793125] RPC: TCP recv_record got EAGAIN
Mar 31 13:34:22 dev kernel: [1941073.793128] svc: got len=-11
Mar 31 13:34:22 dev kernel: [1941073.793130] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:80<>1403730]RC 21 albn sau )<>1403730]RC 21 alcnetxr ff825e00i o once
Mar 31 13:34:22 dev kernel: 4[917.927 P:312xr_onc ptff80f8a0 sntonce
Mar 31 13:34:22 dev kernel: 4[917.920 P:312sepo(uu xr_edn"tm 71037
Mar 31 13:34:22 dev kernel: 4[917.923 P:312addt uu ff825e20"ptpnig
Mar 31 13:34:22 dev kernel: 4[917.926 P:312stigaamfr600m
Mar 31 13:34:22 dev kernel: 4[917.929 P: x_onc ceue ptff80f8a0
Mar 31 13:34:22 dev kernel: 4[917.923 fv opudo ff837208oct3#:3:sau
Mar 31 13:34:22 dev kernel: 4[917.926 fv opudo 23 4(PPTOTH
Mar 31 13:34:22 dev kernel: 4[917.920 P: Wn pae eae10 g=
Mar 31 13:34:22 dev kernel: 4[917.920 fv opudo ff837208oct3#:2:sau 00
Mar 31 13:34:22 dev kernel: 4[917.923 fv opudrtre 00
Mar 31 13:34:22 dev kernel: 4[917.925 fd rpigrqet a ervstdltr<>1403734]sc v_rcs rpt<>1403735]sc ptff802730 rpe eus
Mar 31 13:34:22 dev kernel: 4[917.924 v:sre ff84fc00wiigfrdt t 000
Mar 31 13:34:22 dev kernel: 4[917.937 F:rvldtn 01/)<>1403734]RC okn pUI rd<>1403734]RC okn pUI rd<>1403734]RC e akiiilzd rci 21
Mar 31 13:34:22 dev kernel: 4[917.938 P: alctdts ff84d3b0<>1403735]RC 21 _p_xct lg=x8
Mar 31 13:34:22 dev kernel: 4[917.933 P:313cl_tr f4po EAT sn)<>1403735]RC 21 alrsre(tts0
Mar 31 13:34:22 dev kernel: 4[917.938 P: wiigfrrqetso
Mar 31 13:34:22 dev kernel: 4[917.930 P:313sepo(uu xr_alg ie48014)<>140.933 P:313addt uu f80279a xr_akg
Mar 31 13:34:22 dev kernel: 4[403736]P:313sn akgi ose
Mar 31 13:34:23 dev kernel: [1941074.360512] RPC: killing all tasks for client ffff8803efe08a00
Mar 31 13:34:24 dev kernel: [1941075.358733] RPC: killing all tasks for client ffff8803efe08a00
Mar 31 13:34:25 dev kernel: [1941076.357001] RPC: killing all tasks for client ffff8803efe08a00
Mar 31 13:34:26 dev kernel: [1941077.355322] RPC: killing all tasks for client ffff8803efe08a00
So, I'm unable to get the nfs working again, unless I reboot the server.
I'm using portmap 6.0.0-2. Shall I use rcpbind instead ?
I'm not able to reproduce the bug, but I know it will come again, it's the third time that I get it.
Thanks for reading.
Best regards,
Yann
-- System Information:
Debian Release: 6.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-amd64 (SMP w/16 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages nfs-kernel-server depends on:
ii libblkid1 2.17.2-9 block device id library
ii libc6 2.11.2-10 Embedded GNU C Library: Shared lib
ii libcomerr2 1.41.12-2 common error description library
ii libgssapi-krb5-2 1.8.3+dfsg-4 MIT Kerberos runtime libraries - k
ii libgssglue1 0.1-4 mechanism-switch gssapi library
ii libk5crypto3 1.8.3+dfsg-4 MIT Kerberos runtime libraries - C
ii libkrb5-3 1.8.3+dfsg-4 MIT Kerberos runtime libraries
ii libnfsidmap2 0.23-2 An nfs idmapping library
ii librpcsecgss3 0.19-2 allows secure rpc communication us
ii libwrap0 7.6.q-19 Wietse Venema's TCP wrappers libra
ii lsb-base 3.2-23.2squeeze1 Linux Standard Base 3.2 init scrip
ii nfs-common 1:1.2.2-4 NFS support files common to client
ii ucf 3.0025+nmu1 Update Configuration File: preserv
nfs-kernel-server recommends no packages.
nfs-kernel-server suggests no packages.
-- no debconf information
Reply to: