[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#620299: nfs-kernel-server: nfs stop working after some time - rpc.mountd and rpciod in D state



Package: nfs-kernel-server
Version: 1:1.2.2-4
Severity: important
Tags: squeeze

Hi,

I've got a strange behavior with my nfs server since I use nfsv4.
I'm using nfsv3 and nfsv4 with libpam-ldap, libnss-ldap, and unscd.
The server, after some times (it can be few days or some weeks), stop responding, and the clients cannot mount shares anymore.

Once the server has hung, the first rpciod process is in D state, as the rpc.mountd process.
I can kill -9 the rpc.mountd process and restart it, but it get back in D state as soon as a mount request occurs.
I'm unable to get nfs working again, unless rebooting the server.

Here is my configuration. 'nas' and 'dev' are two nfs servers, mounted on each other.
The problems occurs on the server 'dev'.

root@dev:/# cat /etc/fstab
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
cgroup          /cgroup         cgroup  defaults        0       0
/dev/sda3            /boot          ext2    defaults        0       2
/dev/mapper/dev-swap none           swap    sw              0       0
/dev/mapper/dev-root /              ext4    errors=remount-ro 0     1
/dev/mapper/dev-home /home          ext4    defaults        0       2
/dev/mapper/dev-tmp /tmp            ext4    defaults        0       2
/dev/mapper/dev-usr /usr            ext4    defaults        0       2
/dev/mapper/dev-var /var            ext4    defaults        0       2
/dev/mapper/dev-log /var/log        ext4    defaults        0       2
/dev/mapper/dev-dns /var/dns-dev    ext4    defaults        0       2
/dev/mapper/dev-dns /dns-dev        ext4    defaults        0       2
nas:/dns-nas        /dns-nas        nfs4    _netdev,rw,soft,tcp 0       0
nas:/sav            /var/sav        nfs4    _netdev,rw,soft 0       0

root@dev:/# cat /etc/exports
# nfsv3
/dns-dev        192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
# nfsv4
/var/           192.168.151.0/24(fsid=0,rw,sync,crossmnt,no_subtree_check)
/var/dns-dev    192.168.151.0/24(rw,sync,crossmnt,no_subtree_check,no_root_squash)

root@dev:/# showmount -a
rpc mount dump: RPC: Timed out

root@dev:/# showmount -e
rpc mount dump: RPC: Timed out

root@dev:/# rpcinfo -p
   program no_version protocole  no_port
    100000    2   tcp    111  portmapper
    100024    1   udp  47486  status
    100024    1   tcp  33830  status
    100000    2   udp    111  portmapper
    100021    1   udp  49373  nlockmgr
    100021    3   udp  49373  nlockmgr
    100021    4   udp  49373  nlockmgr
    100021    1   tcp  36200  nlockmgr
    100021    3   tcp  36200  nlockmgr
    100021    4   tcp  36200  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100005    1   udp  41885  mountd
    100005    1   tcp  54023  mountd
    100005    2   udp  41885  mountd
    100005    2   tcp  54023  mountd
    100005    3   udp  41885  mountd
    100005    3   tcp  54023  mountd

root@dev:/# cat /proc/fs/nfsd/exports 
# Version 1.1
# Path Client(Flags) # IPs

root@nas:/# cat /etc/fstab
# <file system>         <mount point>   <type>  <options>       <dump>  <pass>
proc                    /proc           proc    defaults        0       0
/dev/hda2               none            swap    sw              0       0
/dev/hda1               /               ext3    errors=remount-ro 0       1
/dev/mapper/nas-tmp     /tmp            ext3    defaults        0       2
/dev/mapper/nas-usr     /usr            ext3    defaults        0       2
/dev/mapper/nas-var     /var            ext3    defaults        0       2
/dev/mapper/nas-cache   /var/cache      ext3    defaults        0       2
/dev/mapper/nas-log     /var/log        ext3    defaults        0       2
/dev/mapper/nas-dns     /dns            ext4    defaults        0       2
/dev/mapper/nas-dns     /dns-nas        ext4    defaults        0       2
/dev/mapper/nas-dns     /var/dns-nas    ext4    defaults        0       2
/dev/mapper/nas-sav     /var/sav        ext4    defaults        0       2
dev:/dns-dev            /dns-dev        nfs4    _netdev,rw,soft 0       0

root@nas:/# cat /etc/exports
# nfsv3
/dns                    192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
/dns-nas                192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
/dns/net/internethic/   192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
# /var/sav/             192.168.151.0/24(rw,sync,subtree_check,no_root_squash)
# nfsv4
/var/           192.168.151.0/24(fsid=0,rw,sync,crossmnt,no_subtree_check)
/var/dns-nas    192.168.151.0/24(rw,sync,crossmnt,no_subtree_check)
/var/sav        192.168.151.0/24(rw,sync,crossmnt,no_subtree_check)

root@nas:/# rpcinfo -p
   program no_version protocole  no_port
    100000    2   tcp    111  portmapper
    391002    2   tcp    946  sgi_fam
    100000    2   udp    111  portmapper
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100021    1   tcp  32768  nlockmgr
    100021    3   tcp  32768  nlockmgr
    100021    4   tcp  32768  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100005    1   udp  48417  mountd
    100005    1   tcp  54133  mountd
    100005    2   udp  48417  mountd
    100005    2   tcp  54133  mountd
    100005    3   udp  48417  mountd
    100005    3   tcp  54133  mountd
    100024    1   udp  38083  status
    100024    1   tcp  58070  status

nas:/home/aya# cat /proc/fs/nfs/exports 
# Version 1.1
# Path Client(Flags) # IPs
/var    192.168.151.0/24(rw,root_squash,sync,wdelay,crossmnt,no_subtree_check,fsid=0,uuid=2f751ba1:c12849cd:8273c407:429f825e)
/dns    192.168.151.0/24(rw,no_root_squash,sync,wdelay,uuid=fe5c0bb9:cafd4765:887a7f94:0a6045bc)


Before rpc.mountd hangs, I get the following :
root@nas:/# showmount -e dev
Export list for dev:
/var/dns-dev 192.168.151.0/24
/var         192.168.151.0/24
/dns-dev     192.168.151.0/24

After rpc.mount hangs, I get :
root@nas:/# showmount -e dev
rpc mount export: RPC: Timed out

In the same time, rpc.mountd says, before hanging :
root@dev:/home/aya# rpc.mountd -F -d all
mountd: from_local: updating local if addr list
mountd: from_local: checked 17 local if addrs; incoming address not found
mountd: check_default: access by 192.168.151.10 ALLOWED
mountd: export request from 192.168.151.10.

and while hanging (because of a mount request) :
mountd: auth_unix_ip: inbuf 'nfsd 192.168.151.7'
mountd: auth_unix_ip: client 0xe28e90 '192.168.151.0/24'
mountd: nfsd_fh: inbuf '192.168.151.0/24 1 \x00000000'

I get a strace of the rcp.mountd before hanging :
root@dev:/# strace -e trace=read,write -s4096 -p`pidof rpc.mountd`
Process 32261 attached - interrupt to quit
read(0, "\200\0\0DL\330\341\273\0\0\0\0\0\0\0\2\0\1\206\245\0\0\0\3\0\0\0\5\0\0\0\1\0\0\0\34M\224` \0\0\0\3nas\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\
0\0", 4000) = 72
write(2, "mountd: ", 8)                 = 8
write(2, "from_local: updating local if addr list", 39) = 39
write(2, "\n", 1)                       = 1
write(2, "mountd: ", 8)                 = 8
write(2, "from_local: checked 17 local if addrs; incoming address not found", 65) = 65
write(2, "\n", 1)                       = 1
read(9, "# /etc/hosts.allow: list of hosts that are allowed to access the system.\n#                   See the manual pages hosts_access(5) and hosts_options(5).\n#\n# Example:    ALL: LOCAL @some_netgroup\n#             ALL: .foobar.edu EXCEPT terminalserver.foobar.edu\n#\n# If you're going to protect the portmapper use the name \"portmap\" for the\n# daemon name. Remember that you can only use the keyword \"ALL\" and IP\n# addresses (NOT host or domain names) for the portmapper, as well as for\n# rpc.mountd (the NFS mount daemon). See portmap(8) and rpc.mountd(8)\n# for further information.\n#\n\n", 4096) = 580
read(9, "", 4096)                       = 0
read(9, "# /etc/hosts.deny: list of hosts that are _not_ allowed to access the system.\n#                  See the manual pages hosts_access(5) and hosts_options(5).\n#\n# Example:    ALL: some.host.name, .some.domain\n#             ALL EXCEPT in.fingerd: other.host.name, .other.domain\n#\n# If you're going to protect the portmapper use the name \"portmap\" for the\n# daemon name. Remember that you can only use the keyword \"ALL\" and IP\n# addresses (NOT host or domain names) for the portmapper, as well as for\n# rpc.mountd (the NFS mount daemon). See portmap(8) and rpc.mountd(8)\n# for further information.\n#\n# The PARANOID wildcard matches any host whose name does not match its\n# address.\n#\n# You may wish to enable this to ensure any programs that don't\n# validate looked up hostnames still leave understandable logs. In past\n# versions of Debian this has been the default.\n# ALL: PARANOID\n\n", 4096) = 880
read(9, "", 4096)                       = 0
write(2, "mountd: ", 8)                 = 8
write(2, "check_default: access by 192.168.151.10 ALLOWED", 47) = 47
write(2, "\n", 1)                       = 1
write(2, "mountd: ", 8)                 = 8
write(2, "export request from 192.168.151.10.", 35) = 35
write(2, "\n", 1)                       = 1
write(0, "\200\0\0\240L\330\341\273\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\f/var/dns-dev\0\0\0\1\0\0\0\020192.168.151.0/24\0\0\0\0\0\0\0\1\0\0\0\4/var\0\0\0\1\0\0\0\020192.168.151.0/24\0\0\0\0\0\0\0\1\0\0\0\10/dns-dev\0\0\0\1\0\0\0\020192.168.151.0/24\0\0\0\0\0\0\0\0", 164) = 164
read(0, "", 4000)                       = 0

and while hanging :
read(3, "nfsd 192.168.151.7\n", 2048)   = 19
write(2, "mountd: ", 8)                 = 8
write(2, "auth_unix_ip: inbuf 'nfsd 192.168.151.7'", 40) = 40
write(2, "\n", 1)                       = 1
write(3, "nfsd 192.168.151.7 1301571202 192.168.151.0/24 \n", 48) = 48
write(2, "mountd: ", 8)                 = 8
write(2, "auth_unix_ip: client 0xe28e90 '192.168.151.0/24'", 48) = 48
write(2, "\n", 1)                       = 1
read(5, "192.168.151.0/24 1 \\x00000000\n", 2048) = 30
write(2, "mountd: ", 8)                 = 8
write(2, "nfsd_fh: inbuf '192.168.151.0/24 1 \\x00000000'", 46) = 46
write(2, "\n", 1)                       = 1
read(0, "/dev/mapper/dev-root / ext4 rw,errors=remount-ro 0 0\ntmpfs /lib/init/rw tmpfs rw,nosuid,mode=0755 0 0\nproc /proc proc rw,noexec,nosuid,nodev 0 0\nsysfs /sys sysfs rw,noexec,nosuid,nodev 0 0\nudev /dev tmpfs rw,mode=0755 0 0\ntmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0\ndevpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=620 0 0\ncgroup /cgroup cgroup rw 0 0\n/dev/sda3 /boot ext2 rw 0 0\n/dev/mapper/dev-home /home ext4 rw 0 0\n/dev/mapper/dev-tmp /tmp ext4 rw 0 0\n/dev/mapper/dev-usr /usr ext4 rw 0 0\n/dev/mapper/dev-var /var ext4 rw 0 0\n/dev/mapper/dev-log /var/log ext4 rw 0 0\n/dev/mapper/dev-dns /var/dns-dev ext4 rw 0 0\n/dev/mapper/dev-dns /dns-dev ext4 rw 0 0\nnas:/sav /var/sav nfs4 rw,soft,addr=192.168.151.10,clientaddr=192.168.151.9 0 0\nnfsd /proc/fs/nfsd nfsd rw 0 0\nrpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0\n", 4096) = 820

After the hang, I can see rpciod and rpc.mountd in D state :
root@dev:/# ps aux |awk '$8 ~ /D/'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1672  0.0  0.0      0     0 ?        D    Mar09   0:09 [rpciod/0]
root     32261  0.0  0.0  14628  1176 pts/17   D+   13:02   0:00 rpc.mountd -F -d all

Here is the debug in my syslog, during the mount request from host 192.168.151.7 :
root@dev:/# echo 32767 > /proc/sys/sunrpc/rpc_debug
root@dev:/# echo 32767 > /proc/sys/sunrpc/nfs_debug
root@dev:/# echo 32767 > /proc/sys/sunrpc/nfsd_debug
root@dev:/# tail -f /var/log/syslog
Mar 31 13:34:20 dev kernel: [1941071.365643] RPC:       killing all tasks for client ffff8803efe08a00
Mar 31 13:34:21 dev kernel: [1941072.363938] RPC:       killing all tasks for client ffff8803efe08a00
Mar 31 13:34:22 dev kernel: [1941073.362216] RPC:       killing all tasks for client ffff8803efe08a00
Mar 31 13:34:22 dev kernel: [1941073.781870] svc: socket ffff88000dc46e80 TCP data ready (svsk ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781877] svc: transport ffff88042e724000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.781896] svc: socket ffff88000dc46e80 TCP (connected) state change 8 (svsk ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781900] svc: transport ffff88042e724000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.781902] svc: socket ffff88000dc46e80 TCP data ready (svsk ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781905] svc: transport ffff88042e724000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.781920] svc_recv: found XPT_CLOSE
Mar 31 13:34:22 dev kernel: [1941073.781923] svc: svc_delete_xprt(ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781925] svc: svc_tcp_sock_detach(ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781927] svc: svc_sock_detach(ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.781937] svc: transport ffff88042e724000 is dead, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.781940] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782184] svc: socket ffff88042e00d480 TCP (listen) state change 10
Mar 31 13:34:22 dev kernel: [1941073.782188] svc: transport ffff8803a2ece000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.782202] svc: socket ffff88000dc41a00 TCP (listen) state change 1
Mar 31 13:34:22 dev kernel: [1941073.782220] svc: socket ffff88000dc41a00 TCP (listen) state change 1
Mar 31 13:34:22 dev kernel: [1941073.782226] svc: socket ffff88000dc41a00 TCP (listen) state change 1
Mar 31 13:34:22 dev kernel: [1941073.782242] svc: tcp_accept ffff8803a2ece000 sock ffff88042ee298c0
Mar 31 13:34:22 dev kernel: [1941073.782248] nfsd: connect from 192.168.151.7, port=811
Mar 31 13:34:22 dev kernel: [1941073.782250] svc: svc_setup_socket ffff88042bd24d00
Mar 31 13:34:22 dev kernel: [1941073.782253] setting up TCP socket for reading
Mar 31 13:34:22 dev kernel: [1941073.782256] svc: socket ffff88042e723000(inet ffff88000dc41a00), write_space busy=1
Mar 31 13:34:22 dev kernel: [1941073.782259] svc: svc_setup_socket created ffff88042e723000 (inet ffff88000dc41a00)
Mar 31 13:34:22 dev kernel: [1941073.782262] svc: transport ffff88042e723000 served by daemon ffff88040fbc6000
Mar 31 13:34:22 dev kernel: [1941073.782267] svc: transport ffff8803a2ece000 served by daemon ffff88040fbc4000
Mar 31 13:34:22 dev kernel: [1941073.782273] svc: transport ffff8803a2ece000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782276] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782281] svc: server ffff88040fbc6000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.782286] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.782289] svc: socket ffff88042e723000(inet ffff88000dc41a00), write_space busy=1
Mar 31 13:34:22 dev kernel: [1941073.782293] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782300] svc: socket ffff88042e723000 recvfrom(ffff88042e72328c, 0) = 4
Mar 31 13:34:22 dev kernel: [1941073.782303] svc: TCP record, 100 bytes
Mar 31 13:34:22 dev kernel: [1941073.782312] svc: socket ffff88000dc41a00 TCP data ready (svsk ffff88042e723000)
Mar 31 13:34:22 dev kernel: [1941073.782316] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782322] svc: socket ffff88042e723000 recvfrom(ffff880351f9c008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.782325] svc: tcp_accept ffff8803a2ece000 sock ffff88042ee298c0
Mar 31 13:34:22 dev kernel: [1941073.782332] svc: socket ffff88042e723000 recvfrom(ffff880351f9c064, 3996) = 92
Mar 31 13:34:22 dev kernel: [1941073.782336] svc: server ffff88040fbc4000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782339] svc: TCP complete record (100 bytes)
Mar 31 13:34:22 dev kernel: [1941073.782343] svc: transport ffff88042e723000 served by daemon ffff88040fbc4000
Mar 31 13:34:22 dev kernel: [1941073.782353] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782356] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.782360] svc: server ffff88040fbc4000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.782366] RPC:       Want update, refage=1800, age=1860
Mar 31 13:34:22 dev kernel: [1941073.782370] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.782375] svc: socket ffff88042e723000 recvfrom(ffff88042e72328c, 0) = 4
Mar 31 13:34:22 dev kernel: [1941073.782378] svc: TCP record, 100 bytes
Mar 31 13:34:22 dev kernel: [1941073.782383] svc: socket ffff88042e723000 recvfrom(ffff88041ea23008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.782389] svc: socket ffff88042e723000 recvfrom(ffff88041ea23064, 3996) = 92
Mar 31 13:34:22 dev kernel: [1941073.782392] svc: svc_process dropit
Mar 31 13:34:22 dev kernel: [1941073.782394] svc: xprt ffff88042e723000 dropped request
Mar 31 13:34:22 dev kernel: [1941073.782397] svc: TCP complete record (100 bytes)
Mar 31 13:34:22 dev kernel: [1941073.782399] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782403] svc: server ffff88040fbc6000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782407] svc: transnee
Mar 31 13:34:22 dev kernel: 4[917.869 v:scsc_reff802720)<>1403723]sc evrff801a00,po ,tasotff802730,iue5<>1403723]s:tprc ff84e200dt  on0coe0<>1403724]sc oktff802730 evrmff802738,0
Mar 31 13:34:22 dev kernel: 4[917.865 v:TPrcr,10bts<>1403724]sc oktff802730 evrmff80ba50,48)=8<>1403725]rvstcnee
Mar 31 13:34:22 dev kernel: 4[917.864 eii acld<>1407.825]rvstcneld<>1407.25]rvstcnee
Mar 31 13:34:22 dev kernel: 4[917.869 eii anee
Mar 31 13:34:22 dev kernel: 4[917.861 eii cnee
Mar 31 13:34:22 dev kernel: [1941073.782663] svc: svc_sock_free(ffff88042e727c00)
Mar 31 13:34:22 dev kernel: [1941073.782675] svc: socket ffff88042e723000 recvfrom(ffff8802b0ae5064, 3996) = 92
Mar 31 13:34:22 dev kernel: [1941073.782678] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782680] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782681] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782683] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782685] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782687] svc: TCP complete record (100 bytes)
Mar 31 13:34:22 dev kernel: [1941073.782690] svc: transport ffff88042e723000 served by daemon ffff88040fbc4000
Mar 31 13:34:22 dev kernel: [1941073.782693] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782694] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782697] svc: svc_sock_free(ffff88042e720800)
Mar 31 13:34:22 dev kernel: [1941073.782702] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782705] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.782707] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782709] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782711] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782713] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782715] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782717] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.782720] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.782722] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782725] nfsv4 compound op #1/1: 30 (OP_RENEW)
Mar 31 13:34:22 dev kernel: [1941073.782727] process_renew(4d9458d1/0000073f): starting
Mar 31 13:34:22 dev kernel: [1941073.782730] nfsd4_renew: clientid not found!
Mar 31 13:34:22 dev kernel: [1941073.782733] nfsv4 compound op ffff88025b2e0078 opcnt 1 #1: 30: status 10011
Mar 31 13:34:22 dev kernel: [1941073.782735] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782737] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782739] revisit canceled
Mar 31 13:34:22 dev kernel: [1941073.782741] nfsv4 compound returned 10011
Mar 31 13:34:22 dev kernel: [1941073.782744] svc: svc_sock_free(ffff88042e724000)
Mar 31 13:34:22 dev kernel: [1941073.782755] svc: socket ffff88042e723000 sendto([ffff8803aa615000 48... ], 48) = 48 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [1941073.782758] revisit queued
Mar 31 13:34:22 dev kernel: [1941073.782760] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782762] revisit queued
Mar 31 13:34:22 dev kernel: [1941073.782765] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782767] revisit queued
Mar 31 13:34:22 dev kernel: [1941073.782769] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782773] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782777] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782786] svc: server ffff88040fbc4000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.782790] svc: transport ffff88042e723000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.782796] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782817] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.782821] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782823] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.782825] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.782828] nfsv4 compound op #1/1: 30 (OP_RENEW)
Mar 31 13:34:22 dev kernel: [1941073.782831] process_renew(4d9458d1/0000073f): starting
Mar 31 13:34:22 dev kernel: [1941073.782832] nfsd4_renew: clientid not found!
Mar 31 13:34:22 dev kernel: [1941073.782835] nfsv4 compound op ffff8802afe5f078 opcnt 1 #1: 30: status 10011
Mar 31 13:34:22 dev kernel: [1941073.782838] nfsv4 compound returned 10011
Mar 31 13:34:22 dev kernel: [1941073.782857] svc: socket ffff88042e723000 sendto([ffff88041ea23000 48... ], 48) = 48 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [1941073.782860] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782863] svc: server ffff88040fbc4000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.782871] svc: server ffff880418a00000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.782874] svc: transport ffff88042e723000 served by daemon ffff88040fbc4000
Mar 31 13:34:22 dev kernel: [1941073.782878] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782880] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.782885] svc: server ffff88040fbc4000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.782889] svc: transport ffff88042e723000 served by daemon ffff88040fbc6000
Mar 31 13:34:22 dev kernel: [1941073.782895] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.782897] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.782900] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.782902] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: <4n eund101<>1403737]sc o e=0
Mar 31 13:34:22 dev kernel: 4[917.803 v:scatetct 1
Mar 31 13:34:22 dev kernel: 4[917.808 v:tasotff802730 uy o nuud<>1403738]sc aln ipthr<>1403738]ns_ipth es4po
Mar 31 13:34:22 dev kernel: 4[917.807 v:sce ff84e200snt(ff80ffd0 8. ,4)=4 ad 9.6.5.,pr=1)<>1403739]ns4cmon (o=900)<>1403730]ns4rnw letdntfud
Mar 31 13:34:22 dev kernel: 4[917.818 fv opudo ff82f508oct1#:3:sau 01
Mar 31 13:34:22 dev kernel: 4[917.812 v:sre ff848000 ol0 rnpr ff84e200 ns=
Mar 31 13:34:22 dev kernel: 4[917.816 v:tprc ff84e200dt  on0coe0<>1403732]ns4cmon eund101<>1403732]sc oktff802730 evrmff802738,0
Mar 31 13:34:22 dev kernel: 4[917.816 v:TPrcr,10bts<>1403733]sc oktff802730 evrmff807280,48)=8<>1403734]sc oktff802730 edo[ff8489004..] 8  8(dr1218117 ot81
Mar 31 13:34:22 dev kernel: 4[917.815 v:tasotff802730 uy o nuud<>1403735]sc oktff802730 evrmff807286,39)=9
Mar 31 13:34:22 dev kernel: 4[917.813 v:TPcmlt eod(0 ye)<>1403735]sc rnpr ff84e200sre ydeo ff84fc00<>1403735]sc evrff800b40 atn o aa(o=900)<>1403736]sc o e=0
Mar 31 13:34:22 dev kernel: 4[917.818 v:scatetct 1
Mar 31 13:34:22 dev kernel: 4[917.812 v:tasotff802730 uy o nuud<>1403737]sc aln dispatcher
Mar 31 13:34:22 dev kernel: [1941073.783177] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.783181] nfsv4 compound op #1/1: 30 (OP_RENEW)
Mar 31 13:34:22 dev kernel: [1941073.783183] process_renew(4d9458d1/0000073f): starting
Mar 31 13:34:22 dev kernel: [1941073.783195] nfsd4_renew: clientid not found!
Mar 31 13:34:22 dev kernel: [1941073.783198] nfsv4 compound op ffff88025b2e0078 opcnt 1 #1: 30: status 10011
Mar 31 13:34:22 dev kernel: [1941073.783201] nfsv4 compound returned 10011
Mar 31 13:34:22 dev kernel: [1941073.783211] svc: socket ffff88042e723000 sendto([ffff8804132e7000 48... ], 48) = 48 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [1941073.783215] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.783217] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.783225] svc: server ffff88040fbc6000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.783228] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.783232] svc: socket ffff88042e723000 recvfrom(ffff88042e72328c, 0) = 4
Mar 31 13:34:22 dev kernel: [1941073.783234] svc: TCP record, 100 bytes
Mar 31 13:34:22 dev kernel: [1941073.783237] svc: socket ffff88042e723000 recvfrom(ffff880351f9c008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.783241] svc: socket ffff88042e723000 recvfrom(ffff880351f9c064, 3996) = 92
Mar 31 13:34:22 dev kernel: [1941073.783244] svc: TCP complete record (100 bytes)
Mar 31 13:34:22 dev kernel: [1941073.783246] svc: transport ffff88042e723000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.783251] svc: got len=100
Mar 31 13:34:22 dev kernel: [1941073.783253] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.783258] svc: server ffff880418a00000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.783261] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.783265] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.783270] svc: socket ffff88042e723000 recvfrom(ffff88042e723288, 4) = -11
Mar 31 13:34:22 dev kernel: [1941073.783273] RPC: TCP recv_record got EAGAIN
Mar 31 13:34:22 dev kernel: [1941073.783275] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.783278] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.783281] svc: got len=-11
Mar 31 13:34:22 dev kernel: [1941073.783283] nfsv4 compound op #1/1: 30 (OP_RENEW)
Mar 31 13:34:22 dev kernel: [1941073.783286] process_renew(4d9458d1/0000073f): starting
Mar 31 13:34:22 dev kernel: [1941073.783288] nfsd4_renew: clientid not found!
Mar 31 13:34:22 dev kernel: [1941073.783292] nfsv4 compound op ffff880377c22078 opcnt 1 #1: 30: status 10011
Mar 31 13:34:22 dev kernel: [1941073.783295] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.783299] nfsv4 compound returned 10011
Mar 31 13:34:22 dev kernel: [1941073.783314] svc: socket ffff88042e723000 sendto([ffff88041ea23000 48... ], 48) = 48 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [194ffff880351f9c008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.792890] svc: socket ffff88042e723000 recvfrom(ffff880351f9c0bc, 3908) = 180
Mar 31 13:34:22 dev kernel: [1941073.792893] svc: TCP complete record (188 bytes)
Mar 31 13:34:22 dev kernel: [1941073.792896] svc: transport ffff88042e723000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.792900] svc: got len=188
Mar 31 13:34:22 dev kernel: [1941073.792903] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.792909] svc: server ffff880418a00000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.792912] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.792914] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.792917] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.792920] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.792923] nfsv4 compound op #1/1: 35 (OP_SETCLIENTID)
Mar 31 13:34:22 dev kernel: [1941073.792927] NFSD: nfs4_make_rec_clidname for 192.168.151.6/192.168.151.9 tcp UNIX 0
Mar 31 13:34:22 dev kernel: [1941073.792931] svc: socket ffff88042e723000 recvfrom(ffff88042e723288, 4) = -11
Mar 31 13:34:22 dev kernel: [1941073.792934] RPC: TCP recv_record got EAGAIN
Mar 31 13:34:22 dev kernel: [1941073.792936] svc: got len=-11
Mar 31 13:34:22 dev kernel: [1941073.792938] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.792952] nfsv4 compound op ffff880377c22078 opcnt 1 #1: 35: status 0
Mar 31 13:34:22 dev kernel: [1941073.792956] nfsv4 compound returned 0
Mar 31 13:34:22 dev kernel: [1941073.792971] svc: socket ffff88042e723000 sendto([ffff8802b0ae5000 64... ], 64) = 64 (addr 192.168.151.7, port=811)
Mar 31 13:34:22 dev kernel: [1941073.792979] svc: server ffff88040fbc6000 waiting for data (to = 900000)
Mar 31 13:34:22 dev kernel: [1941073.793055] svc: socket ffff88000dc41a00 TCP data ready (svsk ffff88042e723000)
Mar 31 13:34:22 dev kernel: [1941073.793058] svc: transport ffff88042e723000 served by daemon ffff88040fbc6000
Mar 31 13:34:22 dev kernel: [1941073.793067] svc: server ffff88040fbc6000, pool 0, transport ffff88042e723000, inuse=2
Mar 31 13:34:22 dev kernel: [1941073.793070] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.793075] svc: socket ffff88042e723000 recvfrom(ffff88042e72328c, 0) = 4
Mar 31 13:34:22 dev kernel: [1941073.793077] svc: TCP record, 128 bytes
Mar 31 13:34:22 dev kernel: [1941073.793080] svc: socket ffff88042e723000 recvfrom(ffff880351f9c008, 4088) = 8
Mar 31 13:34:22 dev kernel: [1941073.793084] svc: socket ffff88042e723000 recvfrom(ffff880351f9c080, 3968) = 120
Mar 31 13:34:22 dev kernel: [1941073.793087] svc: TCP complete record (128 bytes)
Mar 31 13:34:22 dev kernel: [1941073.793089] svc: transport ffff88042e723000 served by daemon ffff880418a00000
Mar 31 13:34:22 dev kernel: [1941073.793093] svc: got len=128
Mar 31 13:34:22 dev kernel: [1941073.793096] svc: svc_authenticate (1)
Mar 31 13:34:22 dev kernel: [1941073.793100] svc: server ffff880418a00000, pool 0, transport ffff88042e723000, inuse=3
Mar 31 13:34:22 dev kernel: [1941073.793103] svc: transport ffff88042e723000 busy, not enqueued
Mar 31 13:34:22 dev kernel: [1941073.793106] svc: calling dispatcher
Mar 31 13:34:22 dev kernel: [1941073.793108] nfsd_dispatch: vers 4 proc 1
Mar 31 13:34:22 dev kernel: [1941073.793112] svc: tcp_recv ffff88042e723000 data 1 conn 0 close 0
Mar 31 13:34:22 dev kernel: [1941073.793115] nfsv4 compound op #1/3: 36 (OP_SETCLIENTID_CONFIRM)
Mar 31 13:34:22 dev kernel: [1941073.793119] NFSD: move_to_confirm nfs4_client ffff88042e725800
Mar 31 13:34:22 dev kernel: [1941073.793123] svc: socket ffff88042e723000 recvfrom(ffff88042e723288, 4) = -11
Mar 31 13:34:22 dev kernel: [1941073.793125] RPC: TCP recv_record got EAGAIN
Mar 31 13:34:22 dev kernel: [1941073.793128] svc: got len=-11
Mar 31 13:34:22 dev kernel: [1941073.793130] svc: server ffff880418a00000 waiting for data (to = 900000)
Mar 31 13:80<>1403730]RC 21 albn sau )<>1403730]RC 21 alcnetxr ff825e00i o once
Mar 31 13:34:22 dev kernel: 4[917.927 P:312xr_onc ptff80f8a0 sntonce
Mar 31 13:34:22 dev kernel: 4[917.920 P:312sepo(uu xr_edn"tm 71037
Mar 31 13:34:22 dev kernel: 4[917.923 P:312addt uu ff825e20"ptpnig
Mar 31 13:34:22 dev kernel: 4[917.926 P:312stigaamfr600m
Mar 31 13:34:22 dev kernel: 4[917.929 P:   x_onc ceue ptff80f8a0
Mar 31 13:34:22 dev kernel: 4[917.923 fv opudo ff837208oct3#:3:sau
Mar 31 13:34:22 dev kernel: 4[917.926 fv opudo 23 4(PPTOTH
Mar 31 13:34:22 dev kernel: 4[917.920 P:   Wn pae eae10 g=
Mar 31 13:34:22 dev kernel: 4[917.920 fv opudo ff837208oct3#:2:sau 00
Mar 31 13:34:22 dev kernel: 4[917.923 fv opudrtre 00
Mar 31 13:34:22 dev kernel: 4[917.925 fd rpigrqet a ervstdltr<>1403734]sc v_rcs rpt<>1403735]sc ptff802730 rpe eus
Mar 31 13:34:22 dev kernel: 4[917.924 v:sre ff84fc00wiigfrdt t  000
Mar 31 13:34:22 dev kernel: 4[917.937 F:rvldtn 01/)<>1403734]RC    okn pUI rd<>1403734]RC    okn pUI rd<>1403734]RC    e akiiilzd rci 21
Mar 31 13:34:22 dev kernel: 4[917.938 P:   alctdts ff84d3b0<>1403735]RC 21 _p_xct lg=x8
Mar 31 13:34:22 dev kernel: 4[917.933 P:313cl_tr f4po EAT sn)<>1403735]RC 21 alrsre(tts0
Mar 31 13:34:22 dev kernel: 4[917.938 P:   wiigfrrqetso
Mar 31 13:34:22 dev kernel: 4[917.930 P:313sepo(uu xr_alg ie48014)<>140.933 P:313addt uu f80279a xr_akg
Mar 31 13:34:22 dev kernel: 4[403736]P:313sn akgi ose
Mar 31 13:34:23 dev kernel: [1941074.360512] RPC:       killing all tasks for client ffff8803efe08a00
Mar 31 13:34:24 dev kernel: [1941075.358733] RPC:       killing all tasks for client ffff8803efe08a00
Mar 31 13:34:25 dev kernel: [1941076.357001] RPC:       killing all tasks for client ffff8803efe08a00
Mar 31 13:34:26 dev kernel: [1941077.355322] RPC:       killing all tasks for client ffff8803efe08a00

So, I'm unable to get the nfs working again, unless I reboot the server.
I'm using portmap 6.0.0-2. Shall I use rcpbind instead ?

I'm not able to reproduce the bug, but I know it will come again, it's the third time that I get it.

Thanks for reading.
Best regards,
Yann

-- System Information:
Debian Release: 6.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/16 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages nfs-kernel-server depends on:
ii  libblkid1               2.17.2-9         block device id library
ii  libc6                   2.11.2-10        Embedded GNU C Library: Shared lib
ii  libcomerr2              1.41.12-2        common error description library
ii  libgssapi-krb5-2        1.8.3+dfsg-4     MIT Kerberos runtime libraries - k
ii  libgssglue1             0.1-4            mechanism-switch gssapi library
ii  libk5crypto3            1.8.3+dfsg-4     MIT Kerberos runtime libraries - C
ii  libkrb5-3               1.8.3+dfsg-4     MIT Kerberos runtime libraries
ii  libnfsidmap2            0.23-2           An nfs idmapping library
ii  librpcsecgss3           0.19-2           allows secure rpc communication us
ii  libwrap0                7.6.q-19         Wietse Venema's TCP wrappers libra
ii  lsb-base                3.2-23.2squeeze1 Linux Standard Base 3.2 init scrip
ii  nfs-common              1:1.2.2-4        NFS support files common to client
ii  ucf                     3.0025+nmu1      Update Configuration File: preserv

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

-- no debconf information



Reply to: