
nfs problems: can't find request slot



I have a /home directory shared between two machines: a box called
"strago", running OpenBSD 2.7, and a box called "shadow", running Debian woody.

/home is an entire harddisk on strago, mounted on shadow through NFS.

The line I use in fstab to mount /home is:
strago:/home /home nfs rsize=8192,wsize=8192,timeo=14,intr
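
For reference, a minimal sketch in Python that splits the fstab line above into its options, to make the values easier to read. The helper name parse_fstab_line is made up for illustration. It assumes the Linux NFS client convention that timeo is given in tenths of a second, so timeo=14 is a 1.4 second initial timeout.

```python
def parse_fstab_line(line):
    """Split one fstab entry into its fields and a dict of mount options."""
    fields = line.split()
    device, mount_point, fs_type, raw_options = fields[:4]
    options = {}
    for item in raw_options.split(","):
        key, sep, value = item.partition("=")
        if value.isdigit():
            options[key] = int(value)   # numeric option such as rsize=8192
        elif sep:
            options[key] = value        # non-numeric key=value option
        else:
            options[key] = True         # bare flag such as "intr"
    return {
        "device": device,
        "mount_point": mount_point,
        "fs_type": fs_type,
        "options": options,
    }


entry = parse_fstab_line(
    "strago:/home /home nfs rsize=8192,wsize=8192,timeo=14,intr"
)
# Assumption: timeo is in tenths of a second (Linux NFS client).
timeout_seconds = entry["options"]["timeo"] / 10
print(entry["options"])
print("initial timeout (s):", timeout_seconds)
```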

Lately, the nfs connection to strago has been dying, for unknown
reasons, causing shadow to crash hard. (No ctrl-alt-del, Magic SysRq
Key, or any of that will work, nor can I telnet/ssh in from another host
and reboot from there.) This has been happening more and more
frequently, to the point that it has now occurred 5 times today.

Sometimes, if I can tell that the NFS connection has died, I can
quickly umount /home as root and remount it again, at which point
everything ends up working fine, with no crashes or anything.

The error messages I get (as logged in /var/log/messages) are as
follows:

Nov 11 19:12:46 shadow kernel: nfs: server strago not responding, still trying
Nov 11 19:13:01 shadow kernel: nfs: task 4940 can't get a request slot
Nov 11 19:13:02 shadow kernel: nfs: task 4954 can't get a request slot
Nov 11 19:13:09 shadow kernel: nfs: task 4955 can't get a request slot
Nov 11 19:14:47 shadow kernel: nfs: task 4956 can't get a request slot
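
To see how quickly the slot errors follow the stall, here is a rough Python sketch that pulls the timestamps and task numbers out of the messages above. The function name summarize and the regular expressions are my own and only match the message format shown here; it also assumes all lines fall on the same day.

```python
import re

LOG = """\
Nov 11 19:12:46 shadow kernel: nfs: server strago not responding, still trying
Nov 11 19:13:01 shadow kernel: nfs: task 4940 can't get a request slot
Nov 11 19:13:02 shadow kernel: nfs: task 4954 can't get a request slot
Nov 11 19:13:09 shadow kernel: nfs: task 4955 can't get a request slot
Nov 11 19:14:47 shadow kernel: nfs: task 4956 can't get a request slot
"""

# Matches "Mon DD HH:MM:SS host kernel: nfs: message".
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s+\S+\s+kernel: nfs: (.*)$"
)
SLOT_RE = re.compile(r"task (\d+) can't get a request slot")


def summarize(log_text):
    """Return (stall time, [(time, task id), ...]) in seconds since midnight."""
    stall_at = None
    slot_errors = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        hours, minutes, seconds = (int(x) for x in match.group(1, 2, 3))
        when = hours * 3600 + minutes * 60 + seconds
        message = match.group(4)
        if stall_at is None and "not responding" in message:
            stall_at = when
        slot = SLOT_RE.search(message)
        if slot:
            slot_errors.append((when, int(slot.group(1))))
    return stall_at, slot_errors


stall_at, slot_errors = summarize(LOG)
for when, task in slot_errors:
    print(f"task {task}: {when - stall_at} s after the server stopped responding")
```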

It's not a (physical) connection problem between the two machines
(at least as far as I can tell) ... status LEDs on both NICs still blink, 
and the connection will work fine after the Debian box has been rebooted or
if I can quickly umount & remount /home. 

On the OpenBSD end, I was running nfsd with the options -tun 4 (which
means "serve tcp and udp clients, with 4 servers"). On the advice of the
OpenBSD mailing list, I pumped the # of servers up to 16, but the
problems persist. The machine isn't being used to export NFS to anywhere
else, so 16 servers should be more than enough for my needs (right?)

Anyone have any ideas on what's going wrong, and what I can try to fix
it?

Thanks a lot, folks.

- Colin McMillen


