Bug#342354: kernel-image-2.6.8-2-686: unmounting nfs volumes hangs the halt process
Package: kernel-image-2.6.8-2-686
Severity: normal
Hi,
Halting my cluster (~160 machines) usually leaves about 5-6 of them
hung. The hang happens at the bottom of rc0.d/S31umountnfs, when the
umount command is issued. I modified the script to produce some more output:
/etc/rc0.d/S31umountnfs:
[...]
exec </dev/null
if [ -n "$DIRS" ]
then
    fuser -mv $DIRS              # added by me
    echo umount $FLAGS $DIRS     # added by me
    umount $FLAGS $DIRS
fi
) </etc/mtab
[...]
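
For readers without the script at hand, here is a minimal sketch of how a list like $DIRS might be built from mtab-format input. It is not the content of the elided parts of S31umountnfs: the function name nfs_dirs, the sample mount table, and the choice to skip the NFS root are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not taken from S31umountnfs): read mtab-format
# lines on stdin and print the NFS mount points, last-mounted first.
nfs_dirs() {
    dirs=""
    while read -r dev mpnt fstype opts dump pass
    do
        case "$fstype" in
            nfs|nfs4)
                # Assumption: leave an NFS root alone, it cannot be
                # unmounted while the system is still running from it.
                [ "$mpnt" = "/" ] && continue
                # Prepend so that later mounts are unmounted first.
                dirs="$mpnt $dirs"
                ;;
        esac
    done
    # Unquoted on purpose: collapses the trailing space.
    echo $dirs
}

# Dry run against a made-up mount table resembling the setup described
# in this report (NFS root, /home and /usr/local over NFS).
DIRS=$(nfs_dirs <<'EOF'
server:/export/root / nfs rw 0 0
proc /proc proc rw 0 0
server:/export/home /home nfs rw 0 0
server:/export/local /usr/local nfs ro 0 0
EOF
)

if [ -n "$DIRS" ]
then
    echo "would run: umount $DIRS"
fi
```

On a live machine the same function could be fed from /etc/mtab instead of the here-document. The dry run only echoes the command, so it is safe to try outside of shutdown.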
On one occasion, I also got a "kernel BUG at fs/nfs/inode.c:151!"; some
screenshots are at http://tac.ki.iif.hu/kernelbug. It's not a hard
lockup: magic SysRq can reboot the machines, and they even emit IP
traffic, as the above URL shows (that traffic is from a silently hung
machine, not the one the screenshots were taken from).
The machines are pure Sarge, NFS rooted, also mounting /home(rw) and
/usr/local(ro) over NFS.
If you need any further detail, don't hesitate to ask. The problem
seems to happen only when lots of clients are trying to halt
simultaneously on the same LAN. Also, the machine name they send in RPC
calls is (none), because the name only gets set after the root is
mounted. I'm changing this and will report whether that makes any
difference.
Thanks,
Feri.
-- System Information:
Debian Release: 3.1
APT prefers unstable
APT policy: (50, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.12-1-k7
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)