
Bug#342354: kernel-image-2.6.8-2-686: unmounting nfs volumes hangs the halt process



Package: kernel-image-2.6.8-2-686
Severity: normal

Hi,

Halting my cluster (~160 machines) usually leaves about 5-6 of them
hung.  The hang happens at the bottom of rc0.d/S31umountnfs, when it
issues the umount command.  I modified the script to produce some more
output:

/etc/rc0.d/S31umountnfs:
[...]
        exec </dev/null
        if [ -n "$DIRS" ]
        then
                fuser -mv $DIRS                # added by me
                echo umount $FLAGS $DIRS       # added by me
                umount $FLAGS $DIRS
        fi
) </etc/mtab
[...]
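
For context, the subshell shown above reads /etc/mtab on stdin, and the
elided part appears to collect the mount points to unmount into $DIRS.
A minimal sketch of that idea follows.  It is not the actual Sarge
script: the function name nfs_mount_points is hypothetical, only the
nfs and nfs4 types are matched, and the real script may handle more
filesystem types and flags.

```shell
# Hypothetical helper, not taken from S31umountnfs.
# Reads mtab-format lines on stdin (device, mount point, type, options, ...)
# and prints the NFS mount points on one line, most recently mounted first,
# so that nested mounts would be unmounted before their parents.
nfs_mount_points() {
    dirs=""
    while read -r dev mnt fstype opts rest
    do
        case "$fstype" in
            nfs|nfs4)
                # Prepend, which reverses the mount order.
                dirs="$mnt $dirs"
                ;;
        esac
    done
    # Unquoted on purpose: word splitting drops the trailing space.
    echo $dirs
}
```

It could be tried out with "nfs_mount_points </etc/mtab", which only
prints the list and does not unmount anything.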

On one occasion I also got a "kernel BUG at fs/nfs/inode.c:151!"; some
screenshots are at http://tac.ki.iif.hu/kernelbug.  It's not a hard
lockup: magic SysRq can reboot the machines, and they even emit IP
traffic, as the above URL shows (the traffic is from a silently hung
machine, not the one the screenshots were taken from).

The machines are pure Sarge, NFS rooted, also mounting /home(rw) and
/usr/local(ro) over NFS.

If you need any further detail, don't hesitate to ask.  The problem
seems to happen only when lots of clients try to halt simultaneously
on the same LAN.  Also, their name in the RPC calls is (none), as the
hostname gets set only after the root is mounted.  I'm changing this
and will report whether that makes any difference.

Thanks,
Feri.

-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (50, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.12-1-k7
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)


