
Bug#617666: marked as done (nfs-kernel-server: Periodic nfsd failure - single nfsd process with high CPU and no mounts working)



Your message dated Mon, 21 Mar 2011 19:02:21 +0100
with message-id <4D8792AD.7080804@debian.org>
and subject line Re: nfs-kernel-server: Periodic nfsd failure - single nfsd process with high CPU and no mounts working
has caused the Debian Bug report #617666,
regarding nfs-kernel-server: Periodic nfsd failure - single nfsd process with high CPU and no mounts working
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
617666: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=617666
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: nfs-kernel-server
Version: 1:1.2.2-4
Severity: grave
Justification: renders package unusable


Hi there,

apologies if this has already been reported, but I couldn't find anything quite matching what I'm seeing.

I have a 26TB Debian squeeze fileserver providing NFS mounts to a large number of users.  The system has been working flawlessly for a number of months, but twice in the last week NFS seems to have crashed.  The first thing I noticed was that users reported being unable to access shares.  Logging into the system, I see a single nfsd process taking 100% CPU with a very long run time.  Restarting nfs-kernel-server has no effect.  The process is unkillable (even with -9) and the system has required a reboot to get it usable again.  jnettop is not showing significant network traffic, and lsof on /export/ (where all my NFS exports are located) shows no NFS access to any files.
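
For anyone triaging a similar hang, the symptom above (one nfsd thread with a very long CPU time) can be checked from /proc. The following is only a minimal sketch, not part of the original report: it assumes a Linux /proc layout, and the helper names parse_stat and nfsd_threads are made up for illustration. The state letter helps distinguish a thread that is spinning (R) from one blocked in uninterruptible I/O (D).

```python
import os


def parse_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, name, state, cpu_ticks)."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split around the first '(' and the last ')'.
    head, _, tail = line.rpartition(")")
    pid_text, _, name = head.partition("(")
    fields = tail.split()
    state = fields[0]                                  # field 3: R, S, D, Z, ...
    utime, stime = int(fields[11]), int(fields[12])    # fields 14 and 15
    return int(pid_text), name, state, utime + stime


def nfsd_threads(proc_root="/proc"):
    """Return parsed stat tuples for every process whose name is 'nfsd'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                info = parse_stat(fh.read())
        except OSError:
            continue  # process exited while we were scanning
        if info[1] == "nfsd":
            found.append(info)
    return found


if __name__ == "__main__":
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    for pid, name, state, ticks in nfsd_threads():
        print(f"{pid} {name} state={state} cpu={ticks / ticks_per_second:.1f}s")
```

Running this twice a few seconds apart and comparing the CPU figures would show which thread, if any, is still accumulating time.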

Please let me know if you need any further information.  I am going to reboot the server now, so I may not be able to reproduce the problem straight away (but as it's happened twice, I am quite sure it will happen again at some point...).

Thanks in advance for your help.

Dan Tomlinson

My /etc/exports file is below:


# /etc/exports: the access control list for filesystems which may be exported
#		to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(no_subtree_check,rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
#

# misc shares
/export/software		192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/system_tools		192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/home		192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)

# flychip shares
/export/flychip/archives	192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/misc		192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/production	192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/share		192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/temp		192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)

# micklem shares
/export/micklem/releases		192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/micklem/data		192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)

# logic shares
/export/logic/data		192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/logic/webdav		192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
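
When reviewing an exports file like the one above, it can help to list paths whose client networks receive different option sets (here, /export/home uses root_squash for one network and no_root_squash for the other, which may well be intentional). The following is a rough sketch, not part of the original report: the function names parse_exports and mixed_option_paths are hypothetical, and the parser deliberately ignores backslash line continuations, quoted paths, and clients listed without an option list.

```python
import re

# Matches "client(opt1,opt2,...)" entries on an exports line.
CLIENT_RE = re.compile(r"(\S+?)\(([^)]*)\)")


def parse_exports(text):
    """Map each exported path to {client: frozenset(options)}."""
    exports = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        path = parts[0]
        clients = parts[1] if len(parts) > 1 else ""
        exports[path] = {
            client: frozenset(opts.split(","))
            for client, opts in CLIENT_RE.findall(clients)
        }
    return exports


def mixed_option_paths(exports):
    """Return paths whose clients do not all share the same option set."""
    return sorted(
        path for path, clients in exports.items()
        if len(set(clients.values())) > 1
    )


if __name__ == "__main__":
    with open("/etc/exports") as fh:
        for path in mixed_option_paths(parse_exports(fh.read())):
            print("options differ between clients:", path)
```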



-- System Information:
Debian Release: 6.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages nfs-kernel-server depends on:
ii  libblkid1               2.17.2-9         block device id library
ii  libc6                   2.11.2-10        Embedded GNU C Library: Shared lib
ii  libcomerr2              1.41.12-2        common error description library
ii  libgssapi-krb5-2        1.8.3+dfsg-4     MIT Kerberos runtime libraries - k
ii  libgssglue1             0.1-4            mechanism-switch gssapi library
ii  libk5crypto3            1.8.3+dfsg-4     MIT Kerberos runtime libraries - C
ii  libkrb5-3               1.8.3+dfsg-4     MIT Kerberos runtime libraries
ii  libnfsidmap2            0.23-2           An nfs idmapping library
ii  librpcsecgss3           0.19-2           allows secure rpc communication us
ii  libwrap0                7.6.q-19         Wietse Venema's TCP wrappers libra
ii  lsb-base                3.2-23.2squeeze1 Linux Standard Base 3.2 init scrip
ii  nfs-common              1:1.2.2-4        NFS support files common to client
ii  ucf                     3.0025+nmu1      Update Configuration File: preserv

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

-- no debconf information



--- End Message ---
--- Begin Message ---
On 03/21/2011 12:33 PM, Dan Tomlinson wrote:
> On 20/03/11 17:20, Luk Claes wrote:
>>> On 10/03/11 12:54, Debian Bug Tracking System wrote:

> Hi Luk,
> 
> thanks for getting back to me.  My xfs_repair did finish and it found a
> few errors, but I'm not sure if they are from hard resetting the machine
> or some indication of a more serious hardware error.  I am however
> pretty sure that this is not a purely NFS problem - since the repair
> finished, the system has crashed in a couple of different ways.  Once it
> dumped the kernel to the console and went completely unresponsive and
> another time the /export partition unmounted itself and wouldn't remount
> (giving IO errors).  In both cases there was no weird NFS process
> hanging around (the mounts just became inaccessible as you would expect
> them to after such crashes).
> 
> At this point I am pretty sure that I have a hardware issue on my hands,
> either with bad RAM or my RAID controller.  I think we can safely say
> NFS is in the clear :)  Sorry for wasting your time!

Hi Dan

No problem, I'll close this bug. Thanks for the quick reply and good
luck with your hardware.

Cheers

Luk


--- End Message ---
