
Bug#699361: linux-image-3.2.0-0.bpo.4-amd64: nfsd4 RELEASE_LOCKOWNER is slow and CPU intensive



Package: src:linux
Version: 3.2.35-2~bpo60+1
Severity: normal
Tags: upstream patch

*** Please type your report below this line ***

We are running an NFSv4 server on Debian Squeeze with 3.2.0-0.bpo.4-amd64
and recently started observing performance degradation in the form of
slower response times for all NFS calls. Investigation found elevated
CPU usage at kernel level (%sys), which was tracked down to
mutex_spin_on_owner and nfsd4_release_lockowner.

Sample output of "perf top":
        Events: 70K cycles
         61.46%  [kernel]               [k] mutex_spin_on_owner
         30.68%  [nfsd]                 [k] nfsd4_release_lockowner
          1.50%  [kernel]               [k] intel_idle
          0.11%  [kernel]               [k] irq_entries_start
          0.11%  [sunrpc]               [k] svc_recv
        [...]

As the problem initially occurred intermittently, but for hours at a time,
network captures were also taken and compared. The problem (slower NFS
response times and elevated kernel CPU usage) was found to correlate with
an elevated rate of RELEASE_LOCKOWNER requests.

The source for nfsd4_release_lockowner looks like this:

__be32
nfsd4_release_lockowner(struct svc_rqst *rqstp,
                        struct nfsd4_compound_state *cstate,
                        struct nfsd4_release_lockowner *rlockowner)
{
....
        nfs4_lock_state();

        status = nfserr_locks_held;
        /* XXX: we're doing a linear search through all the lockowners.
         * Yipes!  For now we'll just hope clients aren't really using
         * release_lockowner much, but eventually we have to fix these
         * data structures. */
        INIT_LIST_HEAD(&matches);
        for (i = 0; i < LOCK_HASH_SIZE; i++) {
                list_for_each_entry(sop, &lock_ownerstr_hashtbl[i], so_strhash) {
                        if (!same_owner_str(sop, owner, clid))
                                continue;
                        list_for_each_entry(stp, &sop->so_stateids,
                                        st_perstateowner) {
                                lo = lockowner(sop);
                                if (check_for_locks(stp->st_file, lo))
                                        goto out;
                                list_add(&lo->lo_list, &matches);
                        }
                }
        }
        /* Clients probably won't expect us to return with some (but not all)
         * of the lockowner state released; so don't release any until all
         * have been checked. */
        status = nfs_ok;
        while (!list_empty(&matches)) {
                lo = list_entry(matches.next, struct nfs4_lockowner,
                                                                lo_list);
                /* unhash_stateowner deletes so_perclient only
                 * for openowners. */
                list_del(&lo->lo_list);
                release_lockowner(lo);
        }
out:
        nfs4_unlock_state();
        return status;
}

So the problem is already acknowledged in a comment in the source itself.
Looking through upstream git, it appears a fix was applied in
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=06f1f864d4ae5804e83785308d41f14a08e4b980

We have not verified that this patch applies cleanly or actually resolves
the problem, but it seems likely.

Can you please consider applying that patch?
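
To illustrate why the loop quoted above gets expensive, here is a minimal
user-space sketch in C. It is not kernel code and it is not the upstream
patch, whose contents I have not reproduced here. All names (the sketch_
prefix, the table size, the "owner-N" strings) are made up for the
illustration. It only contrasts walking every bucket of a table, as the
quoted loop does, with walking just the bucket that the owner string
hashes to, and counts how many entries each approach examines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_HASH_SIZE  256   /* hypothetical table size */
#define SKETCH_MAX_OWNERS 4096  /* hypothetical capacity of this sketch */

struct sketch_owner {
    char name[32];              /* stands in for the opaque owner string */
    struct sketch_owner *next;  /* next entry in the same bucket */
};

static struct sketch_owner *sketch_table[SKETCH_HASH_SIZE];
static struct sketch_owner sketch_pool[SKETCH_MAX_OWNERS];
unsigned long sketch_visited;   /* entries examined by the last lookup */

/* djb2 string hash reduced to a bucket index; any reasonable hash would do. */
unsigned int sketch_hash(const char *s)
{
    unsigned int h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % SKETCH_HASH_SIZE;
}

/* Reset the table and fill it with n owners named "owner-0" .. "owner-(n-1)". */
void sketch_fill(int n)
{
    int i;

    assert(n >= 0 && n <= SKETCH_MAX_OWNERS);
    memset(sketch_table, 0, sizeof(sketch_table));
    for (i = 0; i < n; i++) {
        struct sketch_owner *o = &sketch_pool[i];
        unsigned int b;

        snprintf(o->name, sizeof(o->name), "owner-%d", i);
        b = sketch_hash(o->name);
        o->next = sketch_table[b];
        sketch_table[b] = o;
    }
}

/* Walk every bucket and every entry: cost grows with the total owner count. */
struct sketch_owner *sketch_find_scan(const char *name)
{
    struct sketch_owner *found = NULL, *o;
    int i;

    sketch_visited = 0;
    for (i = 0; i < SKETCH_HASH_SIZE; i++) {
        for (o = sketch_table[i]; o; o = o->next) {
            sketch_visited++;
            if (!found && strcmp(o->name, name) == 0)
                found = o;
        }
    }
    return found;
}

/* Walk only the bucket the name hashes to: cost is one chain length. */
struct sketch_owner *sketch_find_hashed(const char *name)
{
    struct sketch_owner *o;

    sketch_visited = 0;
    for (o = sketch_table[sketch_hash(name)]; o; o = o->next) {
        sketch_visited++;
        if (strcmp(o->name, name) == 0)
            return o;
    }
    return NULL;
}
```

After sketch_fill(1000), a call to sketch_find_scan() examines all 1000
entries, while sketch_find_hashed() examines only the entries in a single
bucket. In the kernel the full walk happens with the state lock held, which
would be consistent with the mutex_spin_on_owner time seen in perf, although
I have not measured that directly.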

I have looked through existing Debian bug reports and could not find
anything relevant, although this one has superficially similar symptoms:
http://bugs.debian.org/692957

For the record, Dropbox 1.4.0 seems to be triggering an excessive number
of RELEASE_LOCKOWNER requests, thus causing this problem. A separate bug report
has been filed with Dropbox:
https://forums.dropbox.com/topic.php?id=96061&replies=1

-- Package-specific info:
** Version:
Linux version 3.2.0-0.bpo.4-amd64 (debian-kernel@lists.debian.org)
(gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Debian 3.2.35-2~bpo60+1

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.2.0-0.bpo.4-amd64
root=UUID=ca958596-33fb-4e2c-9d87-16a74a584ea2 ro console=tty0 quiet

** Not tainted

** Loaded modules:
uinput
nfsd
nfs
lockd
fscache
auth_rpcgss
nfs_acl
sunrpc
ipmi_devintf
ipmi_msghandler
loop
snd_hda_intel
snd_hda_codec
snd_hwdep
snd_pcm
i7core_edac
edac_core
ioatdma
i2c_i801
snd_timer
snd
psmouse
acpi_cpufreq
mperf
i2c_core
tpm_tis
serio_raw
coretemp
processor
button
evdev
pcspkr
dca
crc32c_intel
tpm
tpm_bios
soundcore
snd_page_alloc
thermal_sys
ext4
mbcache
jbd2
crc16
dm_mod
raid10
raid1
md_mod
sd_mod
crc_t10dif
usbhid
hid
uhci_hcd
ehci_hcd
usbcore
ahci
libahci
libata
e1000e
scsi_mod
usb_common

-- System Information:
Debian Release: 6.0.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-0.bpo.4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-3.2.0-0.bpo.4-amd64 depends on:
ii  debconf [debconf-2.0]       1.5.36.1     Debian configuration management sy
ii  initramfs-tools [linux-init 0.99~bpo60+1 tools for generating an initramfs
ii  linux-base                  3.4~bpo60+1  Linux image base package
ii  module-init-tools           3.12-1       tools for managing Linux kernel mo

Versions of packages linux-image-3.2.0-0.bpo.4-amd64 recommends:
pn  firmware-linux-free           <none>     (no description available)

Versions of packages linux-image-3.2.0-0.bpo.4-amd64 suggests:
pn  debian-kernel-handbook  <none>           (no description available)
ii  grub-pc                 1.98+20100804-14 GRand Unified Bootloader, version
pn  linux-doc-3.2           <none>           (no description available)

Versions of packages linux-image-3.2.0-0.bpo.4-amd64 is related to:
pn  firmware-atheros              <none>     (no description available)
pn  firmware-bnx2                 <none>     (no description available)
pn  firmware-bnx2x                <none>     (no description available)
pn  firmware-brcm80211            <none>     (no description available)
pn  firmware-intelwimax           <none>     (no description available)
pn  firmware-ipw2x00              <none>     (no description available)
pn  firmware-ivtv                 <none>     (no description available)
pn  firmware-iwlwifi              <none>     (no description available)
pn  firmware-libertas             <none>     (no description available)
pn  firmware-linux                <none>     (no description available)
pn  firmware-linux-nonfree        <none>     (no description available)
pn  firmware-myricom              <none>     (no description available)
pn  firmware-netxen               <none>     (no description available)
pn  firmware-qlogic               <none>     (no description available)
pn  firmware-ralink               <none>     (no description available)
pn  firmware-realtek              <none>     (no description available)
pn  xen-hypervisor                <none>     (no description available)

-- debconf information:
  linux-image-3.2.0-0.bpo.4-amd64/postinst/missing-firmware-3.2.0-0.bpo.4-amd64:
  linux-image-3.2.0-0.bpo.4-amd64/prerm/removing-running-kernel-3.2.0-0.bpo.4-amd64: true
  linux-image-3.2.0-0.bpo.4-amd64/postinst/depmod-error-initrd-3.2.0-0.bpo.4-amd64: false
  linux-image-3.2.0-0.bpo.4-amd64/postinst/ignoring-ramdisk:

