Bug#754354: Something holds dentry-related mutex forever in Wheezy amd64 kernel

To: Ben Hutchings <ben@decadent.org.uk>
Cc: 754354@bugs.debian.org
Subject: Bug#754354: Something holds dentry-related mutex forever in Wheezy amd64 kernel
From: Mike Crowe <mac@mcrowe.com>
Date: Sun, 28 Sep 2014 17:05:54 +0100
Message-id: <20140928160554.GA24835@mcrowe.com>
Reply-to: Mike Crowe <mac@mcrowe.com>, 754354@bugs.debian.org
In-reply-to: <1411912881.9388.10.camel@decadent.org.uk>
References: <20140721140131.GA17435@mcrowe.com> <20140730100638.GA16669@mcrowe.com> <20140927184134.GA5679@mcrowe.com> <1411912881.9388.10.camel@decadent.org.uk>

On Sunday 28 September 2014 at 15:01:21 +0100, Ben Hutchings wrote:
> On Sat, 2014-09-27 at 19:41 +0100, Mike Crowe wrote:
> > I compiled my own version of the Debian 3.2.60-1+deb7u3 kernel with
> > CONFIG_LOCKDEP and panic on hung task enabled.
> > 
> > From the crash dump:
> > 
> > [25202.156175] INFO: task nfsd:3247 blocked for more than 900 seconds.
> > [25202.162565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [25202.170432] nfsd            D ffff88080aa0eca8     0  3247      2 0x00000000
> > [25202.170444]  ffff88080a8e19f0 0000000000000046 0000000000000006 ffff880800000000
> > [25202.170458]  ffff88080aa0e9c0 ffff88080a8e1fd8 ffff88080a8e1fd8 00000000001d4040
> > [25202.170472]  ffff88040e9926c0 ffff88080aa0e9c0 ffffffff8138d6da 00000001a04c47dd
> > [25202.170488] Call Trace:
> > [25202.170504]  [<ffffffff8138d6da>] ? __mutex_lock_common+0x236/0x379
> > [25202.170531]  [<ffffffffa04c47dd>] ? fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170542]  [<ffffffff8138cda2>] schedule+0x55/0x57
> > [25202.170552]  [<ffffffff8138d6e7>] __mutex_lock_common+0x243/0x379
> > [25202.170569]  [<ffffffffa04c47dd>] ? fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170581]  [<ffffffff8138d8dc>] mutex_lock_nested+0x2a/0x31
> > [25202.170598]  [<ffffffffa04c47dd>] fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170610]  [<ffffffff810140f5>] ? sched_clock+0x9/0xd
> > [25202.170626]  [<ffffffffa04c50fe>] nfsd_lookup_dentry+0x196/0x227 [nfsd]
> > [25202.170646]  [<ffffffffa04cef7f>] nfsd4_secinfo.part.15+0x26/0x9e [nfsd]
> > [25202.170666]  [<ffffffffa04cf044>] nfsd4_secinfo+0x4d/0x5b [nfsd]
> > [25202.170688]  [<ffffffffa04ce105>] nfsd4_proc_compound+0x265/0x43e [nfsd]
> > [25202.170703]  [<ffffffffa04c181d>] nfsd_dispatch+0xe2/0x1c8 [nfsd]
> > [25202.170734]  [<ffffffffa03759c1>] svc_process_common+0x2cf/0x4d0 [sunrpc]
> > [25202.170759]  [<ffffffffa0375de0>] svc_process+0x118/0x136 [sunrpc]
> > [25202.170773]  [<ffffffffa04c10eb>] nfsd+0xeb/0x131 [nfsd]
> > [25202.170796]  [<ffffffffa04c1000>] ? 0xffffffffa04c0fff
> > [25202.170806]  [<ffffffff81065c75>] kthread+0xa3/0xab
> > [25202.170815]  [<ffffffff81396584>] kernel_thread_helper+0x4/0x10
> > [25202.170823]  [<ffffffff8138f074>] ? retint_restore_args+0x13/0x13
> > [25202.170830]  [<ffffffff81065bd2>] ? __init_kthread_worker+0x53/0x53
> > [25202.170837]  [<ffffffff81396580>] ? gs_change+0x13/0x13
> > [25202.170842] 1 lock held by nfsd/3247:
> > [25202.170845]  #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffffa04c47dd>] fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170870] Kernel panic - not syncing: hung_task: blocked tasks

[snip]

> nfsd is trying to lock two objects in the same class: specifically, it
> locks a file handle and then the file handle for the file's parent.
> It's generally safe to do this so long as they're always taken in that
> order.  lockdep should complain (much more verbosely) if this is not
> done consistently.

That makes sense. So is there any clue as to why it's blocking inside the
second mutex_lock_nested?
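
To check my understanding of the ordering rule, here is a minimal user-space
sketch. It is only an illustration, not nfsd's code: Node, touch_with_parent,
worker and run are made-up names, and threading.Lock stands in for the
per-object mutex. Every thread takes the child's lock first and then the
parent's, so two locks of the same "class" can be nested without deadlock.

```python
import threading


class Node:
    """Hypothetical object guarded by its own mutex (stand-in for a locked file handle)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.mutex = threading.Lock()
        self.touched = 0


def touch_with_parent(node):
    # Fixed order for every caller: the object itself, then its parent.
    # Taking them in the opposite order elsewhere is what could deadlock.
    with node.mutex:
        with node.parent.mutex:
            node.touched += 1
            node.parent.touched += 1


def worker(node, iterations):
    for _ in range(iterations):
        touch_with_parent(node)


def run(threads_per_child=2, iterations=200):
    parent = Node("dir")
    children = [Node("file-a", parent), Node("file-b", parent)]
    threads = [
        threading.Thread(target=worker, args=(child, iterations))
        for child in children
        for _ in range(threads_per_child)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Counts per child, then the shared parent's total.
    return children[0].touched, children[1].touched, parent.touched
```

With a consistent order like this, all threads finish. That is why the hang
above is surprising: something else must be holding the mutex and never
releasing it.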

> I'm afraid this doesn't explain what's going wrong.  But if there are
> any more messages from lockdep further up the log (like, 15 minutes
> earlier), they might do.

Unfortunately not. The previous line in the log is the last message from
boot time:

[   38.624072] vnet0: no IPv6 routers present

Is there a way I can persuade crash(8) to tell me which process currently
has the lock in question?

Do you have any advice on which other debugging options I should try
enabling when compiling the kernel?

Thanks for your help.

Mike.

