On Sunday 28 September 2014 at 15:01:21 +0100, Ben Hutchings wrote:
> On Sat, 2014-09-27 at 19:41 +0100, Mike Crowe wrote:
> > I compiled my own version of the Debian 3.2.60-1+deb7u3 kernel with
> > CONFIG_LOCKDEP and panic on hung task enabled.
> >
> > From the crash dump:
> >
> > [25202.156175] INFO: task nfsd:3247 blocked for more than 900 seconds.
> > [25202.162565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [25202.170432] nfsd D ffff88080aa0eca8 0 3247 2 0x00000000
> > [25202.170444] ffff88080a8e19f0 0000000000000046 0000000000000006 ffff880800000000
> > [25202.170458] ffff88080aa0e9c0 ffff88080a8e1fd8 ffff88080a8e1fd8 00000000001d4040
> > [25202.170472] ffff88040e9926c0 ffff88080aa0e9c0 ffffffff8138d6da 00000001a04c47dd
> > [25202.170488] Call Trace:
> > [25202.170504] [<ffffffff8138d6da>] ? __mutex_lock_common+0x236/0x379
> > [25202.170531] [<ffffffffa04c47dd>] ? fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170542] [<ffffffff8138cda2>] schedule+0x55/0x57
> > [25202.170552] [<ffffffff8138d6e7>] __mutex_lock_common+0x243/0x379
> > [25202.170569] [<ffffffffa04c47dd>] ? fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170581] [<ffffffff8138d8dc>] mutex_lock_nested+0x2a/0x31
> > [25202.170598] [<ffffffffa04c47dd>] fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170610] [<ffffffff810140f5>] ? sched_clock+0x9/0xd
> > [25202.170626] [<ffffffffa04c50fe>] nfsd_lookup_dentry+0x196/0x227 [nfsd]
> > [25202.170646] [<ffffffffa04cef7f>] nfsd4_secinfo.part.15+0x26/0x9e [nfsd]
> > [25202.170666] [<ffffffffa04cf044>] nfsd4_secinfo+0x4d/0x5b [nfsd]
> > [25202.170688] [<ffffffffa04ce105>] nfsd4_proc_compound+0x265/0x43e [nfsd]
> > [25202.170703] [<ffffffffa04c181d>] nfsd_dispatch+0xe2/0x1c8 [nfsd]
> > [25202.170734] [<ffffffffa03759c1>] svc_process_common+0x2cf/0x4d0 [sunrpc]
> > [25202.170759] [<ffffffffa0375de0>] svc_process+0x118/0x136 [sunrpc]
> > [25202.170773] [<ffffffffa04c10eb>] nfsd+0xeb/0x131 [nfsd]
> > [25202.170796] [<ffffffffa04c1000>] ? 0xffffffffa04c0fff
> > [25202.170806] [<ffffffff81065c75>] kthread+0xa3/0xab
> > [25202.170815] [<ffffffff81396584>] kernel_thread_helper+0x4/0x10
> > [25202.170823] [<ffffffff8138f074>] ? retint_restore_args+0x13/0x13
> > [25202.170830] [<ffffffff81065bd2>] ? __init_kthread_worker+0x53/0x53
> > [25202.170837] [<ffffffff81396580>] ? gs_change+0x13/0x13
> > [25202.170842] 1 lock held by nfsd/3247:
> > [25202.170845] #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffffa04c47dd>] fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170870] Kernel panic - not syncing: hung_task: blocked tasks
[snip]
> nfsd is trying to lock two objects in the same class: specifically, it
> locks a file handle and then the file handle for the file's parent.
> It's generally safe to do this so long as they're always taken in that
> order. lockdep should complain (much more verbosely) if this is not
> done consistently.
That makes sense. So is there any clue as to why it's blocking inside the
second mutex_lock_nested?
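
To make sure I have understood the ordering rule quoted above, here is a toy
model of it in Python. It is only a sketch: the OrderChecker class and the
"child"/"parent" names are made up for illustration, and real lockdep works
on lock classes and subclasses rather than on individual lock objects. The
idea is just that taking two locks in one order and later in the opposite
order is what should trigger a complaint.

```python
import threading


class OrderChecker:
    """Toy lock-order recorder (hypothetical, not how lockdep is implemented).

    Remembers every "A was held while B was taken" pair and reports a
    violation when the opposite pair has already been seen.
    """

    def __init__(self):
        self.seen = set()               # (held_name, taken_name) pairs
        self.held = threading.local()   # per-thread stack of held locks

    def acquire(self, name, lock):
        stack = getattr(self.held, "stack", [])
        # A violation is any held lock that was previously taken *after* this one.
        violations = [(h, name) for h, _ in stack if (name, h) in self.seen]
        for h, _ in stack:
            self.seen.add((h, name))
        lock.acquire()
        self.held.stack = stack + [(name, lock)]
        return violations

    def release_all(self):
        # Release in reverse order of acquisition.
        for _, lock in reversed(getattr(self.held, "stack", [])):
            lock.release()
        self.held.stack = []


checker = OrderChecker()
child, parent = threading.Lock(), threading.Lock()

# Consistent order: child first, then its parent.
first = checker.acquire("child", child) + checker.acquire("parent", parent)
checker.release_all()

# Opposite order on a later pass: this is the case that should be flagged.
second = checker.acquire("parent", parent) + checker.acquire("child", child)
checker.release_all()

print(first, second)
```

The first pass reports nothing and the second reports the reversed pair. In
my trace there is no such report from lockdep, which is why I am still
puzzled about the hang.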
> I'm afraid this doesn't explain what's going wrong. But if there are
> any more messages from lockdep further up the log (like, 15 minutes
> earlier), they might do.
Unfortunately not. The previous line in the log is the last message from
boot time:
[ 38.624072] vnet0: no IPv6 routers present
Is there a way I can persuade crash(8) to tell me which process currently
has the lock in question?
Do you have any advice on which additional debug options I should try
enabling when compiling the kernel?
Thanks for your help.
Mike.