Bug#754354: Something holds dentry-related mutex forever in Wheezy amd64 kernel
It is possible that I'm seeing the same problem. Our AMD Opteron 4386 (16
cores) machine is also getting stuck with lots of hung tasks.
Although it still responds to ping, and even a KVM virtual machine running on
it appears to continue working correctly, the host itself is locked up. This
happens once a week, probably when the host is under its heaviest direct
CPU and NFS load.
Once the machine is in this state, I can type a username at the login
prompt, but no password prompt ever appears.
I forced a crash dump. It contains hundreds of tasks whose backtraces are
blocked in mutex_lock under walk_component or nfsd_lookup_dentry, and they
look similar to Alexander's:
PID: 499 TASK: ffff880490a29080 CPU: 11 COMMAND: "nrpe"
#0 [ffff880454e099a8] __schedule at ffffffff8134f195
#1 [ffff880454e09a30] __mutex_lock_common.isra.5 at ffffffff8134fb74
#2 [ffff880454e09aa0] mutex_lock at ffffffff8134fa62
#3 [ffff880454e09ac0] walk_component at ffffffff81103868
#4 [ffff880454e09b30] link_path_walk at ffffffff811040c1
#5 [ffff880454e09bc0] path_openat at ffffffff8110611d
#6 [ffff880454e09c50] do_filp_open at ffffffff8110646d
#7 [ffff880454e09d20] open_exec at ffffffff810fed80
#8 [ffff880454e09d40] load_elf_binary at ffffffff81135939
#9 [ffff880454e09e50] search_binary_handler at ffffffff810ff7fd
#10 [ffff880454e09ea0] do_execve_common.isra.24 at ffffffff81100551
#11 [ffff880454e09f10] sys_execve at ffffffff81014dd2
#12 [ffff880454e09f50] stub_execve at ffffffff813559ec
RIP: 00007fcc8991ca87 RSP: 00007fffe8b91ef8 RFLAGS: 00000246
RAX: 000000000000003b RBX: 0000000000000003 RCX: ffffffffffffffff
RDX: 000000000164d180 RSI: 00007fffe8b91f10 RDI: 00007fcc899bc3ad
RBP: 0000000000000003 R8: 0000000000000000 R9: 00000000000001f2
R10: 00007fcc8a88f9d0 R11: 0000000000000246 R12: 00007fffe8b91f10
R13: 0000000000000400 R14: 0000000000000001 R15: 00007fffe8b91f10
ORIG_RAX: 000000000000003b CS: 0033 SS: 002b
and:
PID: 4087 TASK: ffff88040ea63840 CPU: 2 COMMAND: "nfsd"
#0 [ffff8804034b9c00] __schedule at ffffffff8134f195
#1 [ffff8804034b9c88] __mutex_lock_common.isra.5 at ffffffff8134fb74
#2 [ffff8804034b9cf8] mutex_lock at ffffffff8134fa62
#3 [ffff8804034b9d18] fh_lock_nested.isra.6 at ffffffffa043d63c [nfsd]
#4 [ffff8804034b9d28] nfsd_lookup_dentry at ffffffffa043df1a [nfsd]
#5 [ffff8804034b9d98] nfsd4_secinfo.part.15 at ffffffffa0447692 [nfsd]
#6 [ffff8804034b9dc8] nfsd4_proc_compound at ffffffffa04468d6 [nfsd]
#7 [ffff8804034b9e18] nfsd_dispatch at ffffffffa043a7cd [nfsd]
#8 [ffff8804034b9e48] svc_process_common at ffffffffa0336c3f [sunrpc]
#9 [ffff8804034b9eb8] svc_process at ffffffffa0337050 [sunrpc]
#10 [ffff8804034b9ed8] nfsd at ffffffffa043a0e3 [nfsd]
#11 [ffff8804034b9ef8] kthread at ffffffff8105f701
#12 [ffff8804034b9f48] kernel_thread_helper at ffffffff813576f4
and:
PID: 5013 TASK: ffff880805c8b180 CPU: 8 COMMAND: "getty"
#0 [ffff88080cb8b9a8] __schedule at ffffffff8134f195
#1 [ffff88080cb8ba30] __mutex_lock_common.isra.5 at ffffffff8134fb74
#2 [ffff88080cb8baa0] mutex_lock at ffffffff8134fa62
#3 [ffff88080cb8bac0] walk_component at ffffffff81103868
#4 [ffff88080cb8bb30] link_path_walk at ffffffff811040c1
#5 [ffff88080cb8bbc0] path_openat at ffffffff8110611d
#6 [ffff88080cb8bc50] do_filp_open at ffffffff8110646d
#7 [ffff88080cb8bd20] open_exec at ffffffff810fed80
#8 [ffff88080cb8bd40] load_elf_binary at ffffffff81135939
#9 [ffff88080cb8be50] search_binary_handler at ffffffff810ff7fd
#10 [ffff88080cb8bea0] do_execve_common.isra.24 at ffffffff81100551
#11 [ffff88080cb8bf10] sys_execve at ffffffff81014dd2
#12 [ffff88080cb8bf50] stub_execve at ffffffff813559ec
RIP: 00007f0d1ed74a87 RSP: 00007fffab157528 RFLAGS: 00000206
RAX: 000000000000003b RBX: 0000000000000000 RCX: ffffffffffffffff
RDX: 00007fffab159ee8 RSI: 00007fffab157600 RDI: 0000000000405d7c
RBP: 0000000000000003 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 00000000006075a0
R13: 00000000011da750 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 000000000000003b CS: 0033 SS: 002b
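To check how many of the tasks are stuck the same way, backtraces like the ones above can be tallied by the function that called mutex_lock. The following is only a minimal sketch, assuming the crash "bt" output has been saved as text in the layout shown above. The names parse_tasks, mutex_lock_callers and SAMPLE are hypothetical, and SAMPLE is a shortened excerpt of two of the traces in this report. Note that it reports the immediate caller, so the nfsd tasks show up under fh_lock_nested rather than nfsd_lookup_dentry.

```python
import re
from collections import Counter

# Per-task header printed by crash before each backtrace, e.g.
#   PID: 499 TASK: ffff880490a29080 CPU: 11 COMMAND: "nrpe"
TASK_RE = re.compile(
    r'^\s*PID:\s*(\d+)\s+TASK:\s*[0-9a-f]+\s+CPU:\s*\d+\s+COMMAND:\s*"([^"]*)"'
)
# Stack frame line, e.g.
#   #3 [ffff880454e09ac0] walk_component at ffffffff81103868
FRAME_RE = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")

# Shortened excerpt of two backtraces from this report (assumption: the
# real input would be the full saved output for all tasks).
SAMPLE = """\
PID: 499 TASK: ffff880490a29080 CPU: 11 COMMAND: "nrpe"
#0 [ffff880454e099a8] __schedule at ffffffff8134f195
#1 [ffff880454e09a30] __mutex_lock_common.isra.5 at ffffffff8134fb74
#2 [ffff880454e09aa0] mutex_lock at ffffffff8134fa62
#3 [ffff880454e09ac0] walk_component at ffffffff81103868
PID: 4087 TASK: ffff88040ea63840 CPU: 2 COMMAND: "nfsd"
#0 [ffff8804034b9c00] __schedule at ffffffff8134f195
#1 [ffff8804034b9c88] __mutex_lock_common.isra.5 at ffffffff8134fb74
#2 [ffff8804034b9cf8] mutex_lock at ffffffff8134fa62
#3 [ffff8804034b9d18] fh_lock_nested.isra.6 at ffffffffa043d63c [nfsd]
"""


def parse_tasks(text):
    """Split saved backtrace text into one record per task."""
    tasks = []
    for line in text.splitlines():
        header = TASK_RE.match(line)
        if header:
            tasks.append(
                {"pid": int(header.group(1)), "command": header.group(2), "frames": []}
            )
            continue
        frame = FRAME_RE.match(line)
        if frame and tasks:
            # Drop compiler-generated suffixes such as ".isra.6" or ".part.15".
            tasks[-1]["frames"].append(frame.group(1).split(".")[0])
    return tasks


def mutex_lock_callers(tasks):
    """Count tasks by the frame directly above mutex_lock in their stack."""
    counts = Counter()
    for task in tasks:
        frames = task["frames"]
        if "mutex_lock" in frames:
            caller_index = frames.index("mutex_lock") + 1
            if caller_index < len(frames):
                counts[frames[caller_index]] += 1
    return counts


if __name__ == "__main__":
    for caller, count in mutex_lock_callers(parse_tasks(SAMPLE)).most_common():
        print(count, caller)
```

To run it against a full dump, replace SAMPLE with the contents of the saved backtrace file. Tasks that are not waiting in mutex_lock are simply not counted.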
Package versions (from dpkg -l):
ii linux-image-amd64 3.2+46
ii nfs-kernel-server 1:1.2.6-4
Mike.