
Bug#754354: Something holds dentry-related mutex forever in Wheezy amd64 kernel



It is possible that I'm seeing the same problem. Our AMD Opteron 4386 (16
cores) machine also gets stuck with many hung tasks.

Although it responds to ping, and even a KVM virtual machine running on it
appears to keep working correctly, the host itself is locked up. This
happens about once a week, probably when the machine is under its heaviest
direct CPU and NFS load.

Once the machine is in this state, I can type a username at the login
prompt, but the password prompt never appears.

I forced a crash dump, and it contained hundreds of tasks blocked in
mutex_lock, reached from either walk_component or nfsd_lookup_dentry. The
backtraces look similar to Alexander's:

PID: 499    TASK: ffff880490a29080  CPU: 11  COMMAND: "nrpe"
 #0 [ffff880454e099a8] __schedule at ffffffff8134f195
 #1 [ffff880454e09a30] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff880454e09aa0] mutex_lock at ffffffff8134fa62
 #3 [ffff880454e09ac0] walk_component at ffffffff81103868
 #4 [ffff880454e09b30] link_path_walk at ffffffff811040c1
 #5 [ffff880454e09bc0] path_openat at ffffffff8110611d
 #6 [ffff880454e09c50] do_filp_open at ffffffff8110646d
 #7 [ffff880454e09d20] open_exec at ffffffff810fed80
 #8 [ffff880454e09d40] load_elf_binary at ffffffff81135939
 #9 [ffff880454e09e50] search_binary_handler at ffffffff810ff7fd
#10 [ffff880454e09ea0] do_execve_common.isra.24 at ffffffff81100551
#11 [ffff880454e09f10] sys_execve at ffffffff81014dd2
#12 [ffff880454e09f50] stub_execve at ffffffff813559ec
    RIP: 00007fcc8991ca87  RSP: 00007fffe8b91ef8  RFLAGS: 00000246
    RAX: 000000000000003b  RBX: 0000000000000003  RCX: ffffffffffffffff
    RDX: 000000000164d180  RSI: 00007fffe8b91f10  RDI: 00007fcc899bc3ad
    RBP: 0000000000000003   R8: 0000000000000000   R9: 00000000000001f2
    R10: 00007fcc8a88f9d0  R11: 0000000000000246  R12: 00007fffe8b91f10
    R13: 0000000000000400  R14: 0000000000000001  R15: 00007fffe8b91f10
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b

and:

PID: 4087   TASK: ffff88040ea63840  CPU: 2   COMMAND: "nfsd"
 #0 [ffff8804034b9c00] __schedule at ffffffff8134f195
 #1 [ffff8804034b9c88] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff8804034b9cf8] mutex_lock at ffffffff8134fa62
 #3 [ffff8804034b9d18] fh_lock_nested.isra.6 at ffffffffa043d63c [nfsd]
 #4 [ffff8804034b9d28] nfsd_lookup_dentry at ffffffffa043df1a [nfsd]
 #5 [ffff8804034b9d98] nfsd4_secinfo.part.15 at ffffffffa0447692 [nfsd]
 #6 [ffff8804034b9dc8] nfsd4_proc_compound at ffffffffa04468d6 [nfsd]
 #7 [ffff8804034b9e18] nfsd_dispatch at ffffffffa043a7cd [nfsd]
 #8 [ffff8804034b9e48] svc_process_common at ffffffffa0336c3f [sunrpc]
 #9 [ffff8804034b9eb8] svc_process at ffffffffa0337050 [sunrpc]
#10 [ffff8804034b9ed8] nfsd at ffffffffa043a0e3 [nfsd]
#11 [ffff8804034b9ef8] kthread at ffffffff8105f701
#12 [ffff8804034b9f48] kernel_thread_helper at ffffffff813576f4

and:

PID: 5013   TASK: ffff880805c8b180  CPU: 8   COMMAND: "getty"
 #0 [ffff88080cb8b9a8] __schedule at ffffffff8134f195
 #1 [ffff88080cb8ba30] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff88080cb8baa0] mutex_lock at ffffffff8134fa62
 #3 [ffff88080cb8bac0] walk_component at ffffffff81103868
 #4 [ffff88080cb8bb30] link_path_walk at ffffffff811040c1
 #5 [ffff88080cb8bbc0] path_openat at ffffffff8110611d
 #6 [ffff88080cb8bc50] do_filp_open at ffffffff8110646d
 #7 [ffff88080cb8bd20] open_exec at ffffffff810fed80
 #8 [ffff88080cb8bd40] load_elf_binary at ffffffff81135939
 #9 [ffff88080cb8be50] search_binary_handler at ffffffff810ff7fd
#10 [ffff88080cb8bea0] do_execve_common.isra.24 at ffffffff81100551
#11 [ffff88080cb8bf10] sys_execve at ffffffff81014dd2
#12 [ffff88080cb8bf50] stub_execve at ffffffff813559ec
    RIP: 00007f0d1ed74a87  RSP: 00007fffab157528  RFLAGS: 00000206
    RAX: 000000000000003b  RBX: 0000000000000000  RCX: ffffffffffffffff
    RDX: 00007fffab159ee8  RSI: 00007fffab157600  RDI: 0000000000405d7c
    RBP: 0000000000000003   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000206  R12: 00000000006075a0
    R13: 00000000011da750  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b

ii  linux-image-amd64                            3.2+46
ii  nfs-kernel-server                            1:1.2.6-4

Mike.

