
Re: rcu_sched self-detected stall on CPU



Hello everyone,

I am not sure whether my first email actually made it to the list, since I don't see it at https://lists.debian.org/debian-kernel/2015/10/threads.html. I have since subscribed and have some more information to report. Please see the original email quoted below.

Yesterday we tried kernel 4.2.3 built from source and hit the same problem, with a slightly different stack trace:

[15466.910748] INFO: rcu_sched self-detected stall on CPU { 6}  (t=5251 jiffies g=1839737 c=1839736 q=212886)
[15466.910818] Task dump for CPU 6:
[15466.910821] run-one         R  running task        0 28191  28185 0x00000008
[15466.910824]  0000000000000007 ffffffff81854e80 ffffffff810c7180 00000000001c1279
[15466.910828]  ffff88407fc16580 ffffffff81854e80 0000000000000000 ffffffff81906b80
[15466.910830]  ffffffff810ca2c1 ffffffff8101cb05 0000000000000096 0000000000000000
[15466.910833] Call Trace:
[15466.910835]  <IRQ>  [<ffffffff810c7180>] ? rcu_dump_cpu_stacks+0x80/0xb0
[15466.910847]  [<ffffffff810ca2c1>] ? rcu_check_callbacks+0x421/0x6e0
[15466.910852]  [<ffffffff8101cb05>] ? sched_clock+0x5/0x10
[15466.910857]  [<ffffffff810dd520>] ? tick_sched_do_timer+0x40/0x40
[15466.910861]  [<ffffffff810cee16>] ? update_process_times+0x36/0x60
[15466.910863]  [<ffffffff810dcf54>] ? tick_sched_handle.isra.15+0x24/0x60
[15466.910866]  [<ffffffff810dd55b>] ? tick_sched_timer+0x3b/0x70
[15466.910869]  [<ffffffff810cf93c>] ? __hrtimer_run_queues+0xdc/0x210
[15466.910871]  [<ffffffff8101c5b5>] ? read_tsc+0x5/0x10
[15466.910874]  [<ffffffff810cfd5a>] ? hrtimer_interrupt+0x9a/0x190
[15466.910881]  [<ffffffff815534b9>] ? smp_apic_timer_interrupt+0x39/0x50
[15466.910884]  [<ffffffff8155169b>] ? apic_timer_interrupt+0x6b/0x70
[15466.910886]  <EOI>  [<ffffffff810afb4a>] ? native_queued_spin_lock_slowpath+0x11a/0x180
[15466.910894]  [<ffffffff812cf47c>] ? lockref_put_or_lock+0x3c/0x70
[15466.910896]  [<ffffffff8155046d>] ? _raw_spin_lock+0x1d/0x20
[15466.910901]  [<ffffffff811ced8b>] ? shrink_dentry_list+0x24b/0x2d0
[15466.910904]  [<ffffffff811cf380>] ? d_invalidate+0xb0/0xf0
[15466.910907]  [<ffffffff811c2c6e>] ? lookup_fast+0x22e/0x2c0
[15466.910910]  [<ffffffff811c43c6>] ? walk_component+0x36/0x440
[15466.910912]  [<ffffffff811c4940>] ? link_path_walk+0x170/0x530
[15466.910914]  [<ffffffff811c3bca>] ? path_init+0x1ea/0x3c0
[15466.910917]  [<ffffffff811c5237>] ? path_lookupat+0x77/0x100
[15466.910919]  [<ffffffff811c7705>] ? filename_lookup+0x95/0x150
[15466.910921]  [<ffffffff811c738f>] ? getname_flags+0x6f/0x1e0
[15466.910927]  [<ffffffff811b6f8f>] ? SyS_access+0x9f/0x1e0
[15466.910930]  [<ffffffff81550832>] ? entry_SYSCALL_64_fastpath+0x16/0x75
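As a sanity check on the numbers in the stall line, here is a quick back-of-the-envelope decoding of the t=, g= and c= fields (CONFIG_HZ=250 is an assumption, the Debian amd64 default; the actual value is in /boot/config-$(uname -r)):

```python
# Decode the fields from "t=5251 jiffies g=1839737 c=1839736".
# CONFIG_HZ=250 is assumed (Debian amd64 default); verify with
# `grep CONFIG_HZ= /boot/config-$(uname -r)`.
HZ = 250
t_jiffies = 5251              # jiffies since the grace period started
g, c = 1839737, 1839736       # started vs. completed grace-period counters

stall_seconds = t_jiffies / HZ
print(f"stall detected after ~{stall_seconds:.0f}s")  # ~21 s, RCU's default stall timeout
print(f"grace periods in flight: {g - c}")            # 1 -> a single stuck grace period
```

So the detector fires right at the default 21-second threshold, and a single grace period is stuck rather than many piling up.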


Any ideas what might be happening here? Is there anything else I can do to add more debugging information or value to the investigation?

Thank you,
Rumen Telbizov


On Mon, Oct 5, 2015 at 3:55 PM, Rumen Telbizov <telbizov@gmail.com> wrote:
Dear list,

We are experiencing a problem with a fleet of about 12 SuperMicro machines running Debian Jessie. Searching online got me nowhere, so I am hoping you can give us a hand.

The problem might be NFS related. It manifests itself by seemingly "pinning" our worker processes (20 of them on each machine) at 100% system-time CPU for a good 10-30 seconds at a time, sometimes locking them up forever. The kernel produces the stack trace below. During that period the userland processes don't actually appear to be doing anything, although all the CPU usage is accounted to them:

Oct  5 21:09:49 brain2 kernel: [ 8937.794900] INFO: rcu_sched self-detected stall on CPU { 0}  (t=5250 jiffies g=884703 c=884702 q=128580)
Oct  5 21:09:49 brain2 kernel: [ 8937.794971] sending NMI to all CPUs:
Oct  5 21:09:49 brain2 kernel: [ 8937.794975] NMI backtrace for cpu 0
Oct  5 21:09:49 brain2 kernel: [ 8937.794978] CPU: 0 PID: 13197 Comm: run-one Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u4
Oct  5 21:09:49 brain2 kernel: [ 8937.794979] Hardware name: Silicon Mechanics Rackform R4422.v5/X10DRT-PT, BIOS 1.1 08/03/2015
Oct  5 21:09:49 brain2 kernel: [ 8937.794981] task: ffff88406fbc5530 ti: ffff883fe34dc000 task.ti: ffff883fe34dc000
Oct  5 21:09:49 brain2 kernel: [ 8937.794983] RIP: 0010:[<ffffffff812b5dc9>]  [<ffffffff812b5dc9>] __const_udelay+0x9/0x30
Oct  5 21:09:49 brain2 kernel: [ 8937.794989] RSP: 0018:ffff88207fc03df8  EFLAGS: 00000006
Oct  5 21:09:49 brain2 kernel: [ 8937.794990] RAX: 0000000000001000 RBX: 00000000000003e9 RCX: 0000000000000001
Oct  5 21:09:49 brain2 kernel: [ 8937.794992] RDX: 0000000000927d5c RSI: 0000000000000200 RDI: 0000000000068dbc
Oct  5 21:09:49 brain2 kernel: [ 8937.794994] RBP: 0000000000001000 R08: ffffffff818e23e0 R09: 0000000000000000
Oct  5 21:09:49 brain2 kernel: [ 8937.794995] R10: 00000000000004ec R11: ffff88207fc03b96 R12: 0000000000000400
Oct  5 21:09:49 brain2 kernel: [ 8937.794996] R13: 0000000000000096 R14: 0000000000000002 R15: 0000000000000002
Oct  5 21:09:49 brain2 kernel: [ 8937.794998] FS:  00007fb84409a740(0000) GS:ffff88207fc00000(0000) knlGS:0000000000000000
Oct  5 21:09:49 brain2 kernel: [ 8937.795000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  5 21:09:49 brain2 kernel: [ 8937.795002] CR2: 00007f3bc7049000 CR3: 0000001e45a0c000 CR4: 00000000001407f0
Oct  5 21:09:49 brain2 kernel: [ 8937.795003] Stack:
Oct  5 21:09:49 brain2 kernel: [ 8937.795004]  ffffffff81044e42 000000000000a0ea ffffffff818e23e0 ffffffff81046859
Oct  5 21:09:49 brain2 kernel: [ 8937.795007]  0000000000000001 ffff88207fc0d660 ffffffff81853680 0000000000000000
Oct  5 21:09:49 brain2 kernel: [ 8937.795010]  ffffffff818e2940 000000000001f644 ffffffff81853740 ffffffff81046ab3
Oct  5 21:09:49 brain2 kernel: [ 8937.795012] Call Trace:
Oct  5 21:09:49 brain2 kernel: [ 8937.795014]  <IRQ>
Oct  5 21:09:49 brain2 kernel: [ 8937.795015]  [<ffffffff81044e42>] ? native_safe_apic_wait_icr_idle+0x22/0x50
Oct  5 21:09:49 brain2 kernel: [ 8937.795022]  [<ffffffff81046859>] ? default_send_IPI_mask_sequence_phys+0xd9/0xe0
Oct  5 21:09:49 brain2 kernel: [ 8937.795025]  [<ffffffff81046ab3>] ? arch_trigger_all_cpu_backtrace+0xc3/0x140
Oct  5 21:09:49 brain2 kernel: [ 8937.795029]  [<ffffffff810c53ca>] ? rcu_check_callbacks+0x3ea/0x630
Oct  5 21:09:49 brain2 kernel: [ 8937.795033]  [<ffffffff810cfde0>] ? tick_sched_handle.isra.16+0x60/0x60
Oct  5 21:09:49 brain2 kernel: [ 8937.795037]  [<ffffffff81074ab0>] ? update_process_times+0x40/0x70
Oct  5 21:09:49 brain2 kernel: [ 8937.795040]  [<ffffffff810cfda0>] ? tick_sched_handle.isra.16+0x20/0x60
Oct  5 21:09:49 brain2 kernel: [ 8937.795042]  [<ffffffff810cfe1c>] ? tick_sched_timer+0x3c/0x60
Oct  5 21:09:49 brain2 kernel: [ 8937.795045]  [<ffffffff8108b067>] ? __run_hrtimer+0x67/0x1c0
Oct  5 21:09:49 brain2 kernel: [ 8937.795047]  [<ffffffff8108b419>] ? hrtimer_interrupt+0xe9/0x220
Oct  5 21:09:49 brain2 kernel: [ 8937.795051]  [<ffffffff8151452b>] ? smp_apic_timer_interrupt+0x3b/0x60
Oct  5 21:09:49 brain2 kernel: [ 8937.795055]  [<ffffffff815125bd>] ? apic_timer_interrupt+0x6d/0x80
Oct  5 21:09:49 brain2 kernel: [ 8937.795056]  <EOI>
Oct  5 21:09:49 brain2 kernel: [ 8937.795057]  [<ffffffff811bcd60>] ? dentry_lru_isolate_shrink+0x40/0x40
Oct  5 21:09:49 brain2 kernel: [ 8937.795063]  [<ffffffff81511188>] ? _raw_spin_lock+0x28/0x30
Oct  5 21:09:49 brain2 kernel: [ 8937.795066]  [<ffffffff812b75e8>] ? lockref_put_or_lock+0x48/0x80
Oct  5 21:09:49 brain2 kernel: [ 8937.795069]  [<ffffffff811bdf14>] ? shrink_dentry_list+0x134/0x240
Oct  5 21:09:49 brain2 kernel: [ 8937.795071]  [<ffffffff811be838>] ? check_submounts_and_drop+0x68/0x90
Oct  5 21:09:49 brain2 kernel: [ 8937.795078]  [<ffffffffa06b9ab6>] ? nfs_lookup_revalidate+0xd6/0x560 [nfs]
Oct  5 21:09:49 brain2 kernel: [ 8937.795082]  [<ffffffff811b1f7e>] ? lookup_fast+0x25e/0x2b0
Oct  5 21:09:49 brain2 kernel: [ 8937.795086]  [<ffffffffa06b80b1>] ? nfs_permission+0xd1/0x180 [nfs]
Oct  5 21:09:49 brain2 kernel: [ 8937.795089]  [<ffffffff811b3009>] ? link_path_walk+0x1d9/0x8c0
Oct  5 21:09:49 brain2 kernel: [ 8937.795093]  [<ffffffffa06bc621>] ? nfs_find_actor+0x51/0x80 [nfs]
Oct  5 21:09:49 brain2 kernel: [ 8937.795096]  [<ffffffff811c12f9>] ? find_inode+0x59/0x90
Oct  5 21:09:49 brain2 kernel: [ 8937.795099]  [<ffffffff811b374b>] ? path_lookupat+0x5b/0x7b0
Oct  5 21:09:49 brain2 kernel: [ 8937.795101]  [<ffffffff811bdc1f>] ? dput+0x1f/0x170
Oct  5 21:09:49 brain2 kernel: [ 8937.795104]  [<ffffffff811b3ec6>] ? filename_lookup+0x26/0xc0
Oct  5 21:09:49 brain2 kernel: [ 8937.795106]  [<ffffffff811b7f84>] ? user_path_at_empty+0x54/0x90
Oct  5 21:09:49 brain2 kernel: [ 8937.795111]  [<ffffffff811a68ce>] ? SyS_access+0x9e/0x210
Oct  5 21:09:49 brain2 kernel: [ 8937.795113]  [<ffffffff8151164d>] ? system_call_fast_compare_end+0x10/0x15
Oct  5 21:09:49 brain2 kernel: [ 8937.795115] Code: 00 00 48 ff c8 75 fb 48 ff c8 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 8b 05 fc 42 5e 00 ff e0 66 90 65 48 8b 14 25 a0 26 01 00 <48> 8d 0c 12 48 c1 e2 06 48 8d 04 bd 00 00 00 00 48 29 ca f7 e2



The above is reported for all CPUs.

Setup:
A cluster of 5 machines. The first machine exports a drive over NFSv4 to the rest, which act as clients. Processing takes place on every machine (including the server) and output data is written back to the shared NFS drive. We are running kernel 3.16.7-ckt11-1+deb8u4 and have also tried the 4.1.6 backport; the same problem occurs there too.
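For the record, a minimal sketch of the client-side checks we could script before the next occurrence (tool availability, e.g. nfsstat from nfs-utils, is an assumption; every command is guarded so a missing tool or lack of root doesn't abort the script):

```shell
# Gather client-side context around a stall. Each command is guarded with
# "|| true" so the script continues where a tool is missing or needs root.
mount -t nfs4 || true                                    # current NFSv4 mount options
{ command -v nfsstat >/dev/null && nfsstat -c; } || true # NFS client op counters
{ grep dentry /proc/slabinfo | head -n 1; } || true      # dentry-cache pressure
marker="stall-checks-done"
echo "$marker"
```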

Hardware:
X10DRT-PT, 256 GB RAM, 12 x E5-2620, 2 x S3710 SSDs in mdraid1. Latest BIOS firmware.

I was wondering whether the _raw_spin_lock in the stack trace and the fact that the CPUs hit 100% might be related?

Any help is greatly appreciated.

Thank you,