[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1087981: marked as done (linux-image-6.1.0-27-amd64: detected stalls in kernel log, system very slow on IO (regression))



Your message dated Thu, 24 Apr 2025 15:00:44 +0200
with message-id <aAo1_HL1vyNEUrMD@eldamar.lan>
and subject line Re: Bug#1087981: linux-image-6.1.0-27-amd64: detected stalls in kernel log, system very slow on IO (regression)
has caused the Debian Bug report #1087981,
regarding linux-image-6.1.0-27-amd64: detected stalls in kernel log, system very slow on IO (regression)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1087981: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087981
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: src:linux
Version: 6.1.115-1
Severity: important
Tags: upstream

Hi,

I'm seeing regularly kernel logs of the form below, combined with the system
feeling terribly slow. This is a regression as I didn't have these problems
with linux-image-6.1.0-26-amd64. The issue occurs on a VPS.

The problems seem to be fixed with the linux-image-6.1.0-28-amd64-unsigned
6.1.118-1~test1 from the repo at
https://people.debian.org/~carnil/tmp/linux/1086447/ (i.e., with the affected
version, everything is terribly slow from the beginning while the previous
version and test version don't show this symptom + nothing related yet in the
kernel log).

I thought about adding my report to #1087602 as it is also a regression from
linux-image-6.1.0-26-amd64 on virtualized hardware and sounds related, but I
decided to file as a different bug because the backtrace look different.

thanks
Willi


[462022.185732] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[462022.186313] rcu:    0-...0: (1 GPs behind) idle=313c/1/0x4000000000000000 softirq=1652560/1652574 fqs=976
[462022.186876]         (detected by 1, t=5255 jiffies, g=2935025, q=288 ncpus=2)
[462022.186880] Sending NMI from CPU 1 to CPUs 0:
[462037.911283] NMI backtrace for cpu 0
[462037.911288] CPU: 0 PID: 12664 Comm: systemd-journal Tainted: G             L     6.1.0-27-amd64 #1  Debian 6.1.115-1
[462037.911292] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[462037.911293] RIP: 0010:asm_sysvec_reschedule_ipi+0x0/0x20
[462037.911301] Code: e9 15 07 00 00 0f 1f 44 00 00 0f 01 ca fc 6a ff e8 d5 05 00 00 48 89 c4 48 89 e7 e8 fa f9 e2 ff e9 f5 06 00 00 0f 1f 44 00 00 <0f> 01 ca fc 6a ff e8 b5 05 00 00 48 89 c4 48 89 e7 e8 fa 1b e3 ff
[462037.911302] RSP: 0000:fffffe0000002fd8 EFLAGS: 00000002
[462037.911304] RAX: 0000000000000001 RBX: 0000000000fbafb0 RCX: 0000000000fbafb0
[462037.911306] RDX: 0000000000000006 RSI: 00007efd91dbafb0 RDI: 000055c938078790
[462037.911306] RBP: 000055c938078790 R08: 000055c938078970 R09: 000055c9380787c8
[462037.911307] R10: 0000000000000002 R11: 00007efd93d6f288 R12: 00007ffecfbc6918
[462037.911308] R13: 00007ffecfbc6890 R14: 0000000000000006 R15: 00007efd91dbafb0
[462037.911311] FS:  00007efd9573b5c0(0000) GS:ffff9201f9c00000(0000) knlGS:ffff9201f9c00000
[462037.911312] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[462037.911313] CR2: 00007efd93d6f1f4 CR3: 00000001022b8000 CR4: 00000000003506f0
[462037.911316] Call Trace:
[462060.884093] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[462060.884626] rcu:    1-...!: (1 ticks this GP) idle=065c/1/0x4000000000000000 softirq=1599519/1599519 fqs=18
[462060.885310]         (detected by 0, t=5742 jiffies, g=2935029, q=224 ncpus=2)
[462060.885314] Sending NMI from CPU 0 to CPUs 1:
[462073.753390] NMI backtrace for cpu 1
[462073.753396] CPU: 1 PID: 85627 Comm: kworker/1:1 Tainted: G             L     6.1.0-27-amd64 #1  Debian 6.1.115-1
[462073.753401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[462073.753403] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper]
[462073.753421] RIP: 0010:asm_sysvec_apic_timer_interrupt+0x0/0x20
[462073.753427] Code: e9 55 07 00 00 0f 1f 44 00 00 0f 01 ca fc 6a ff e8 15 06 00 00 48 89 c4 48 89 e7 e8 ba 1f e3 ff e9 35 07 00 00 0f 1f 44 00 00 <0f> 01 ca fc 6a ff e8 f5 05 00 00 48 89 c4 48 89 e7 e8 6a 1e e3 ff
[462073.753429] RSP: 0018:ffffbc1a46d5fdc8 EFLAGS: 00000016
[462073.753430] RAX: 0000000000000000 RBX: 0000000000001400 RCX: 0000000000000200
[462073.753431] RDX: 0000000000000000 RSI: ffffbc1a40829000 RDI: ffffbc1a4240c000
[462073.753432] RBP: ffffbc1a4240b400 R08: ffff920162cfd848 R09: ffff920147dc1074
[462073.753433] R10: 0000000000000007 R11: 0000000000000002 R12: ffffbc1a40828400
[462073.753433] R13: ffffbc1a40828400 R14: ffffbc1a4240b400 R15: ffff920144f7bcd0
[462073.753437] FS:  0000000000000000(0000) GS:ffff9201f9d00000(0000) knlGS:0000000000000000
[462073.753438] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[462073.753439] CR2: 00007f85dd5ab0d8 CR3: 0000000122694000 CR4: 00000000003506e0
[462073.753442] Call Trace:
[462073.753444]  <TASK>
[462073.753445] RIP: 0010:memcpy_toio+0x6d/0xb0
[462073.753448] Code: 5c e9 b7 ce 88 00 48 85 db 74 f2 40 f6 c5 01 75 45 48 83 fb 01 76 06 40 f6 c5 02 75 25 48 89 d9 48 89 ef 4c 89 e6 48 c1 e9 02 <f3> a5 f6 c3 02 74 02 66 a5 f6 c3 01 74 01 a4 5b 5d 41 5c e9 7b ce
[462073.753449] RSP: 0018:ffffbc1a46d5fdf8 EFLAGS: 00000216
[462073.753452]  ? drm_fb_helper_damage_work+0x2df/0x3a0 [drm_kms_helper]
[462073.753463]  ? process_one_work+0x1c7/0x380
[462073.753467]  ? worker_thread+0x4d/0x380
[462073.753469]  ? rescuer_thread+0x3a0/0x3a0
[462073.753471]  ? kthread+0xda/0x100
[462073.753473]  ? kthread_complete_and_exit+0x20/0x20
[462073.753475]  ? ret_from_fork+0x22/0x30
[462073.753479]  </TASK>
[462073.753757] rcu: rcu_preempt kthread timer wakeup didn't happen for 5649 jiffies! g2935029 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[462073.754323] rcu:    Possible timer handling issue on cpu=0 timer-softirq=4828143
[462073.754819] rcu: rcu_preempt kthread starved for 5650 jiffies! g2935029 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[462073.755256] rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[462073.755759] rcu: RCU grace-period kthread stack dump:
[462073.756240] task:rcu_preempt     state:I stack:0     pid:15    ppid:2      flags:0x00004000
[462073.756245] Call Trace:
[462073.756247]  <TASK>
[462073.756251]  __schedule+0x34d/0x9e0

--- End Message ---
--- Begin Message ---
Hi Willi

On Sat, Mar 22, 2025 at 08:07:24PM +0100, Salvatore Bonaccorso wrote:
> Hi Willi,
> 
> On Sat, Mar 22, 2025 at 07:47:14PM +0100, Willi Mann wrote:
> > Hi Salvatore,
> > 
> > sorry, I somehow missed your mail.
> > 
> > Am 21.03.25 um 21:07 schrieb Salvatore Bonaccorso:
> > 
> > > > While we had no followup after your last email, I wonder now if you
> > > > still experience the issues with the most recent 6.1.y kernel
> > > > available in Debian (6.1.128-1).
> > > > 
> > > > Can you report back? Can you test on that machine as well newer
> > > > kernels ideally from unstable (6.12.16-1) or experimental
> > > > (6.13.4-1~exp1)?
> > > 
> > > Any news here?
> > 
> > With 6.1.128-1, there are still backtraces, though I don't experience any
> > slowness. I could only retrieve them from dmesg output as the kernel
> > logfiles containing them are already rotated. Also, I removed noise
> > unrelated to to the crash. See attachment.
> > 
> > I'll install a newer kernel tomorrow and let you know the outcome.
> 
> Ack thanks.
> 
> Btw, in a earlier log which was more complete from the boot on, I
> notice
> 
> [...]
> [ 2866.062346] sh (1675): drop_caches: 3
> [ 6466.150805] sh (2315): drop_caches: 3
> [10066.236130] sh (2955): drop_caches: 3
> [13666.343713] sh (3732): drop_caches: 3
> [17266.432101] sh (4393): drop_caches: 3
> [20866.523958] sh (4975): drop_caches: 3
> [24466.622511] sh (5717): drop_caches: 3
> [26006.697987] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [26016.962347] rcu:     1-...0: (18 ticks this GP) idle=a5ac/1/0x4000000000000000 softirq=94076/94077 fqs=908
> [26016.962408]  (detected by 0, t=5252 jiffies, g=164937, q=845 ncpus=2)
> [26016.962414] Sending NMI from CPU 0 to CPUs 1:
> [26024.942628] NMI backtrace for cpu 1
> [26024.942634] CPU: 1 PID: 4882 Comm: kworker/u4:2 Tainted: G            E      6.1.0-28-amd64 #1  Debian
> +6.1.118-1~test1
> [...]
> 
> before the actual stalls begin. what is doing that and droppying
> caches?
> 
> thans for your time doing the experiments.

Closing now this bugreport. In case you find time for doing the above
further experiments, please *do* reopen the bug and remove the
moreinfo tag.

Regards,
Salvatore

--- End Message ---

Reply to: