
Bug#979740: marked as done (Processes blocked in io_schedule when using disks behind JMB575 JBOD)



Your message dated Thu, 20 Feb 2025 13:24:44 +0100 (CET)
with message-id <20250220122444.12020BE2EE7@eldamar.lan>
and subject line Closing this bug (BTS maintenance for src:linux bugs)
has caused the Debian Bug report #979740,
regarding Processes blocked in io_schedule when using disks behind JMB575 JBOD
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
979740: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=979740
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-armmp
Version: 5.9.15-1

Dear Maintainer,
Since upgrading to the 5.9.0-x-armmp armhf kernels I have been seeing
processes randomly getting stuck in D-state.
I observed this issue with 5.9.0-1-armmp and 5.9.0-4-armmp.
I did not observe the issue with 5.8.0-3-armmp or any previous kernel.

Caveat: the system is running a Debian kernel, but userspace is Ubuntu focal. I thought you might want to know anyway, since it looks like a kernel problem.

--

Hardware:
Three HDDs connected via a JMB575 SATA III port multiplier to the i.MX6q's SATA II port. Probe log:
ata1.15: Port Multiplier 1.2, 0x197b:0x5755 r0, 5 ports, feat 0x5/0xf

Steps to reproduce: Start a larger file copy operation from one disk to another. I can reproduce this problem copying multi-GB files with either cp or rsync. The copy gets blocked in D-state within a few hours and one CPU core is saturated by io-wait. I've seen rsync getting blocked both while reading the source and while writing the destination.
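
For illustration only, a rough sketch of the kind of workload that triggers it (the paths and file names here are placeholders, not my real data):

  # copy a multi-GB tree from one disk to another; either direction triggers it
  cp -ar /mnt/dm-sdc/bigdir /mnt/sdb3/
  # or, equivalently:
  rsync -a /mnt/sda2/bigfile /mnt/dm-sdc/
  # after a few hours, look for tasks stuck in uninterruptible sleep
  ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'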

Details:
As mentioned above, the JMB575 PMP is connected to the i.MX6q on-chip SATA controller. All of its ports get configured as "SATA link up 3.0 Gbps (SStatus 123 SControl 320)". There are three HDDs connected to it; I am going to call them sda, sdb, and sdc. Disks "sda" and "sdb" are partitioned and used for regular ext4 filesystems. Disk "sdc" is backing a dm-integrity device, which I am going to call "dm-sdc".
The DM device "dm-sdc" holds an ext4 filesystem.
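
For completeness, the storage stack can be assembled roughly like this (a sketch only, not my exact commands; the integritysetup options I actually used are omitted):

  # create the dm-integrity layer on sdc and put ext4 on top of it
  integritysetup format /dev/sdc
  integritysetup open /dev/sdc dm-sdc      # exposes /dev/mapper/dm-sdc
  mkfs.ext4 /dev/mapper/dm-sdc
  mount /dev/mapper/dm-sdc /mnt/dm-sdc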


Generally the blocked processes arrive via io_schedule or io_schedule_timeout:
 [<c0c9c5f0>] (io_schedule_timeout) from [<c0c9e20c>] (wait_for_completion_io+0xb4/0x134)
 [<c0c9d05c>] (schedule) from [<c0c9d1d8>] (io_schedule+0x50/0x70)
 [<c0c9d188>] (io_schedule) from [<bf640a34>] (dm_integrity_map_continue+0xb28/0xbc0 [dm_integrity])
 [<c0c9d188>] (io_schedule) from [<c04eb868>] (wait_on_page_bit_common+0x18c/0x420)
 [<c0c9d188>] (io_schedule) from [<c0704924>] (rq_qos_wait+0x120/0x18c)
 [<c0c9d188>] (io_schedule) from [<c0c9da68>] (bit_wait_io+0x1c/0x68)
 [<c0ca11c8>] (schedule_timeout) from [<c0c9c648>] (io_schedule_timeout+0x58/0x78)

I have many full stacktraces, but I am not sure if they are safe to share.
I'd really like to avoid disclosing user data or crypto keys.
I managed to find one that should be safe:
task:cronjob1l.sh state:D stack: 0 pid:28125 ppid: 28124 flags:0x00000000
Backtrace:
[<c0c9c668>] (__schedule) from [<c0c9d0d4>] (schedule+0x78/0xf4)
r10:000000c3 r9:00000002 r8:00000002 r7:c0c9da4c r6:e5a27ae8 r5:e5a26000
 r4:e598be00
[<c0c9d05c>] (schedule) from [<c0c9d1d8>] (io_schedule+0x50/0x70)
 r5:c1304a84 r4:00000000
[<c0c9d188>] (io_schedule) from [<c0c9da68>] (bit_wait_io+0x1c/0x68)
 r5:c1304a84 r4:00000002
[<c0c9da4c>] (bit_wait_io) from [<c0c9d5f8>] (__wait_on_bit+0x70/0xc8)
 r5:c1304a84 r4:e5a27adc
[<c0c9d588>] (__wait_on_bit) from [<c0c9d6f4>] (out_of_line_wait_on_bit+0xa4/0xcc)
 r9:e5a27c30 r8:bf1562a8 r7:00000001 r6:00000000 r5:c5fe43a8 r4:e5a27af4
[<c0c9d650>] (out_of_line_wait_on_bit) from [<c05c6f30>] (__wait_on_buffer+0x40/0x44)
 r4:c00b3b80
[<c05c6ef0>] (__wait_on_buffer) from [<bf109de4>] (ext4_bread+0x124/0x130 [ext4])
 r5:c5fe43a8 r4:c00b3b80
[<bf109cc0>] (ext4_bread [ext4]) from [<bf123f84>] (__ext4_read_dirblock+0x3c/0x45c [ext4])
 r4:e5a27cc0
[<bf123f48>] (__ext4_read_dirblock [ext4]) from [<bf1243f8>] (dx_probe+0x54/0x6d8 [ext4])
 r10:000000c3 r9:e5a27c30 r8:00000000 r7:e5a27cc0 r6:d4a5024c r5:c5fe43a8
 r4:e5a27cc0
[<bf1243a4>] (dx_probe [ext4]) from [<bf1262e8>] (__ext4_find_entry+0x418/0x620 [ext4])
 r10:000000c3 r9:00000000 r8:00000009 r7:c5fe43a8 r6:d4a5024c r5:e3092800
 r4:e5a27cc0
[<bf125ed0>] (__ext4_find_entry [ext4]) from [<bf126c24>] (ext4_lookup+0xe4/0x2ec [ext4])
 r10:000000c3 r9:00000000 r8:00000009 r7:c5fe43a8 r6:c5fe43a8 r5:d4a50220
 r4:e5a27ca0
[<bf126b40>] (ext4_lookup [ext4]) from [<c058d328>] (__lookup_slow+0x98/0x164)
 r9:00000000 r8:00000009 r7:c5fe43a8 r6:e5a27db8 r5:e40df990 r4:d4a50220
[<c058d290>] (__lookup_slow) from [<c05900b8>] (walk_component+0x158/0x1cc)
 r8:00000009 r7:c5fe4430 r6:00000001 r5:e5a27db0 r4:e40df990
[<c058ff60>] (walk_component) from [<c05905a8>] (path_lookupat+0x80/0x1c0)
 r9:00000000 r8:e5a27db0 r7:e5a27ea4 r6:00000001 r5:e5a27db0 r4:00000000
[<c0590528>] (path_lookupat) from [<c0593454>] (filename_lookup+0xb0/0x1ac)
 r7:e5a27ea4 r6:00000001 r5:e5509000 r4:00000001
[<c05933a4>] (filename_lookup) from [<c059369c>] (user_path_at_empty+0x7c/0x98)
 r8:00000800 r7:014c04c4 r6:e5a27ea4 r5:ffffff9c r4:00000001
[<c0593620>] (user_path_at_empty) from [<c0585718>] (vfs_statx+0x7c/0x13c)
 r7:014c04c4 r6:ffffff9c r5:00000001 r4:e5a27ee8
[<c058569c>] (vfs_statx) from [<c0585cf4>] (__do_sys_stat64+0x40/0x80)
r10:000000c3 r9:e5a26000 r8:c03002c4 r7:000000c3 r6:014c051c r5:004c2384
 r4:be94d758
[<c0585cb4>] (__do_sys_stat64) from [<c058610c>] (sys_stat64+0x18/0x1c)
 r4:00000004
[<c05860f4>] (sys_stat64) from [<c03000c0>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xe5a27fa8 to 0xe5a27ff0)
7fa0: 00000004 004c2384 014c04c4 be94d758 be94d758 00000003
7fc0: 00000004 004c2384 014c051c 000000c3 014c0518 00000003 004b0204 004c24ec
7fe0: 000000c3 be94d6e4 b6ee69db b6e6dbe6


What information would be helpful to collect next time this happens?
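
In case it helps, this is what I plan to capture the next time it happens (a sketch; <pid> is a placeholder for a stuck task):

  # dump stacks of all tasks in uninterruptible (D) sleep into the kernel log
  echo w > /proc/sysrq-trigger
  dmesg > /tmp/hang-dmesg.txt
  # kernel stack of one particular stuck task
  cat /proc/<pid>/stack
  # SATA link state of the PMP ports at the time of the hang
  grep . /sys/class/ata_link/link*/sata_spd
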
My attempt to document these misbehaviors follows:

1st occurrence:	5.9.0-1-armmp #1 Debian 5.9.1-1
	One recursive copy (cp -ar) reading from dm-sdc, writing to sdb3.
	uninterruptibly sleeping / D-state'd processes:
		cp -ar					# no info
		rest unknown				# no info
	kernel messages:
		one for each hung task
	remediation attempts:
		ls /mnt/dm-sdc				# un-blocked the system
		note: sdc had to spin up because it was parked

2nd occurrence:	5.9.0-1-armmp #1 Debian 5.9.1-1
	One recursive copy (cp -ar) reading from dm-sdc, writing to sdb3.
	uninterruptibly sleeping / D-state'd processes:
		cp -ar					# no info
		rest unknown				# no info
	kernel messages:
		NONE
	remediation attempts:
		ls /home/user				# !! entered D-state
		ls /						# no problem
		ls /opt					# no problem
		mplayer --help				# no problem
		mount						# no problem
		/usr/games/caesar				# no problem
		ls /tmp					# no problem
		ls /mnt/sdb3				# !! entered D-state
		ls /mnt/sda2				# !! entered D-state
		ls /mnt/dm-sdc				# un-blocked the system
		note: sdc had to spin up because it was parked

3rd occurrence:	5.9.0-1-armmp #1 Debian 5.9.1-1
	One rsync reading from sda2, writing to dm-sdc.
	uninterruptibly sleeping / D-state'd processes:
		rsync(s)					# one or both rsyncs?
		rest unknown				# no info
	kernel messages:
		NONE
	remediation attempts:
		ls /proc					# no problem
		ls /sys					# no problem
		ls /dev					# no problem
		ls /run					# [tmpfs] no problem
		ls /mnt/sdb3				# no problem
		ls /mnt/sda2				# no problem
		ls /home/user				# no problem
		ls /mnt/dm-sdc				# un-blocked the system
		note: sdc had to spin up because it was parked

4th occurrence:	5.9.0-1-armmp #1 Debian 5.9.1-1
	One rsync reading from sda2, writing to dm-sdc.
	uninterruptibly sleeping / D-state'd processes:
		kworker/3:13+dm-integrity-writer	# dm-sdc
		rsync						# only one, but which one?
		kworker/u8:0+dm-integrity-wait	# dm-sdc
		kworker/0:0+dm-integrity-commit	# dm-sdc
		jbd2/dm-0-8					# dm-sdc
		/lib/systemd/systemd-journald		# no info
	kernel messages:
		NONE
	remediation attempts:
		ls /mnt/dm-sdc				# no problem
		ls /mnt/dm-sdc/folder			# un-blocked the system
		note: sdc had to spin up because it was parked

<kernel upgraded, system rebooted>

5th occurrence:	5.9.0-4-armmp #1 Debian 5.9.11-1
	One rsync -c --dry-run reading from all three disks. No writing!
	uninterruptibly sleeping / D-state'd processes:
		rsync(s)					# no info
		kworker/3:1+dm-integrity-offload	# dm-sdc
		kworker/3:7+dm-integrity-offload	# dm-sdc
		rest unknown				# no further info
	kernel messages:
		one for each hung task
	remediation attempts:
		ls /mnt/dm-sdc				# !! entered D-state
	uninterruptibly sleeping / D-state'd processes:
		rsync(s)					# no info
		kworker/3:1+dm-integrity-offload	# dm-sdc
		kworker/3:7+dm-integrity-offload	# dm-sdc
		kworker/0:0+dm-integrity-offload	# dm-sdc, !! NEW
		rest unknown				# no info
	remediation attempts:
		ls /home/user				# no problem
		sha256sum /dev/sdc			# !! no help, killed by Ctrl+C
		dd if=/dev/sdc				# !! entered D-state
		smartctl /dev/sdc				# !! entered D-state
		umount /mnt/sdb3				# failed, rsync has a file open
		hdparm -Y /dev/sda			# un-blocked the system
		NOTE: sdc did NOT seem parked this time

6th occurrence:	5.9.0-4-armmp #1 Debian 5.9.11-1
	One rsync -c --dry-run thoroughly verifying previous copies - no writing!
	uninterruptibly sleeping / D-state'd processes:
		rsync - reading side			# sdb3, reading 400M file
		rsync - comparing side			# dm-sdc, reading 900M file
		kworker/2:16+dm-integrity-offload	# dm-sdc
		kworker/2:0+dm-integrity-offload	# dm-sdc
		jbd2/sda2-8					# sda2
		/bin/sh writing to file on sda2	# sda2
		/bin/sh creating a file on sda2	# sda2
	kernel messages:
		"failed to read SCR 1" for all PMP ports
		IO errors on sda
	remediation attempts:
		df -h						# no problem
		ls /mnt/dm-sdc				# !! entered D-state
		ls /home/user				# un-blocked some, but not all!?
	uninterruptibly sleeping / D-state'd processes:
		rsync - comparing side			# dm-sdc, reading 900M file
		kworker/2:16+dm-integrity-offload	# dm-sdc
		kworker/2:0+dm-integrity-offload	# dm-sdc
		kworker/3:0+dm-integrity-offload	# dm-sdc
		ls /mnt/dm-sdc				# dm-sdc
	remediation attempts:
		ls /mnt/sdb3/path1			# no problem
		ls /mnt/sdb3/path2			# no problem
		md5sum /mnt/sdb3/file1			# no problem
		md5sum /mnt/sdb3/file2			# no problem
		echo x >/proc/sysrq-trigger		# no problem
	kernel messages:
		one for each hung task
	remediation attempts:
		# none, system unblocked itself after a while!
	kernel messages:
		"failed to read SCR 1" for all PMP ports
		IO errors on sda

<system rebooted>

7th occurrence:	5.9.0-4-armmp #1 Debian 5.9.11-1
	One rsync reading from sda2, writing to dm-sdc.
	uninterruptibly sleeping / D-state'd processes:
		jbd2/dm-0-8					# dm-sdc
		rsync - writing side			# dm-sdc, writing >100G file
		kworker/2:0+dm-integrity-writer	# dm-sdc
		kworker/0:0+dm-integrity-commit	# dm-sdc
		kworker/u8:2+flush-254:0		# dm-0 -> dm-sdc
		kworker/u8:1+dm-integrity-wait	# dm-sdc
		kworker/3:1+dm_bufio_cache		# dm-sdc
		kworker/0:3+dm_bufio_cache		# dm-sdc
	kernel messages:
		one for each hung task
	remediation attempts:
		df -h						# no problem
		ls /mnt/dm-sdc				# un-blocked the system
		note: sdc had to spin up because it was parked

8th occurrence:	5.9.0-4-armmp #1 Debian 5.9.11-1
	Happened just minutes after "solving" #7
	uninterruptibly sleeping / D-state'd processes:
		jbd2/dm-0-8					# dm-sdc
		631/rsync - writing side		# dm-sdc
		kworker/1:3+dm-integrity-writer	# dm-sdc
		kworker/3:2+dm-integrity-commit	# dm-sdc
		kworker/u8:2+flush-254:0		# dm-0 -> dm-sdc
		kworker/u8:1+dm-integrity-wait	# dm-sdc
	kernel messages:
		NONE
	remediation attempts:
		df -h						# no problem
		ls /mnt/dm-sdc				# no problem
		ls /mnt/dm-sdc/dir1			# no problem
		ls /mnt/dm-sdc/dir1/d2/			# un-blocked the system
		note: not sure if sdc was spinning up or not

9th occurrence:	5.9.0-4-armmp #1 Debian 5.9.11-1
	Happened within tens of minutes after solving #8 (rsyncing the same file!)
	uninterruptibly sleeping / D-state'd processes:
		629/rsync - reading side		# sda2
		kworker/1:0+dm-integrity-commit	# dm-sdc
		kworker/1:1+dm-integrity-writer	# dm-sdc
		jbd2/dm-0-8					# dm-sdc
		kworker/0:0+dm_bufio_cache		# dm-sdc
		/bin/sh - cronjob1 writing a file	# ?
		/bin/sh - cronjob2 writing a file	# ?
	kernel messages:
		NONE
	remediation attempts:
		df -h						# no problem
		ls /mnt/dm-sdc				# no problem
		ls /mnt/dm-sdc/dir1			# no problem
		ls /mnt/dm-sdc/dir1/d2/			# no problem
		ls /mnt/dm-sdc/dir1/d2/d3/d4/		# un-blocked the system
		note: not sure if sdc was spinning up or not

10th, 11th, 12th occurrences: 5.9.0-4-armmp #1 Debian 5.9.11-1
	Still rsyncing that same >100G file
	Always remediated by 'find /mnt/sda & find /mnt/sdb & find /mnt/dm-sdc &'

13th occurrence:	5.9.0-4-armmp #1 Debian 5.9.11-1
	Still rsyncing that one >100G file
	uninterruptibly sleeping / D-state'd processes:
		629/rsync - reading side		# sda2
		jbd2/dm-0-8					# dm-sdc
		kworker/2:15+dm-integrity-commit	# dm-sdc
		kworker/2:8+dm-integrity-writer	# dm-sdc
		jbd2/sda2-8					# sda2
		kworker/u8:1+flush-8:0			# ?
		/bin/sh - cronjob0 writing a file	# started from sda2
		/bin/sh - cronjob0 writing a file	# started from sda2
	The blocked shell scripts are odd. Their open FDs look like this:
		0 -> pipe:[7750528]
		1 -> /tmp/#713 (deleted)
		10 -> /mnt/sda2/dir/cronjob0.sh
		2 -> /tmp/#713 (deleted)
	kernel messages:
		one for each hung task
	remediation attempts:
		find /mnt/sda2				# !! entered D-state
		find /mnt/sdb3				# !! entered D-state
		find /mnt/dm-sdc				# un-blocked the system
		note: sdc had to spin up because it was parked

14th occurrence:	5.9.0-4-armmp #1 Debian 5.9.11-1
	Still rsyncing that one >100G file
	uninterruptibly sleeping / D-state'd processes:
		629/rsync - reading side		# sda2
		jbd2/dm-0-8					# dm-sdc
		kworker/2:0+dm-integrity-commit	# dm-sdc
		kworker/3:3+dm-integrity-writer	# dm-sdc
	kernel messages:
		one for each hung task
	remediation attempts:
		find /mnt/sda2				# !! entered D-state
		find /mnt/sdb3				# !! entered D-state
		find /mnt/dm-sdc				# un-blocked the system
		note: not sure if sdc was spinning up or not

Cheers,
n-b-f

--- End Message ---
--- Begin Message ---
Hi

This bug was filed against a very old kernel, or the bug itself has been
open for a long time without resolution.

If you can reproduce it with

- the current version in unstable/testing, or
- the latest kernel from backports,

please reopen the bug; see https://www.debian.org/Bugs/server-control
for details.
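
For example, roughly like this (a sketch, assuming a working mail(1) on your system; replace <version> with the kernel version you can still reproduce it with):

  printf 'reopen 979740\nfound 979740 <version>\nthanks\n' \
      | mail -s 'Re: Bug#979740' control@bugs.debian.org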

Regards,
Salvatore

--- End Message ---
