
Bug#1037223: Possible bug causing I/O hangs



Package: linux-image-amd64
Version: 5.10.178-3


Hi all,

I do not usually report kernel bugs, so hopefully this is the right place!

We recently updated the kernel on our Debian 11 servers, and since then a number of them (both VMs and bare metal) have suffered I/O hangs.
We can only reach an affected server through a console that does not allow copying text, so I have attached a screenshot showing the message we see in dmesg.

We initially thought this was related to the ext4 fast_commit feature flag we have enabled, and we do feel the issue occurs less often with fast_commit disabled, but it does not appear to be solved completely when we disable this feature.
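
For anyone who wants to check whether fast_commit is enabled on a filesystem, here is a minimal Python sketch. It assumes the output of `tune2fs -l <device>` has already been captured as text and that it contains a "Filesystem features:" line; the function name and the sample text are illustrative and were not taken from our servers.

```python
def ext4_features(tune2fs_output):
    """Return the feature names listed on the 'Filesystem features:' line."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            # Everything after the first colon is a space-separated feature list.
            return set(line.split(":", 1)[1].split())
    return set()


# Illustrative sample only (assumption), not output from an affected server.
SAMPLE = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal ext_attr resize_inode dir_index fast_commit filetype extent
"""

if __name__ == "__main__":
    print("fast_commit enabled:", "fast_commit" in ext4_features(SAMPLE))
```

Running the check across all servers would make it easier to compare how often the hang occurs with and without the feature.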

Searching for this error led me, via https://github.com/flatcar/Flatcar/issues/847, to this thread: https://www.spinics.net/lists/linux-ext4/msg86261.html
It mentions this fix: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/ext4?h=linux-5.15.y&id=5bc0b2fda4b47c86278f7c6d30c211f425bf51cf
I believe this fix is currently not present in the 5.10 kernel available for Debian 11.

However, the linked fix also mentions:
> This bug has been around for many years, but it became *much* easier
> to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr
> blocks").

Looking at the changelog: https://metadata.ftp-master.debian.org/changelogs//main/l/linux-signed-amd64/linux-signed-amd64_5.10.178+3_changelog
We do see the "ext4: fix race when reusing xattr blocks" change being added in 5.10.178-1.
This is why we believe we are now hitting this bug.
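
The changelog lookup can be sketched in Python as follows. This is a rough sketch under assumptions: it relies on Debian changelog entry headers having the form `package (version) distribution; urgency=level`, the function name is made up, and the sample excerpt is a hypothetical reduction rather than a copy of the real changelog. It also does not handle a subject line that wraps across two lines.

```python
import re

# Entry headers start in column 0: "package (version) distribution; urgency=..."
HEADER = re.compile(r"^(\S+) \(([^)]+)\) ")


def versions_mentioning(changelog, needle):
    """Return the versions whose changelog entry contains `needle`, in file order."""
    current = None
    found = []
    for line in changelog.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(2)
        elif needle in line and current is not None and current not in found:
            found.append(current)
    return found


# Hypothetical excerpt (assumption); the real changelog is much longer.
SAMPLE = """\
linux (5.10.178-3) bullseye-security; urgency=high

  * Some unrelated change

linux (5.10.178-1) bullseye-security; urgency=high

    - ext4: fix race when reusing xattr blocks
"""

if __name__ == "__main__":
    print(versions_mentioning(SAMPLE, "ext4: fix race when reusing xattr blocks"))
```

The same function can be pointed at the subject line of the linked fix to confirm that it does not appear in any 5.10 changelog entry yet.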

My question is whether this seems plausible, and if so, whether the fix I linked can also be released for Debian 11?

We could also upgrade to the bullseye-backports kernel. However, this issue makes the system essentially unusable and we hit it every few days on one of our servers, so it may be more widespread and worth fixing in the regular bullseye kernel as well.

Thank you!
Best regards,
Niels Hendriks
