
Bug#1071501: linux-image-6.1.0-21-arm64: Linux NFS client hangs in nfs4_lookup_revalidate



Dear Salvatore,

I've already started bisecting. It will take some time: the bug usually appears only after a few hours, and unfortunately I am not able to trigger it faster. So if the bug appears, I can easily mark that version as bad and step forward, but if it does not, it is hard to decide whether the bug is still present and simply has not occurred yet, or whether the current version is a good one. I'll do my best.
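
To make it easier to notice when a test kernel has hit the hang, a small watcher along these lines could be left running during each bisect step. This is only a sketch, assuming a Linux /proc filesystem; the helper names parse_state and d_state_processes are made up for illustration and are not part of any existing tool.

```python
import os

def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]

def d_state_processes(proc_root="/proc"):
    """List (pid, comm) of processes currently in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

A process that is briefly in D state is normal during I/O, so a single hit proves nothing; the same dovecot PID showing up in several consecutive scans would be the signal to mark the kernel as bad.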

I will also contact the linux-nfs mailing list.

As far as I remember, it started nearly a year ago, when I switched to Debian's kernel. I don't know exactly which version it was at that time. However, I've checked Debian's patches and did not find anything related to NFS.

Regards,
Richard


On 2024-05-20 21:07, Salvatore Bonaccorso wrote:
Hi Richard,

On Mon, May 20, 2024 at 09:27:24AM +0000, Richard Kojedzinszky wrote:
Package: src:linux
Version: 6.1.90-1
Severity: normal
X-Debbugs-Cc: richard+debian+bugreport@kojedz.in

Dear Maintainer,

I am running Kubernetes on Debian, and pods are mounting multiple NFS
shares. I am running dovecot processes in pods, which receive mail from
the internet and also serve as IMAP servers for clients. I monitor my
mail system by sending mails periodically (every 15 seconds) and also
downloading them via IMAP. A few times I found a dovecot process stuck
in D state; a reboot was always needed to recover from that state.

Unfortunately, I was not able to trigger the bug quickly; I don't
really know what operations dovecot issues, and in what order, to
trigger this behavior. So until I get closer, I've set up a similar but
smaller environment with just a single dovecot process. It does the
same work, delivering only test mails locally and serving them via IMAP
to the monitoring client, storing everything on NFS. Fortunately, this
also triggers the bug: after a few hours one of the dovecot processes is
stuck in D state. The kernel also shows the blocked state:

As you seem to be in the lucky position of being able to trigger the
issue in a more localized setup, might you:

- also try more recent kernels from upper suites (6.8.9-1 in
  unstable would be ideal to check if the issue is there as well).
- I read that you cannot trigger it with 5.15. If you build 6.1.90 from
  upstream without Debian patches, I assume you can trigger the issue
  likewise? If so, could you bisect the changes introducing the issue?
  This is a cumbersome process, in particular if you need a few hours to
  trigger it. So maybe the following point could be done first:
- Can you report the issue to the linux-nfs list, keeping us in the
  loop?
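
To give a rough feel for why bisecting is cumbersome here: a bisection over N candidate commits needs about log2(N) test builds, and each build has to run for hours before it can be judged. The following is only a back-of-the-envelope sketch; the commit count and the hours per test are hypothetical placeholders, not figures taken from this report.

```python
import math

def bisect_steps(n_commits):
    """Approximate number of kernels to test when bisecting n_commits."""
    if n_commits <= 1:
        return 0
    return math.ceil(math.log2(n_commits))

def worst_case_hours(n_commits, hours_per_test):
    """Total wall-clock hours if every step needs the full observation time."""
    return bisect_steps(n_commits) * hours_per_test

if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(bisect_steps(1024), worst_case_hours(1024, 4))
```

Steps that turn out bad finish as soon as the hang shows up, so the real total depends on how many of the tested kernels are good.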

Regards,
Salvatore

