Bug#1071501: Linux NFS client hangs in nfs4_lookup_revalidate

To: linux-nfs@vger.kernel.org, 1071501@bugs.debian.org
Subject: Bug#1071501: Linux NFS client hangs in nfs4_lookup_revalidate
From: Richard Kojedzinszky <richard+debian+bugreport@kojedz.in>
Date: Thu, 23 May 2024 16:35:18 +0200
Message-id: <162d12087ba8374a57e2263d7ea762b5@kojedz.in>
Reply-to: Richard Kojedzinszky <richard+debian+bugreport@kojedz.in>, 1071501@bugs.debian.org
In-reply-to: <73e081764d06746be27c5f0d2f181938@kojedz.in>
References: <0473c552b6fd8e96ef2ffbf0435a7552@kojedz.in> <73e081764d06746be27c5f0d2f181938@kojedz.in> <171619724421.12490.10588035153055943112.reportbug@reportbug-6bf8b7fbdc-jccqf>

Dear devs,

I am attaching a stripped-down version of the little program which triggers the bug very quickly, within a few minutes in my test lab. It turned out that a single NFS mountpoint is enough. Just start the program, giving it the NFS mount as its first argument. It will chdir there and do file operations, which will trigger a lockup in a few minutes.
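
For readers without the attachment, here is a minimal sketch of the kind of load generator described above. It is not the attached program: the function name `churn` and the specific operations (create, rename, stat, listdir, unlink in a Maildir-like `tmp`/`new` layout) are assumptions, chosen only because they resemble what a mail server does. Whether this exact mix reproduces the hang is untested.

```python
import os


def churn(root, iterations):
    """Run Maildir-style file cycles under root; return cycles completed.

    Hypothetical stand-in for the attached reproducer, not a copy of it.
    """
    os.chdir(root)  # work relative to the mount, as the report describes
    for name in ("tmp", "new"):
        os.makedirs(name, exist_ok=True)

    done = 0
    for i in range(iterations):
        fname = f"msg.{os.getpid()}.{i}"
        tmp_path = os.path.join("tmp", fname)
        new_path = os.path.join("new", fname)

        # Write the file under tmp/, then move it into new/.
        with open(tmp_path, "w") as f:
            f.write("x" * 128)
        os.rename(tmp_path, new_path)

        # Look the new name up again and scan the directory.
        os.stat(new_path)
        os.listdir("new")

        # Remove the file so the directory does not grow.
        os.unlink(new_path)
        done += 1
    return done
```

To mimic the invocation described above, one could call `churn(sys.argv[1], n)` with a large `n` from a small wrapper script, passing the NFS mount as the first argument.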


Please take a look at it.

Thanks in advance,
Richard

On 2024-05-23 14:12, Richard Kojedzinszky wrote:

Dear devs,
Bisecting has now shown that 3c59366c207e4c6c6569524af606baf017a55c61 is the bad commit for me. Strangely, it only affects my dovecot process accessing data over NFS.
Can you please confirm that this may be a bad commit?
My earlier attached programs may be used to demonstrate/trigger the issue. They could even be stripped down to the minimal operations needed to trigger the bug.
Thanks in advance,
Richard


On 2024-05-23 09:10, Richard Kojedzinszky wrote:
Dear NFS developers,
I am running multiple pods on a Kubernetes node; they all mount different NFS shares from the same NFS server. I started to notice hangs in my dovecot process after I switched to Debian's kernel from upstream 5.15. You can find the Debian bug report at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071501.
So, effectively I am running dovecot in Kubernetes, and dovecot's data directory is accessed over NFS. Eventually one dovecot process gets stuck in nfs4_lookup_revalidate(). From that point, that process cannot be killed; however, other processes can access NFS as normal. Also, another dovecot process running on the very same node and accessing the same NFS share works too.
Now, I am still in the process of bisecting; however, I cannot reliably trigger the bug. Originally it took a few days before I noticed a hanging process. Now I am trying to mimic the file operations dovecot does, but faster. It now seems to trigger the bug in a few hours; however, I can still make mistakes during bisecting.
I've scheduled many of my applications which use NFS shares to the same node, to have more NFS load on that node.
I am attaching my simple app which triggers the bug in a few hours, at least in my lab. I have two dedicated NFS shares for this test case, and I am running 3 instances of the application for both shares. Also, I am running other production applications on the same node which also use NFS; however, I don't experience lockups with them. They are librenms, prometheus, and a docker private registry. Because of this, I don't know whether running the attached app alone is enough to trigger the bug.
Once I have a suspect commit from my bisecting, I will report it here.
My NFS server is a TrueNAS, based on FreeBSD 13.3.

Thanks in advance,
Richard

Attachment: ds.tar
Description: Unix tar archive

References:
- Bug#1071501: Linux NFS client hangs in nfs4_lookup_revalidate
  - From: Richard Kojedzinszky <richard+debian+bugreport@kojedz.in>
- Bug#1071501: Linux NFS client hangs in nfs4_lookup_revalidate
  - From: Richard Kojedzinszky <richard+debian+bugreport@kojedz.in>
- Bug#1071501: linux-image-6.1.0-21-arm64: Linux NFS client hangs in nfs4_lookup_revalidate
  - From: Richard Kojedzinszky <richard+debian+bugreport@kojedz.in>
