Bug#1091439: kernel BUG at fs/nfsd/nfs4recover.c:534 Oops: invalid opcode: 0000
- To: Chuck Lever <chuck.lever@oracle.com>
- Cc: Scott Mayhew <smayhew@redhat.com>, Jur van der Burg via Bugspray Bot <bugbot@kernel.org>, anna@kernel.org, trondmy@kernel.org, jlayton@kernel.org, linux-nfs@vger.kernel.org, cel@kernel.org, 1091439@bugs.debian.org, 1091439-submitter@bugs.debian.org, 1087900@bugs.debian.org, 1087900-submitter@bugs.debian.org
- Subject: Bug#1091439: kernel BUG at fs/nfsd/nfs4recover.c:534 Oops: invalid opcode: 0000
- From: Salvatore Bonaccorso <carnil@debian.org>
- Date: Sat, 28 Dec 2024 20:36:18 +0100
- Message-id: <Z3BTMhIfOedhgqlk@eldamar.lan>
- Reply-to: Salvatore Bonaccorso <carnil@debian.org>, 1091439@bugs.debian.org
- In-reply-to: <ae592779-4eb5-410e-b9bc-49165fbb643d@oracle.com>
- References: <20241209-b219580c0-d09195e1d9e8@bugzilla.kernel.org> <20241209-b219580c2-2def6494caed@bugzilla.kernel.org> <Z22DIiV98XBSfPVr@eldamar.lan> <7c76ca67-8552-4cfa-b579-75a33caa3ed2@oracle.com> <Z22r2RBlGT8PUHHb@eldamar.lan> <Z25LCAz9-qDVAop9@eldamar.lan> <9e988cfa-5a27-4139-b922-b5c416ae0c72@oracle.com> <Z2-V_reIDIgJ1AH7@eldamar.lan> <ae592779-4eb5-410e-b9bc-49165fbb643d@oracle.com> <CA+Gs20Z6F0FxOWZCZCsSo8Y+TfkobHaq+sOKjjyN7FTcRJnV7w@mail.gmail.com>
Hi Chuck,
On Sat, Dec 28, 2024 at 12:13:56PM -0500, Chuck Lever wrote:
> On 12/28/24 1:09 AM, Salvatore Bonaccorso wrote:
> > Hi,
> >
> > On Fri, Dec 27, 2024 at 04:31:44PM -0500, Chuck Lever wrote:
> > > On 12/27/24 1:36 AM, Salvatore Bonaccorso wrote:
> > > > Hi,
> > > >
> > > > On Thu, Dec 26, 2024 at 08:17:45PM +0100, Salvatore Bonaccorso wrote:
> > > > > Hi Chuck, hi all,
> > > > >
> > > > > On Thu, Dec 26, 2024 at 11:33:01AM -0500, Chuck Lever wrote:
> > > > > > On 12/26/24 11:24 AM, Salvatore Bonaccorso wrote:
> > > > > > > Hi Jur,
> > > > > > >
> > > > > > > On Mon, Dec 09, 2024 at 04:50:05PM +0000, Jur van der Burg via Bugspray Bot wrote:
> > > > > > > > Jur van der Burg writes via Kernel.org Bugzilla:
> > > > > > > >
> > > > > > > > I tried kernel 6.10.1 and that one is ok. In the meantime I
> > > > > > > > upgraded nfs-utils from 2.5.1 to 2.8.1, which seems to fix the issue.
> > > > > > > > Sorry for the noise, case closed.
> > > > > > > >
> > > > > > > > View: https://bugzilla.kernel.org/show_bug.cgi?id=219580#c2
> > > > > > > > You can reply to this message to join the discussion.
> > > > > > >
> > > > > > > Are you sure this is solved? I got hit by this today after trying to
> > > > > > > check the report from another Debian user:
> > > > > > >
> > > > > > > https://bugs.debian.org/1091439
> > > > > > > The earlier report was
> > > > > > > https://bugs.debian.org/1087900
> > > > > > >
> > > > > > > Surprisingly, I managed to hit this after:
> > > > > > >
> > > > > > > doing a fresh Debian installation of Debian unstable and rebooting
> > > > > > > after installation. The running kernel is 6.12.6-1 (but I now believe
> > > > > > > it might be hit in earlier versions as well).
> > > > > > >
> > > > > > > Notably, in the kernel log I also see:
> > > > > > >
> > > > > > > [ 50.295209] RPC: Registered tcp NFSv4.1 backchannel transport module.
> > > > > > > [ 52.158301] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> > > > > > > [ 52.158333] NFSD: Using legacy client tracking operations.
> > > > > >
> > > > > > Hi Salvatore,
> > > > > >
> > > > > > If you no longer provision nfsdcltrack in user space, then you want to
> > > > > > set CONFIG_NFSD_LEGACY_CLIENT_TRACKING to 'N' in your kernel config.
> > > > >
> > > > > Right. While this might not be possible in the distribution right now,
> > > > > to confirm: setting CONFIG_NFSD_LEGACY_CLIENT_TRACKING=n would resolve
> > > > > the problem. I think we would not yet be able to do a hard cut in the
> > > > > distribution for the planned next stable release.
> > > > >
> > > > > Remember that in Debian we only got somewhat back "on track" with the
> > > > > nfs-utils code in the current stable release.
> > > > >
> > > > > > Otherwise, Scott Mayhew is the area expert (cc'd).
> > > > >
> > > > > Thanks!
> > > > >
> > > > > I will try to narrow down the versions to see where the problem might
> > > > > have been introduced, but if you already have a clue and know what we
> > > > > might try (e.g. a commit revert on top, or a patch), I'm happy to test
> > > > > that as well (since I am now able to trigger it reliably).
> > > >
> > > > Okay, so this was maybe already obvious to you, but bisecting identifies
> > > > the first bad commit as:
> > > >
> > > > 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking")
> > > >
> > > > The problem is not present in v6.7, and it is triggerable with
> > > > 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking")
> > > >
> > > > Importantly, since the default only switched to y in later versions,
> > > > triggering it there requires explicitly setting CONFIG_NFSD_LEGACY_CLIENT_TRACKING=y.
> > >
> > > Hi Salvatore -
> > >
> > > I see that Scott recently sent a fix for a similar crash to linux-nfs@ :
> > >
> > > https://lore.kernel.org/linux-nfs/032ff3ad487ce63656f95c6cdf3db8543fb0d061.camel@kernel.org/T/#t
> >
> > Oh right, this describes exactly the problem.
> >
> > Do you think it can make it into 6.13 as well (and then be
> > cherry-picked to the affected 6.12.y stable series), or do we have to
> > wait for it to land in 6.14 first?
>
> In nfsd-next, this fix is tagged:
>
> Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking")
>
> So it will be backported to all appropriate earlier kernels as soon as
> it goes into Linus's master via the v6.14 merge window (in a couple of
> weeks).
Yes, right. I was more wondering whether it is eligible to land already
in v6.13, as it is a bugfix. But the issue has already been open for a
long time, so I guess waiting until it lands in v6.14 and only then
getting it applied further down as needed will have to suffice.
Thanks all for your work,
Regards,
Salvatore