Bug#1120598: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
- To: "Tyler W. Ross" <TWR@tylerwross.com>
- Cc: "1120598@bugs.debian.org" <1120598@bugs.debian.org>, Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>, Scott Mayhew <smayhew@redhat.com>, Steve Dickson <steved@redhat.com>, Salvatore Bonaccorso <carnil@debian.org>, Olga Kornievskaia <okorniev@redhat.com>, Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>, Trond Myklebust <trondmy@kernel.org>, Anna Schumaker <anna@kernel.org>, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
- Subject: Bug#1120598: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
- From: Chuck Lever <chuck.lever@oracle.com>
- Date: Thu, 13 Nov 2025 13:57:20 -0500
- Message-id: <943d3e48-9582-4521-9cc8-eb84eb72d788@oracle.com>
- Reply-to: Chuck Lever <chuck.lever@oracle.com>, 1120598@bugs.debian.org
- In-reply-to: <eUtqaTOrHO8Sj-82m04dsCpmYX8bPkr5r9Nla1muHxSnxBYq57wxk7LLf_RuI377WMpUcczBXteWGvF5OfNfe5gwLmfTn_YblJucaF58POo=@tylerwross.com>
- References: <176298368872.955.14091113173156448257.reportbug@nfsclient-sid.ipa.twrlab.net> <aRVl8yGqTkyaWxPM@eldamar.lan> <8d873978-2df6-4b79-891d-f0fd78485551@oracle.com> <c8-cRKuS2KXjk19lBwOGLCt21IbVv7HsS-V-ihDmhQ1Uae_LHNm83T0dOKvbYqsf4AeP5T8PR_xdiKLj5-Nvec-QVTLqIC4NGuU2FA0hN5U=@tylerwross.com> <c7136bad-5a00-4224-a25c-0cf7e8252f4a@oracle.com> <N14GL1WKSGqrFl8nF0e6sa0QxOZrnrpoC7IZlZ20YqUyfsxpsoqu2W3a31H_GfQv7OEqaEWKwDXdgtAV-xv613w_slTAFZIoyWMutIE5pKk=@tylerwross.com> <4b77bf39-bc1a-47a1-9a16-14c44c31614f@oracle.com> <eUtqaTOrHO8Sj-82m04dsCpmYX8bPkr5r9Nla1muHxSnxBYq57wxk7LLf_RuI377WMpUcczBXteWGvF5OfNfe5gwLmfTn_YblJucaF58POo=@tylerwross.com> <176298368872.955.14091113173156448257.reportbug@nfsclient-sid.ipa.twrlab.net>
On 11/13/25 1:51 PM, Tyler W. Ross wrote:
> On Thursday, November 13th, 2025 at 11:12 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
>> Then I would start looking for differences between the Debian 13 and
>> Fedora 43 kernel code base under net/sunrpc/ .
>>
>> Alternatively, "git bisect first, ask questions later" ... :-)
>
> This is outside my day-to-day, so I don't have a workflow for this kind of
> testing/debugging, but I'll see what I can do.
>
> Thanks for the starting place.
>
>> So I didn't find an indication of whether this was sec=krb5, sec=krb5i,
>> or sec=krb5p. That might narrow down where the code changed.
>
> I confirmed the issue with all 3 krb5 sec modes, in both the 6.12 kernel
> that ships with Debian 13 and the 6.17 that currently ships with Debian
> Sid/unstable. Similarly, I confirmed NFSv4.2, 4.1 and 4.0 are impacted.
>
>> Also, the xdr_buf might have a page boundary positioned in the middle of
>> an XDR data item. Knowing which data item is being decoded where the
>> "overflow" occurs might be helpful (I think adding pr_info() call sites
>> or trace_printk() will be adequate to gain some better observability).
>
> No experience with kernel hacking, so I'm not confident I can locate
> meaningful places to insert those.
>
> I'll see where some snooping and a bisect gets me. Failing that, if
> anyone has recommendations on where to add those calls, I'd appreciate
> the guidance.
Start with xdr_inline_decode(). The easiest approach (though somewhat
noisy) would be to add a WARN_ON(1) just after each of the
trace_rpc_xdr_overflow() call sites. The stack trace from the failing
decode will then be dumped into the system journal.
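
A minimal sketch of the idea, written as a userspace model rather than
an actual kernel patch. Every toy_* name below is hypothetical; only
xdr_inline_decode(), trace_rpc_xdr_overflow() and WARN_ON() come from
the discussion above. The model assumes a single flat buffer, so it
ignores the page-boundary and scratch-buffer handling that the real
decoder has to deal with. It only shows where the warning would sit
relative to the overflow path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's XDR decode stream. */
struct toy_xdr_stream {
	const uint8_t *p;   /* current decode position */
	const uint8_t *end; /* one past the last valid byte */
};

/* Counts how many times the overflow path was taken. */
unsigned int toy_overflow_count;

/*
 * Stand-in for WARN_ON(). The kernel macro dumps a stack trace into the
 * kernel log; this model only reports the call site on stderr.
 */
#define TOY_WARN_ON(cond) \
	((cond) ? (fprintf(stderr, "WARNING: %s:%d\n", __FILE__, __LINE__), 1) : 0)

/* Stand-in for the overflow tracepoint. */
static void toy_trace_overflow(size_t nbytes)
{
	toy_overflow_count++;
	fprintf(stderr, "xdr overflow: requested %zu bytes\n", nbytes);
}

/*
 * Return a pointer to the next nbytes of data (rounded up to a 4-byte
 * XDR boundary), or NULL if the buffer is too short.
 */
const uint8_t *toy_xdr_inline_decode(struct toy_xdr_stream *xdr, size_t nbytes)
{
	size_t avail = (size_t)(xdr->end - xdr->p);
	size_t len;
	const uint8_t *p;

	if (nbytes > avail)
		goto out_overflow;
	len = (nbytes + 3) & ~(size_t)3; /* XDR items are 4-byte aligned */
	if (len > avail)
		goto out_overflow;

	p = xdr->p;
	xdr->p += len;
	return p;

out_overflow:
	toy_trace_overflow(nbytes);
	TOY_WARN_ON(1); /* the suggested addition: warn right after the trace call */
	return NULL;
}

/* Small demonstration: two decodes succeed, the third overflows. */
void toy_demo(void)
{
	static const uint8_t buf[8];
	struct toy_xdr_stream xdr = { buf, buf + sizeof(buf) };

	assert(toy_xdr_inline_decode(&xdr, 4) == buf);
	assert(toy_xdr_inline_decode(&xdr, 3) == buf + 4);
	assert(toy_xdr_inline_decode(&xdr, 1) == NULL);
}
```

In the kernel itself, the equivalent change would be a one-line
WARN_ON(1) after each trace_rpc_xdr_overflow() call, and the backtrace
would identify which decoder requested the bytes that did not fit.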
--
Chuck Lever