Bug#1120598: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
- To: "Tyler W. Ross" <TWR@tylerwross.com>
- Cc: "1120598@bugs.debian.org" <1120598@bugs.debian.org>, Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>, Scott Mayhew <smayhew@redhat.com>, Steve Dickson <steved@redhat.com>, Salvatore Bonaccorso <carnil@debian.org>, Olga Kornievskaia <okorniev@redhat.com>, Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>, Trond Myklebust <trondmy@kernel.org>, Anna Schumaker <anna@kernel.org>, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
- From: Chuck Lever <chuck.lever@oracle.com>
- Date: Thu, 13 Nov 2025 13:12:30 -0500
- Message-id: <4b77bf39-bc1a-47a1-9a16-14c44c31614f@oracle.com>
- Reply-to: Chuck Lever <chuck.lever@oracle.com>, 1120598@bugs.debian.org
- In-reply-to: <N14GL1WKSGqrFl8nF0e6sa0QxOZrnrpoC7IZlZ20YqUyfsxpsoqu2W3a31H_GfQv7OEqaEWKwDXdgtAV-xv613w_slTAFZIoyWMutIE5pKk=@tylerwross.com>
- References: <176298368872.955.14091113173156448257.reportbug@nfsclient-sid.ipa.twrlab.net> <aRVl8yGqTkyaWxPM@eldamar.lan> <8d873978-2df6-4b79-891d-f0fd78485551@oracle.com> <c8-cRKuS2KXjk19lBwOGLCt21IbVv7HsS-V-ihDmhQ1Uae_LHNm83T0dOKvbYqsf4AeP5T8PR_xdiKLj5-Nvec-QVTLqIC4NGuU2FA0hN5U=@tylerwross.com> <c7136bad-5a00-4224-a25c-0cf7e8252f4a@oracle.com> <N14GL1WKSGqrFl8nF0e6sa0QxOZrnrpoC7IZlZ20YqUyfsxpsoqu2W3a31H_GfQv7OEqaEWKwDXdgtAV-xv613w_slTAFZIoyWMutIE5pKk=@tylerwross.com> <176298368872.955.14091113173156448257.reportbug@nfsclient-sid.ipa.twrlab.net>
On 11/13/25 1:05 PM, Tyler W. Ross wrote:
> On Thursday, November 13th, 2025 at 10:47 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
>>> ls-969 [003] ..... 270.327063: rpc_xdr_recvfrom: task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
>>> ls-969 [003] ..... 270.327067: rpc_xdr_overflow: task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
>>
>>
>> Here's the problem. This is a sign of an XDR decoding issue. If you
>> capture the traffic with Wireshark, does Wireshark indicate where the
>> XDR is malformed?
>
> Wireshark appears to decode the READDIR reply without issue. Nothing is obviously marked as malformed, and values all appear sane when spot-checking fields in the decoded packet.

Then I would start looking for differences between the Debian 13 and
Fedora 43 kernel code bases under net/sunrpc/.

Alternatively, "git bisect first, ask questions later" ... :-)

I didn't find an indication of whether this was sec=krb5, sec=krb5i,
or sec=krb5p. Knowing that might narrow down where the code changed.

Also, the xdr_buf might have a page boundary positioned in the middle of
an XDR data item. Knowing which data item is being decoded when the
"overflow" occurs might be helpful (I think adding pr_info() call sites
or trace_printk() calls will be adequate to gain better observability).
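
As a rough illustration of the boundary question, the numbers in the
rpc_xdr_overflow event quoted above can be checked with a few lines of
Python. This is only a sketch: the function name analyze_overflow() and
the regular expression are made up for this example, and it assumes the
traced pointers are unhashed, that p is the current decode position, and
that end is the end of the segment being decoded.

```python
import re

# rpc_xdr_overflow event copied from the quoted trace.
TRACE = (
    "rpc_xdr_overflow: task:00000008@00000005 nfsv4 READDIR requested=8 "
    "p=0xffff8895c29fefec end=0xffff8895c29feff0 "
    "xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988"
)

# Hypothetical pattern for the fields used below; not a kernel-provided format.
PATTERN = re.compile(
    r"requested=(?P<requested>\d+) "
    r"p=(?P<p>0x[0-9a-f]+) end=(?P<end>0x[0-9a-f]+) "
    r"xdr=\[(?P<head_base>0x[0-9a-f]+),(?P<head_len>\d+)\]"
)


def to_int(text):
    """Convert a hex (0x...) or decimal field to an integer."""
    return int(text, 16) if text.startswith("0x") else int(text)


def analyze_overflow(line):
    """Compute how many bytes were left when the decoder gave up.

    Assumes p/end are real (unhashed) addresses, so that pointer
    differences are meaningful.
    """
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an rpc_xdr_overflow line")
    f = {name: to_int(value) for name, value in match.groupdict().items()}
    return {
        "requested": f["requested"],
        # Bytes between the decode position and the end of the segment.
        "bytes_left": f["end"] - f["p"],
        # Where in the head segment the decoder stopped.
        "offset_in_head": f["p"] - f["head_base"],
        # True if "end" is exactly the end of the head segment.
        "end_is_head_end": f["end"] == f["head_base"] + f["head_len"],
    }


if __name__ == "__main__":
    for key, value in analyze_overflow(TRACE).items():
        print(f"{key}: {value}")
```

Under those assumptions, the decoder asked for 8 bytes at offset 136 of
the 140-byte head, with only 4 bytes left before the end of the head.
That would be consistent with an 8-byte item straddling a segment
boundary, but it does not say which data item was being decoded.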
--
Chuck Lever