Bug#1120598: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
- To: Chuck Lever <chuck.lever@oracle.com>
- Cc: "Tyler W. Ross" <TWR@tylerwross.com>, "1120598@bugs.debian.org" <1120598@bugs.debian.org>, Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>, Scott Mayhew <smayhew@redhat.com>, Steve Dickson <steved@redhat.com>, Olga Kornievskaia <okorniev@redhat.com>, Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>, Trond Myklebust <trondmy@kernel.org>, Anna Schumaker <anna@kernel.org>, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
- Subject: Bug#1120598: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
- From: Salvatore Bonaccorso <carnil@debian.org>
- Date: Thu, 13 Nov 2025 23:20:16 +0100
- Message-id: <aRZZoNB5rsC8QUi4@eldamar.lan>
- Reply-to: Salvatore Bonaccorso <carnil@debian.org>, 1120598@bugs.debian.org
- In-reply-to: <1cee1c3e-e6b9-485a-a4d4-c336072f14c3@oracle.com>
- References: <176298368872.955.14091113173156448257.reportbug@nfsclient-sid.ipa.twrlab.net> <aRVl8yGqTkyaWxPM@eldamar.lan> <8d873978-2df6-4b79-891d-f0fd78485551@oracle.com> <c8-cRKuS2KXjk19lBwOGLCt21IbVv7HsS-V-ihDmhQ1Uae_LHNm83T0dOKvbYqsf4AeP5T8PR_xdiKLj5-Nvec-QVTLqIC4NGuU2FA0hN5U=@tylerwross.com> <c7136bad-5a00-4224-a25c-0cf7e8252f4a@oracle.com> <aRZL8kbmfbssOwKF@eldamar.lan> <1cee1c3e-e6b9-485a-a4d4-c336072f14c3@oracle.com> <176298368872.955.14091113173156448257.reportbug@nfsclient-sid.ipa.twrlab.net>
Hi Chuck,
On Thu, Nov 13, 2025 at 04:23:52PM -0500, Chuck Lever wrote:
> On 11/13/25 4:21 PM, Salvatore Bonaccorso wrote:
> > Hi Chuck,
> >
> > On Thu, Nov 13, 2025 at 12:47:23PM -0500, Chuck Lever wrote:
> >> On 11/13/25 12:16 PM, Tyler W. Ross wrote:
> >>> Thanks, Chuck.
> >>>
> >>> Suggested trace-cmd report from the client follows. Last 3 lines appear salient, but I've included the full report just in case.
> >>>
> >>> <idle>-0 [001] ..s2. 270.327040: xs_data_ready: peer=[10.108.2.102]:2049
> >>> kworker/u16:0-12 [001] ...1. 270.327048: xprt_lookup_rqst: peer=[10.108.2.102]:2049 xid=0x7b569c7a status=0
> >>> kworker/u16:0-12 [001] ...2. 270.327050: rpc_task_wakeup: task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
> >>> kworker/u16:0-12 [001] ..... 270.327054: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x7b569c7a copied=988 reclen=988 offset=988
> >>> kworker/u16:0-12 [001] ..... 270.327055: xs_stream_read_data: peer=[10.108.2.102]:2049 err=-11 total=992
> >>> ls-969 [003] ..... 270.327062: rpc_task_sync_wake: task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
> >>> ls-969 [003] ..... 270.327062: rpc_task_run_action: task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
> >>> ls-969 [003] ..... 270.327063: rpc_task_run_action: task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
> >>> ls-969 [003] ..... 270.327063: rpc_task_run_action: task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
> >>> ls-969 [003] ..... 270.327063: rpc_xdr_recvfrom: task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
> >>> ls-969 [003] ..... 270.327067: rpc_xdr_overflow: task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
> >>
> >> Here's the problem. This is a sign of an XDR decoding issue. If you
> >> capture the traffic with Wireshark, does Wireshark indicate where the
> >> XDR is malformed?
> >>
> >> If it doesn't, then there is some problem with the client code. Since
> >> Fedora 43 is working as expected, I would guess there's a misapplied
> >> patch on Debian 13's kernel...?
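The numbers in the quoted rpc_xdr_overflow line can be checked by hand: the READDIR decoder requested 8 bytes, but `end - p` is only 4 bytes, `p` sits 136 bytes into the 140-byte head, and `end` coincides exactly with the end of that head buffer. A minimal Python sketch of that arithmetic follows; the helper name `overflow_summary` is hypothetical, and the regexes assume the field layout of this one trace line rather than being a general trace-cmd parser.

```python
import re

# rpc_xdr_overflow line copied verbatim from the trace-cmd report above.
LINE = ("rpc_xdr_overflow: task:00000008@00000005 nfsv4 READDIR requested=8 "
        "p=0xffff8895c29fefec end=0xffff8895c29feff0 "
        "xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988")


def overflow_summary(line: str) -> dict:
    """Hypothetical helper: compare bytes requested with bytes left before `end`.

    Assumes the requested=/p=/end=/xdr=[base,len] layout seen in LINE.
    """
    requested = int(re.search(r"requested=(\d+)", line).group(1))
    p = int(re.search(r"\bp=(0x[0-9a-f]+)", line).group(1), 16)
    end = int(re.search(r"\bend=(0x[0-9a-f]+)", line).group(1), 16)
    base_s, len_s = re.search(r"xdr=\[(0x[0-9a-f]+),(\d+)\]", line).groups()
    head_base, head_len = int(base_s, 16), int(len_s)
    return {
        "requested": requested,                 # bytes the decoder asked for
        "remaining": end - p,                   # bytes left before `end`
        "offset_in_head": p - head_base,        # how far into the head `p` is
        "head_end_matches_end": head_base + head_len == end,
    }


if __name__ == "__main__":
    print(overflow_summary(LINE))
    # {'requested': 8, 'remaining': 4, 'offset_in_head': 136, 'head_end_matches_end': True}
```

In other words, the 8-byte read would have to straddle the boundary between the head and whatever follows it, which is consistent with Chuck reading this as an XDR decoding problem.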
> >
> > If it is helpful: Debian follows the upstream stable releases (6.12.y
> > for trixie/Debian 13, currently 6.17.y for Debian unstable), and we
> > try to keep the set of patches we apply on top small. So far I see
> > none that touches net/sunrpc/. The applied patches are here:
> > https://salsa.debian.org/kernel-team/linux/-/tree/debian/6.17/forky/debian/patches?ref_type=heads
> > (in case this helps narrow down the issue).
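Checking that no Debian patch touches net/sunrpc/ amounts to scanning each patch's diff headers for paths under that directory. A minimal Python sketch, assuming a local checkout of the debian/patches tree, unified diffs with a `.patch` suffix, and git-style `a/`/`b/` path prefixes; the function name `patches_touching` is hypothetical and not part of any Debian tooling.

```python
import re
from pathlib import Path

# Matches "--- a/<path>" and "+++ b/<path>" diff header lines (git-style prefixes assumed).
_DIFF_PATH = re.compile(r"^(?:\+\+\+|---) [ab]/(\S+)", re.MULTILINE)


def patches_touching(patch_dir: str, prefix: str = "net/sunrpc/") -> list[str]:
    """Hypothetical helper: list .patch files under patch_dir that modify a path starting with prefix."""
    hits = []
    for patch in sorted(Path(patch_dir).rglob("*.patch")):
        text = patch.read_text(errors="replace")
        if any(path.startswith(prefix) for path in _DIFF_PATH.findall(text)):
            hits.append(str(patch))
    return hits
```

Calling `patches_touching("debian/patches")` from the top of the packaging tree would return an empty list if, as stated, no patch modifies net/sunrpc/.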
> >
> > Additionally, if Tyler is able to, we could try the upstream 6.17.7
> > release directly, without the Debian patches applied.
>
> A bisect between broken v6.12.y and working v6.17.7 could identify
> what is possibly missing from v6.12.y.
There seems to be a misunderstanding: 6.17.7 as shipped in Debian
unstable is not working either. At least, Tyler said:

> 2. Freshly installed Debian sid via mini ISO (2025-11-01). Same
> configuration as 1/above.

and that install includes a 6.17.y-based kernel (6.17.7-1).
Regards,
Salvatore