Bug#1071562: nfsd blocks indefinitely in nfsd4_destroy_session
Package: nfs-kernel-server
Version: 1:2.6.2-4
Package: linux-image-6.1.0-21-amd64
Version: 6.1.90-1
While testing Proxmox VE with a Debian NFS server as shared storage, we noticed
that nfsd sometimes becomes unresponsive and the server has to be rebooted.
The same error is probably reported here:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
NFS server:
* DELL PowerEdge R730xd, 2x 10-core Xeon E5-2640, Samsung SM863 SSDs, 8 GB RAM
* fresh installation of Debian Bookworm
* Linux 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
* connected using 10GE link
* nfsd.conf configured with nthreads=16 (also tested with 8 and 4), other options left on defaults
* XFS mount exported with options: rw,sync,no_root_squash,no_subtree_check,no_wdelay
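For completeness, that export corresponds to an /etc/exports entry roughly like the following (the path and client network are placeholders, not our actual values):

```
# /etc/exports -- sketch; path and network are hypothetical
/srv/nfs/pve  10.xx.xx.0/24(rw,sync,no_root_squash,no_subtree_check,no_wdelay)
```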
NFS client:
* DELL PowerEdge FC630, 2x 14C Xeon E5-2680 v4, 256 GB RAM
* fresh installation of Proxmox VE 8.2
* Proxmox Linux 6.8.4-3-pve kernel
* connected using 10GE link
* nfs client mount options: rw,noatime,nodiratime,vers=4.2,rsize=1048576,wsize=1048576,
namlen=255,hard,proto=tcp,nconnect=8,max_connect=16,timeo=600,retrans=2,sec=sys,
clientaddr=10.xx.xx.xx,local_lock=none,addr=10.xx.xx.xx
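The option list above is taken from /proc/mounts; a client-side fstab entry producing it would look roughly like this (server name and paths are placeholders; addr=, clientaddr= and local_lock= are reported by the kernel rather than specified manually):

```
# /etc/fstab -- sketch; host and paths are hypothetical
server:/srv/nfs/pve  /mnt/pve/nfs  nfs  rw,noatime,nodiratime,vers=4.2,rsize=1048576,wsize=1048576,hard,proto=tcp,nconnect=8,max_connect=16,timeo=600,retrans=2,sec=sys  0  0
```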
dmesg on the NFS server side (the trace below repeats indefinitely):
[ 3142.693181] INFO: task nfsd:1035 blocked for more than 120 seconds.
[ 3142.693217] Not tainted 6.1.0-21-amd64 #1 Debian 6.1.90-1
[ 3142.693239] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3142.693264] task:nfsd state:D stack:0 pid:1035 ppid:2 flags:0x00004000
[ 3142.693273] Call Trace:
[ 3142.693275] <TASK>
[ 3142.693279] __schedule+0x34d/0x9e0
[ 3142.693288] schedule+0x5a/0xd0
[ 3142.693294] schedule_timeout+0x118/0x150
[ 3142.693301] wait_for_completion+0x86/0x160
[ 3142.693307] __flush_workqueue+0x152/0x420
[ 3142.693317] nfsd4_destroy_session+0x1b6/0x250 [nfsd]
[ 3142.693379] nfsd4_proc_compound+0x355/0x660 [nfsd]
[ 3142.693433] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
[ 3142.693478] svc_process_common+0x289/0x5e0 [sunrpc]
[ 3142.693551] ? svc_recv+0x4e5/0x890 [sunrpc]
[ 3142.693631] ? nfsd_svc+0x360/0x360 [nfsd]
[ 3142.693676] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[ 3142.693720] svc_process+0xad/0x100 [sunrpc]
[ 3142.693790] nfsd+0xd5/0x190 [nfsd]
[ 3142.693836] kthread+0xda/0x100
[ 3142.693843] ? kthread_complete_and_exit+0x20/0x20
[ 3142.693849] ret_from_fork+0x22/0x30
[ 3142.693858] </TASK>
Dump of nfsd threads:
/proc/1032/stack:
[<0>] svc_recv+0x7f3/0x890 [sunrpc]
[<0>] nfsd+0xc3/0x190 [nfsd]
[<0>] kthread+0xda/0x100
[<0>] ret_from_fork+0x22/0x30
/proc/1033/stack:
[<0>] svc_recv+0x7f3/0x890 [sunrpc]
[<0>] nfsd+0xc3/0x190 [nfsd]
[<0>] kthread+0xda/0x100
[<0>] ret_from_fork+0x22/0x30
/proc/1034/stack:
[<0>] svc_recv+0x7f3/0x890 [sunrpc]
[<0>] nfsd+0xc3/0x190 [nfsd]
[<0>] kthread+0xda/0x100
[<0>] ret_from_fork+0x22/0x30
/proc/1035/stack:
[<0>] __flush_workqueue+0x152/0x420
[<0>] nfsd4_destroy_session+0x1b6/0x250 [nfsd]
[<0>] nfsd4_proc_compound+0x355/0x660 [nfsd]
[<0>] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
[<0>] svc_process_common+0x289/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xda/0x100
[<0>] ret_from_fork+0x22/0x30
/proc/130/stack:
[<0>] rpc_shutdown_client+0xf2/0x150 [sunrpc]
[<0>] nfsd4_process_cb_update+0x4c/0x270 [nfsd]
[<0>] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
[<0>] process_one_work+0x1c7/0x380
[<0>] worker_thread+0x4d/0x380
[<0>] kthread+0xda/0x100
[<0>] ret_from_fork+0x22/0x30
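For reference, the per-thread stacks above can be collected with a loop like the following (a sketch; run as root on the server, since reading /proc/<pid>/stack requires it):

```shell
#!/bin/sh
# Print the kernel stack of every kernel thread named "nfsd".
for pid in $(pgrep -x nfsd); do
    echo "/proc/$pid/stack:"
    cat "/proc/$pid/stack"
done
```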
On the NFS client side, there are repeated backchannel reply errors (error -110 is ETIMEDOUT):
[78636.676789] RPC: Could not send backchannel reply error: -110
[78647.905675] RPC: Could not send backchannel reply error: -110
[78675.207201] RPC: Could not send backchannel reply error: -110
[78744.201603] RPC: Could not send backchannel reply error: -110
[78784.138769] RPC: Could not send backchannel reply error: -110
We can reproduce this bug quite often (several times a day) when
restoring a 500 GB virtual machine image from Proxmox Backup Server to
the NFS shared storage. On the other hand, we cannot trigger it in other
ways, e.g. with random and/or sequential fio stress tests. According to
iostat, the VM restore job writes to the NFS server in 300-400 MiB bursts
separated by 3-4 seconds of inactivity.
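A fio job approximating this bursty pattern (an untested sketch; the thinktime numbers are my rough translation of the iostat observation, and the directory is a placeholder) might look like:

```
; burst.fio -- hypothetical sketch approximating the VM-restore pattern:
; ~350 MiB of sequential 1 MiB writes, then ~3.5 s idle, repeated.
[burst-write]
; placeholder: the NFS mount point on the client
directory=/mnt/pve/nfs
rw=write
bs=1M
size=10G
; sleep 3.5 s (value in usec) after every 350 blocks (~350 MiB burst)
thinktime=3500000
thinktime_blocks=350
```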
Interestingly, this issue apparently occurs only with a recent kernel on
the NFS client side. We can hit this bug only with the Proxmox Linux
6.8.4-3-pve kernel on the client. With the Proxmox 6.5.13-5-pve kernel
there are no client-side backchannel reply errors and the nfsd server
runs without any hangs. It seems that changes in the NFS client code
between 6.5.x and 6.8.x uncovered a latent race in the nfsd server code.
Based on Ubuntu bug report #2062568, I assume this is not a
Proxmox-specific issue; the Proxmox VM restore workload together with
our testing hardware setup just makes it easier to hit.
Regards,
Martin