Package: nfs-kernel-server
Version: 1:2.8.2-1+b1
Severity: grave
Justification: causes non-serious data loss
X-Debbugs-Cc: invernomuto@paranoici.org
Dear maintainers,
I ran into a serious problem while upgrading the package 'nfs-kernel-server'
on the box where the NFS server runs (the clients run on the compute
nodes of an HPC cluster).
The upgrade:
[UPGRADE] nfs-kernel-server:amd64 1:2.8.2-1 -> 1:2.8.2-1+b1
got stuck at
[...]
Setting up nfs-kernel-server (1:2.8.2-1+b1) ...
It looks like it was stuck at the restart of the systemd service:
# systemctl status nfs-kernel-server.service
● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; prese>
Drop-In: /run/systemd/generator/nfs-server.service.d
└─order-with-mounts.conf
Active: activating (start-pre) since Tue 2025-01-21 12:40:52 CET; 10min ago
Job: 97667
Invocation: ced460d410fe4059b9e8781b35340d70
Docs: man:rpc.nfsd(8)
man:exportfs(8)
Cntrl PID: 249039 (exportfs)
Tasks: 3 (limit: 154102)
Memory: 680K (peak: 2.5M)
CPU: 10ms
CGroup: /system.slice/nfs-server.service
├─239857 /usr/sbin/nfsdctl threads 0
├─239918 /usr/sbin/exportfs -au
└─249039 /usr/sbin/exportfs -r
There was an 'nfsdctl' process in uninterruptible sleep (D):
$ ps -eldaf | grep nf[s]
4 D root 239857 1 0 80 0 - 847 - 12:07 ? 00:00:00 /usr/sbin/nfsdctl threads 0
5 S root 247511 1 0 80 0 - 1375 - 12:35 ? 00:00:00 /usr/sbin/nfsdcld
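For reference, processes in uninterruptible sleep can also be listed more directly. The following is only a minimal sketch: the helper name filter_d_state is hypothetical (it is not part of any package), and it assumes input lines of the form "PID STAT COMMAND" as printed by ps with the pid, stat and args columns.

```shell
# Hypothetical helper: read "PID STAT COMMAND..." lines on stdin and
# print only those whose state field starts with D (uninterruptible sleep).
filter_d_state() {
    awk '$2 ~ /^D/'
}

# Usage on the affected box (the "=" suffixes suppress the header line):
#   ps -eo pid=,stat=,args= | filter_d_state
```

On the box described above, this would be expected to list PID 239857 and skip the sleeping 'nfsdcld' process.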
Trying to kill PID 239857 had no effect, as expected for a process in D state,
and I could not find any other way to restart nfs-kernel-server.service.
After about 30 minutes, I had to reboot the box, which caused many problems
for all the NFS clients.
After reboot, I could issue:
# aptitude --purge-unused safe-upgrade
which finally completed the upgrade (fixing the nfs-kernel-server package,
which was left in a partially configured state).
I have never seen anything like this before, although I have been upgrading
nfs-kernel-server and related packages on Debian machines for quite
a long time.
Anyway, this should *not* happen during a system upgrade with
aptitude or apt!
I don't know whether bug [#992661] is related or not.
[#992661]: <https://bugs.debian.org/992661>
Looking at /var/log/kern.log, I see that a kernel BUG was logged
at the time when the 'nfsdctl' process got stuck in D state.
See the attached kern.log snippet.
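A snippet of this kind can be cut from the log with something like the sketch below. The helper name extract_bug_context is hypothetical, and both the match pattern and the default amount of context are assumptions about how the kernel marks such reports, so they may need adjusting.

```shell
# Hypothetical helper: print every line that looks like a kernel BUG
# marker, followed by N lines of trailing context (default: 40).
# The pattern is an assumption, not taken from the attached log.
extract_bug_context() {
    log_file=$1
    context=${2:-40}
    grep -E -A "$context" 'kernel BUG at|BUG:' "$log_file"
}

# Usage (reading the log usually requires root or membership in group adm):
#   extract_bug_context /var/log/kern.log 60
```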
Please investigate and fix the issue as soon as possible.
I really hope we can prevent this from happening again!
Thanks for your time and dedication.
-- Package-specific info:
-- rpcinfo --
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100011 1 udp 64737 rquotad
100011 2 udp 64737 rquotad
100011 1 tcp 55614 rquotad
100011 2 tcp 55614 rquotad
100024 1 udp 41792 status
100024 1 tcp 50467 status
100005 1 udp 46127 mountd
100005 1 tcp 39579 mountd
100005 2 udp 49119 mountd
100005 2 tcp 40039 mountd
100005 3 udp 33530 mountd
100005 3 tcp 55283 mountd
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049 nfs_acl
100021 1 udp 38915 nlockmgr
100021 3 udp 38915 nlockmgr
100021 4 udp 38915 nlockmgr
100021 1 tcp 33105 nlockmgr
100021 3 tcp 33105 nlockmgr
100021 4 tcp 33105 nlockmgr
-- /etc/default/nfs-kernel-server --
RPCNFSDPRIORITY=0
NEED_SVCGSSD=""
-- /etc/nfs.conf --
[general]
pipefs-directory=/run/rpc_pipefs
[nfsrahead]
[exports]
[exportfs]
[gssd]
[lockd]
[exportd]
[mountd]
manage-gids=y
[nfsdcld]
[nfsdcltrack]
[nfsd]
rdma=y
rdma-port=20049
[statd]
[sm-notify]
[svcgssd]
-- /etc/nfs.conf.d/*.conf --
-- System Information:
Debian Release: trixie/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 6.12.9-amd64 (SMP w/16 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages nfs-kernel-server depends on:
ii keyutils 1.6.3-4
ii libblkid1 2.40.4-1
ii libc6 2.40-5
ii libcap2 1:2.66-5+b1
ii libevent-core-2.1-7t64 2.1.12-stable-10+b1
ii libnl-3-200 3.7.0-0.3+b1
ii libnl-genl-3-200 3.7.0-0.3+b1
ii libreadline8t64 8.2-6
ii libsqlite3-0 3.46.1-1
ii libtirpc3t64 1.3.4+ds-1.3+b1
ii libuuid1 2.40.4-1
ii libwrap0 7.6.q-35
ii libxml2 2.12.7+dfsg+really2.9.14-0.2+b1
ii netbase 6.4
ii nfs-common 1:2.8.2-1+b1
ii ucf 3.0048
Versions of packages nfs-kernel-server recommends:
ii python3 3.12.8-1
ii python3-yaml 6.0.2-1+b1
Versions of packages nfs-kernel-server suggests:
ii procps 2:4.0.4-6
-- no debconf information
Attachment:
kern_log_snippet.log.gz
Description: application/gzip