Bug#1093734: nfs-kernel-server: fails to complete setup during upgrade (stuck while restarting nfs-kernel-server.service)
Control: tags -1 + unreproducible moreinfo
On Wed, Jan 22, 2025 at 12:29:12AM +0100, Francesco Poli (wintermute) wrote:
> Package: nfs-kernel-server
> Version: 1:2.8.2-1+b1
> Severity: grave
> Justification: causes non-serious data loss
> X-Debbugs-Cc: invernomuto@paranoici.org
>
>
> Dear maintainers,
> I encountered a big issue, while upgrading package 'nfs-kernel-server'
> on the box where the NFS server runs (the clients run on the compute
> nodes of an HPC cluster).
>
> The upgrade:
>
> [UPGRADE] nfs-kernel-server:amd64 1:2.8.2-1 -> 1:2.8.2-1+b1
>
> got stuck at
>
> [...]
> Setting up nfs-kernel-server (1:2.8.2-1+b1) ...
>
>
>
> It looks like it was stuck at the restart of the systemd service:
>
> # systemctl status nfs-kernel-server.service
> ● nfs-server.service - NFS server and services
> Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; prese>
> Drop-In: /run/systemd/generator/nfs-server.service.d
> └─order-with-mounts.conf
> Active: activating (start-pre) since Tue 2025-01-21 12:40:52 CET; 10min ago
> Job: 97667
> Invocation: ced460d410fe4059b9e8781b35340d70
> Docs: man:rpc.nfsd(8)
> man:exportfs(8)
> Cntrl PID: 249039 (exportfs)
> Tasks: 3 (limit: 154102)
> Memory: 680K (peak: 2.5M)
> CPU: 10ms
> CGroup: /system.slice/nfs-server.service
> ├─239857 /usr/sbin/nfsdctl threads 0
> ├─239918 /usr/sbin/exportfs -au
> └─249039 /usr/sbin/exportfs -r
>
> There was a 'nfsdctl' process in uninterruptible sleep (D):
>
> $ ps -eldaf | grep nf[s]
> 4 D root 239857 1 0 80 0 - 847 - 12:07 ? 00:00:00 /usr/sbin/nfsdctl threads 0
> 5 S root 247511 1 0 80 0 - 1375 - 12:35 ? 00:00:00 /usr/sbin/nfsdcld
>
> After about 30 min, since trying to kill PID 239857 obviously had no effect,
> and I could not find any other strategy to restart nfs-kernel-server.service,
> I had to reboot the box, thus causing many problems to all the NFS clients.
>
> After reboot, I could issue:
>
> # aptitude --purge-unused safe-upgrade
>
> which finally completed the upgrade (fixing the nfs-kernel-server package,
> which was left in a partially configured state).
>
>
> I have never seen anything like this before, and I have upgraded
> nfs-kernel-server and related packages on Debian machines for quite
> a long time.
> Anyway, this should *not* happen during a system upgrade with
> aptitude or apt!
>
> I don't know whether bug [#992661] is related or not.
>
> [#992661]: <https://bugs.debian.org/992661>
>
> By looking at /var/log/kern.log , I see that a kernel BUG was traced
> at the time when the 'nfsdctl' process got stuck in D state.
> See the attached kern.log snippet.
>
> Please investigate and fix the issue as soon as possible.
> I really hope we can prevent this from happening again!
>
> Thanks for your time and dedication.
I'm not able to reproduce this on a current Debian unstable system
when mimicking the upgrade. *But* it is possible that we have a race
somewhere, as recently discussed at our regular kernel team meeting.
In any case, we first need to find a way to trigger the issue.
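For anyone attempting to reproduce this, it may help to watch for tasks
entering uninterruptible sleep (state D, as seen for 'nfsdctl threads 0'
in the report) while the service is being restarted. The following is
only a minimal sketch, assuming a Linux /proc layout; the function names
parse_stat and find_uninterruptible are made up for illustration and are
not part of nfs-utils.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every task currently in state D."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # task exited in the meantime, or stat was unreadable
        if state == "D":
            hits.append((pid, comm))
    return hits


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

Running this periodically during repeated restarts of nfs-server.service
would show whether a task stays in D state, which could then be matched
against the timestamps in kern.log.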
Regards,
Salvatore