
NFS-root live image shuts its network off before fully shut down



Hi,

I'm using the Live project to make an NFS-root rescue environment.

It's mostly working fine, but when I halt or shut down from within
the live environment, the network is torn down almost immediately.
The rest of the shutdown process then stalls, presumably because
the system no longer has access to its root filesystem.

This is live-build on bullseye making a bullseye netboot image.

Example behaviour:

user@rescue:~$ sudo halt
         Stopping Session 1 of user user.
         Stopping Session 3 of user user.
[  OK  ] Removed slice system-modprobe.slice.
.
.
.
[FAILED] Failed unmounting /run/live/medium.
[  OK  ] Unmounted /usr/lib/live/mount/medium.
[  OK  ] Stopped Network Time Synchronization.
[  OK  ] Stopped target Network.
         Stopping ifup for eth0...
         Stopping Raise network interfaces...
[  OK  ] Reached target Shutdown.
[  OK  ] Reached target Final Step.
         Starting Halt...
[  300.996462] systemd-journald[316]: Failed to send WATCHDOG=1 notification message: Connection refused
[  370.996567] systemd-journald[316]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[  490.996272] systemd-journald[316]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[  492.592379] nfs: server 192.168.80.243 not responding, still trying

As you can see, we're 492 seconds in and it's just not able to
proceed. It will hang here forever.

Is there a good way to make the network go down as the very last
step of the shutdown?

I've tried adding a systemd drop-in override so that
networking.service never brings eth0 down when it stops:

$ cat config/includes.chroot/etc/systemd/system/networking.service.d/override.conf
[Service]
ExecStop=
ExecStop=/sbin/ifdown -a --read-environment --exclude=lo --exclude=eth0

This did not appear to make any difference. Yes, the Ethernet link
was actually called eth0 (no predictable names involved). Any other
ideas?
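
One untested idea, as a sketch only: the log above shows "Stopping
ifup for eth0...", which may come from a separate ifup@eth0.service
unit rather than from networking.service. Assuming that unit exists
in the image and takes eth0 down through its own ExecStop, a similar
drop-in could be generated as below. The unit name and the
CHROOT_INCLUDES variable are assumptions, not something confirmed on
this image.

```shell
#!/bin/sh
set -e

# Assumption: run from the top of the live-build config tree.
# CHROOT_INCLUDES is a hypothetical variable for overriding the path.
CHROOT_INCLUDES="${CHROOT_INCLUDES:-config/includes.chroot}"

# Assumption: "Stopping ifup for eth0..." is ifup@eth0.service.
DROPIN_DIR="${CHROOT_INCLUDES}/etc/systemd/system/ifup@eth0.service.d"

mkdir -p "${DROPIN_DIR}"
cat > "${DROPIN_DIR}/override.conf" <<'EOF'
[Service]
# An empty ExecStop= clears any stop command set by the packaged unit,
# so stopping the unit should no longer run anything against eth0.
ExecStop=
EOF

# Show what will end up in the image.
cat "${DROPIN_DIR}/override.conf"
```

Whether this is enough depends on what else touches eth0 during
shutdown, so it would need checking on a rebuilt image.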

Here is the auto/config file:

#!/bin/sh

set -e

lb config noauto \
    --architectures                     amd64 \
    --distribution                      bullseye \
    --binary-images                     netboot \
    --archive-areas                     main \
    --apt-source-archives               false \
    --apt-indices                       false \
    --backports                         true \
    --memtest                           none \
    --net-tarball                       true \
    "${@}"

The resulting binary/live/filesystem.squashfs is then booted with
NFS root with a kernel command line like:

[    0.000000] Command line: root=/dev/nfs ip=192.168.82.225:192.168.80.243:192.168.80.1:255.255.248.0:rescue hostname=rescue boot=live nfsroot=192.168.80.243:/srv/rescue nfsopts=tcp persistent console=hvc0
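
For reference, the ip= argument follows the kernel's nfsroot field
order (client:server:gateway:netmask:hostname, with later fields
omitted here). A small sketch that splits the value used above; the
variable names are only illustrative:

```shell
#!/bin/sh
# Value copied from the kernel command line above.
ip_arg='192.168.82.225:192.168.80.243:192.168.80.1:255.255.248.0:rescue'

# Split on ':' into named fields; anything after the hostname
# (device, autoconf, ...) would land in "rest", which is empty here.
IFS=: read -r client server gateway netmask host rest <<EOF
${ip_arg}
EOF

printf 'client:   %s\n' "${client}"
printf 'server:   %s\n' "${server}"
printf 'gateway:  %s\n' "${gateway}"
printf 'netmask:  %s\n' "${netmask}"
printf 'hostname: %s\n' "${host}"
```

The server field is the same address the "nfs: server ... not
responding" message complains about once the network is gone.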

Thanks,
Andy

