
Bug#984928: Acknowledgement (slurmctld: fails to start on reboot)



David Bremner <bremner@debian.org> writes:

> As a workaround, I noticed that setting the main ethernet interface to
> "auto" instead of "allow-hotplug" seems to fix the problem. By way of
> confirmation, on a different (virtual) machine changing the "auto" to
> "allow-hotplug" on the main ethernet interface causes the same problem
> to manifest.
>
> This is still a bit mysterious, since the messages complain about
> 127.0.0.1 which is of course the loopback interface, already marked
> "auto", and presumably up pretty early.

I think (one) underlying problem is that the systemd unit file for
slurmctld is incorrect. The details are in [1], but in short, ordering
after network.target does not guarantee that any interface is actually
configured (I think it is very rarely a useful target). I added the
following drop-in:

# /etc/systemd/system/slurmctld.service.d/override.conf
[Unit]
After=network-online.target munge.service
Wants=network-online.target

And it seems to help. I didn't check if the second mention of
munge.service is really needed.
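
For anyone who wants to try the same workaround, here is a minimal
sketch of installing the drop-in above from a shell. The helper name
write_override and its optional root-directory argument are my own
inventions for illustration; only the file contents and the path come
from this report. I have only reasoned about this, not run it on an
ifupdown host.

```shell
#!/bin/sh
# Sketch: install the drop-in shown above for slurmctld.service.
# write_override is a hypothetical helper; its optional argument is the
# systemd configuration root (defaults to /etc/systemd/system), which
# makes it possible to try the function against a scratch directory.
write_override() {
    dropin_dir="${1:-/etc/systemd/system}/slurmctld.service.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Unit]
After=network-online.target munge.service
Wants=network-online.target
EOF
}

# On a real host (as root) one would then run something like:
#   write_override
#   systemctl daemon-reload
#   systemctl show -p After -p Wants slurmctld.service
```

The last command should list network-online.target in both properties
if the drop-in was picked up. "systemctl edit slurmctld.service" is an
interactive way to create the same override.conf.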

I've switched to systemd-networkd on the hosts in question, so I can't
easily test how this works with ifupdown, but I notice ifupdown provides

/lib/systemd/system/ifupdown-wait-online.service

which (guessing based on the name) should provide functionality similar
to the services documented in [1] for NetworkManager and
systemd-networkd.
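
A rough way to check this on an ifupdown host might look like the
following. The helper has_wait_online_unit and its directory argument
are hypothetical; the unit path is the one mentioned above. Whether the
unit is actually enabled by default is something I have not verified.

```shell
#!/bin/sh
# Sketch: test whether ifupdown ships its wait-online unit.
# has_wait_online_unit is a hypothetical helper; its optional argument
# is the unit directory (defaults to /lib/systemd/system).
has_wait_online_unit() {
    [ -f "${1:-/lib/systemd/system}/ifupdown-wait-online.service" ]
}

# On a real host one could then inspect its state with something like:
#   has_wait_online_unit && systemctl is-enabled ifupdown-wait-online.service
#   systemctl list-dependencies network-online.target
```

If the unit is present but not pulled in by network-online.target, the
Wants=network-online.target line in the override would presumably have
little effect with ifupdown.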

[1]: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
