Bug#984928: Acknowledgement (slurmctld: fails to start on reboot)
- To: firstname.lastname@example.org
- Subject: Bug#984928: Acknowledgement (slurmctld: fails to start on reboot)
- From: David Bremner <email@example.com>
- Date: Fri, 06 Aug 2021 11:01:48 -0300
- Message-id: <firstname.lastname@example.org>
- Reply-to: David Bremner <email@example.com>, firstname.lastname@example.org
- In-reply-to: <email@example.com>
- References: <firstname.lastname@example.org> <handler.984928.B.email@example.com> <firstname.lastname@example.org> <email@example.com>
David Bremner <firstname.lastname@example.org> writes:
> As a workaround, I noticed that setting the main ethernet interface to
> "auto" instead of "allow-hotplug" seems to fix the problem. By way of
> confirmation, on a different (virtual) machine changing the "auto" to
> "allow-hotplug" on the main ethernet interface causes the same problem
> to manifest.
> This is still a bit mysterious, since the messages complain about
> 127.0.0.1 which is of course the loopback interface, already marked
> "auto", and presumably up pretty early.
I think (one) underlying problem is that the systemd unit file for
slurmctld is incorrect. The details are in , but it seems like
network.target is not correct (I think it very rarely is a useful
target). I added the following
And it seems to help. I didn't check if the second mention of
munge.service is really needed.
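The snippet referred to above did not survive in this archived copy. Purely
as an illustration of the kind of change being described, and not the text
that was originally posted, a drop-in that orders the unit after
network-online.target rather than network.target might look like the sketch
below. The drop-in path and file name, and the exact choice of directives,
are assumptions.

```ini
# Hypothetical drop-in: /etc/systemd/system/slurmctld.service.d/override.conf
[Unit]
# network.target only means that network management has been started;
# network-online.target is reached once the network is actually configured.
Wants=network-online.target
# Order slurmctld after both the network being online and munge.
After=network-online.target munge.service
```

A drop-in like this takes effect after "systemctl daemon-reload". Note that
network-online.target only waits if a matching wait-online service for the
network manager in use is enabled.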
I've switched to systemd-networkd on the hosts in question, so I can't
easily test how this works with ifupdown, but I notice ifupdown provides
which (guessing based on the name) should provide similar functionality
to those documented in  for NetworkManager and systemd-networkd.