Bug#984928: Acknowledgement (slurmctld: fails to start on reboot)
- To: David Bremner <email@example.com>, firstname.lastname@example.org
- Subject: Bug#984928: Acknowledgement (slurmctld: fails to start on reboot)
- From: Gennaro Oliva <email@example.com>
- Date: Thu, 27 Jan 2022 23:15:30 +0100
- Message-id: <YfMZgicfRR1OtYcg@ischia>
- Reply-to: Gennaro Oliva <firstname.lastname@example.org>, email@example.com
- In-reply-to: <firstname.lastname@example.org>
- References: <email@example.com> <handler.984928.B.firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org>
Sorry for getting back to you so late. Thanks to your valuable
contribution, I managed to find a working solution.
On Fri, Aug 06, 2021 at 11:01:48AM -0300, David Bremner wrote:
> I think (one) underlying problem is that the systemd unit file for
> slurmctld is incorrect. The details are in , but it seems like
> network.target is not correct (I think it very rarely is a useful
> target). I added the following
> # /etc/systemd/system/slurmctld.service.d/override.conf
> [Unit]
> After=network-online.target munge.service
Yes, this change is now part of the slurmctld service file.
> I've switched to systemd-networkd on the hosts in question, so I can't
> easily test how this works with ifupdown, but I notice ifupdown provides
> which (guessing based on the name) should provide similar functionality
> to those documented in  for NetworkManager and systemd-networkd.
> : https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Unfortunately, ifupdown-wait-online didn't help when using ifupdown
with allow-hotplug interfaces, although I did not test it
thoroughly, since I wanted a solution that works out of the box.
Therefore, I decided to patch the failing slurm code so that it
retries getaddrinfo before giving up on starting the daemons.
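The idea of retrying getaddrinfo instead of failing on the first error
can be sketched in C roughly as follows. This is a hypothetical
illustration, not the change actually applied to the slurm package: the
helper name getaddrinfo_retry(), the attempt count, the fixed delay and
the choice of EAI_AGAIN as the only transient error are all assumptions.

```c
/* Hypothetical sketch, NOT the actual Debian slurm patch: retry
 * getaddrinfo() while name resolution is temporarily unavailable,
 * e.g. early at boot before the network interface is configured. */
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: only EAI_AGAIN ("temporary failure in name resolution")
 * is treated as transient; any other error is returned immediately. */
static int is_transient(int rc)
{
    return rc == EAI_AGAIN;
}

/* Call getaddrinfo() up to max_attempts times, sleeping delay_sec
 * seconds after each transient failure.  Returns the last getaddrinfo()
 * return code; if attempts is non-NULL it receives the number of calls
 * made.  On success the caller must freeaddrinfo(*res). */
int getaddrinfo_retry(const char *node, const char *service,
                      const struct addrinfo *hints, struct addrinfo **res,
                      int max_attempts, unsigned int delay_sec,
                      int *attempts)
{
    int rc;
    int n;

    assert(max_attempts > 0);

    for (n = 1; ; n++) {
        rc = getaddrinfo(node, service, hints, res);
        /* Stop on success, on a permanent error, or when out of tries. */
        if (rc == 0 || !is_transient(rc) || n >= max_attempts)
            break;
        fprintf(stderr, "getaddrinfo(%s): %s; retrying in %us (%d/%d)\n",
                node ? node : "(null)", gai_strerror(rc), delay_sec,
                n, max_attempts);
        sleep(delay_sec);
    }

    if (attempts)
        *attempts = n;
    return rc;
}
```

A daemon would call such a helper in place of getaddrinfo() and still
abort once the retries are exhausted; bounding the number of attempts
keeps a permanently wrong hostname from blocking startup forever.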