I noticed that slurmd always shows as failed, with the following message:
"Jan 23 18:03:05 c1-compute-1.wehi.edu.au systemd[1]: Can't open PID file /var/run/slurm/slurmd.pid (yet?) after start: No such file or directory"
According to https://bugs.schedmd.com/show_bug.cgi?id=8388#c1 :
"This is happening because we create the PID file slightly after systemd tries to read it. Commands where systemd needs to know the PID (eg systemctl restart slurmd.service) it will re-read the file (which appears to be getting created properly). From a functional standpoint, this error shouldn't have any impact on systemd or slurm."
According to upstream, the workaround is to remove (or comment out) the PIDFile line in slurmd.service. From https://bugs.schedmd.com/show_bug.cgi?id=8388#c3 :
"The quickest workaround you could use is to just comment out "PIDFile=*" line in the unit file and do a daemon-reload. instead of reading the pid file we write out, it will "guess" the main pid (and in my tests does so correctly)."