
Slurmd will always fail with PIDFile set on systemd



I noticed that slurmd always shows as failed under systemd, with this message in the log:

"Jan 23 18:03:05 c1-compute-1.wehi.edu.au systemd[1]: Can't open PID file /var/run/slurm/slurmd.pid (yet?) after start: No such file or directory"

According to https://bugs.schedmd.com/show_bug.cgi?id=8388#c1:

"This is happening because we create the PID file slightly after systemd tries to read it.  Commands where systemd needs to know the PID (eg systemctl restart slurmd.service) it will re-read the file (which appears to be getting created properly).  From a functional standpoint, this error shouldn't have any impact on systemd or slurm."

The solution, according to upstream (https://bugs.schedmd.com/show_bug.cgi?id=8388#c3), is to remove the PIDFile line from slurmd.service:

"The quickest workaround you could use is to just comment out "PIDFile=*" line in the unit file and do a daemon-reload. instead of reading the pid file we write out, it will "guess" the main pid (and in my tests does so correctly)."

The patch is:

--- slurmd.service.orig 2022-01-27 17:18:14.000000000 +0100
+++ slurmd.service 2022-01-27 17:18:36.000000000 +0100
@@ -9,7 +9,6 @@
 EnvironmentFile=-/etc/default/slurmd
 ExecStart=/usr/sbin/slurmd -D $SLURMD_OPTIONS
 ExecReload=/bin/kill -HUP $MAINPID
-PIDFile=/run/slurmd.pid
 KillMode=process
 LimitNOFILE=131072
 LimitMEMLOCK=infinity
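
For anyone who prefers to script this instead of applying the patch by hand, here is a minimal sketch. It comments the line out (as the upstream comment suggests) rather than deleting it. The function name comment_out_pidfile is my own, the unit file location in the usage comment is an assumption (check where your package installs slurmd.service), and it relies on GNU sed's -i option.

```shell
#!/bin/sh
# Comment out any PIDFile= line in a systemd unit file.
# A copy of the untouched file is kept next to it with a .orig suffix.
# Assumes GNU sed (for the -i option).
comment_out_pidfile() {
    unit_file=$1
    cp -p "$unit_file" "$unit_file.orig"
    # "&" re-inserts the matched text, so "PIDFile=..." becomes "#PIDFile=..."
    sed -i 's/^PIDFile=/#&/' "$unit_file"
}

# Usage, as root. The unit path below is an assumption, adjust it to your system:
#   comment_out_pidfile /lib/systemd/system/slurmd.service
#   systemctl daemon-reload
#   systemctl restart slurmd.service
```

Keep in mind that a package upgrade may reinstall the original unit file, in which case the change has to be applied again.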


--
Alexandre Strube

