
Ensure service keeps running with systemd



Hello!

tl;dr: When a daemon exits "normally" (for example due to signal 15)
       although it should not exit (because I did not call "systemctl
       stop"), systemd does not consider it a failure.

I have a Debian system running jessie. I have the following running:

* spamd (system wide, via the official systemd unit file)

* fetchmail (as user, started via a custom unit file in
  ~/.config/systemd/user)

For reasons that are not entirely clear to me, these daemons sometimes
stop (crash?) without me noticing. This is obviously undesirable:
if spamd crashes, all mail goes through as ham; if fetchmail crashes,
I receive no more mail.

A simple solution using systemd would work pretty well for me. I would
like systemd to try to restart the service a few times on failure and
notify me every time one of those services fails (so I know something
happened whether or not the restart worked).

My first try is the following:

/etc/systemd/system/spamassassin.service.d/override.conf:

[Service]
Restart=on-failure
RestartSec=5

[Unit]
# The OnFailure= line below works; please ask if you need more
# info about it.
OnFailure=fail-notify@%n

(fetchmail uses a similar setup; both services are Type=forking)
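
In case it helps to see what OnFailure= points at, a notification
template of that kind could look roughly like the sketch below. It is
an illustration under assumptions rather than my exact file: the file
path, the Description text, the use of a local "mail" command and the
recipient "root" are all placeholders.

```ini
# /etc/systemd/system/fail-notify@.service  (hypothetical path and content)
[Unit]
# %i expands to the instance name, i.e. the part between "@" and the
# unit type suffix (here: "spamassassin").
Description=Send a notification mail about failed unit %i

[Service]
Type=oneshot
# Assumption: a "mail" command is installed and mail to root is
# delivered somewhere I actually read it.
ExecStart=/bin/sh -c 'systemctl status %i | mail -s "Unit %i failed" root'
```

A template like this only runs when systemd marks the unit as failed,
so it stays silent whenever an exit is treated as clean.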

I added the OnFailure= setting today and used kill -9 on the spamd
processes to simulate a crash. That case seems to work.

The point is that, in the past, spamd disappeared permanently even
though "Restart=on-failure" was set. In my testing, using kill -15 on
the daemon did not trigger a failure: systemd simply considers this a
normal exit and happily ignores that a critical daemon is no longer
running. I guess something similar may have happened silently before. I
want this to go from a silent failure to a loud one!

How do I tell systemd that it is a failure if the daemon is not running,
except if I explicitly stopped it via "systemctl stop" or similar?
Either I can't find the right Google query, or this is not the right
way to go about this.

I would be thankful for any hints.

Tobias

