Re: migration from cron.daily to systemd timers



On 1/8/20 1:02 PM, Michael Stone wrote:
> As a third party with no particular ax to grind on this, I do wonder
> what the advantage is to adding another mechanism for this particular
> use case, given the need to somehow handle upgrades involving an
> existing (presumably working?) solution.

At my work, we recently converted all our cron jobs to systemd units
(service and timer pairs). We see the following advantages:

You can use systemd's dependencies. This allows you to do things like
automatically stop a related scheduled task if its service is manually
stopped (e.g. for maintenance). This can keep the scheduled task from
interfering with your work (e.g. a watchdog script restarting the "dead"
service) and/or throwing errors (because the service isn't running). In
the latter case, that might mean that you avoid needing to suppress
errors because of this normal occurrence, which means you aren't
suppressing real errors.
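
A minimal sketch of one way to tie a timer to a service, assuming
hypothetical unit names (myapp.service for the daemon, myapp-check.timer
for the scheduled check); this is an illustration, not our actual units:

```ini
# myapp-check.timer (hypothetical); by default it activates the
# service with the matching name, myapp-check.service.
[Unit]
Description=Periodic health check for myapp
# Stopping or restarting myapp.service is propagated to this timer.
PartOf=myapp.service
After=myapp.service

[Timer]
OnCalendar=*:0/5

[Install]
# Once enabled, the timer is started whenever myapp.service is started.
WantedBy=myapp.service
```

With something like this, "systemctl stop myapp.service" also stops the
timer, so the check does not fire during maintenance.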

The service unit can be started manually. This makes it a lot easier to
debug/develop with systemd than cron (where you have to manually change
it to run at the next minute and wait). Obviously, you can still run the
scripts directly, but sometimes the problem you're working on only
manifests when it is run under cron/systemd (e.g. because environment
variables are missing or set differently [1]).

The timer units can be randomized over arbitrarily sized windows,
customized per service. This avoids load peaks, e.g. at cron.daily time.
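
As an illustration, a timer along these lines spreads a nightly job over
a one-hour window. The unit name and the times are made up for the
example:

```ini
# nightly-report.timer (hypothetical name); activates
# nightly-report.service.
[Unit]
Description=Nightly report, spread over a one-hour window

[Timer]
OnCalendar=*-*-* 02:00:00
# Each activation is delayed by a random amount between 0 and 1 hour.
RandomizedDelaySec=1h

[Install]
WantedBy=timers.target
```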

Your scripts can write non-error debug/status information to stdout
without it resulting in an email. This info shows up in the logs, which
can be convenient. [4] Obviously, one can use logger(1) or syslog,
depending on programming language, but for trivial in-house scripts,
it's hard to beat the simplicity of just printing to stdout.

Yesterday, we implemented a new script that uses a web API where you
request an action and then poll until it completes. We wanted a timeout.
With systemd, since the service is Type=oneshot, we can just set
TimeoutStartSec= and we're done. This will kill the script if it hangs
for any reason at any step. We obviously could have implemented a
timeout in the script for that particular step, or in general, but this
was simpler.
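
Roughly, such a service could look like the sketch below. The unit name,
script path and the ten-minute value are assumptions for the example:

```ini
# api-poll.service (hypothetical name and script path)
[Unit]
Description=Request an action via a web API and poll until it completes

[Service]
Type=oneshot
ExecStart=/usr/local/bin/api-poll
# For Type=oneshot the ExecStart= command counts as start-up, so this
# limit covers the whole run. On expiry systemd stops the process and
# the unit is marked failed.
TimeoutStartSec=10min
```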

For new/junior sysadmins, there is more overlap. Everything they learn
about systemd services applies to both scheduled tasks and those that
start at boot. That's not to say that cron is harder to learn, just that
it's another thing.

Recent versions of systemd-analyze have a "calendar" command to show you
the next iteration(s) of a calendar specification. This makes it easy to verify
that your calendar syntax is firing at the right times, and you haven't
mixed up hours/minutes/seconds, etc. It will also show you the
normalized form (which we prefer to use to avoid confusing humans). Some
examples from my history:
systemd-analyze calendar --iterations=5 'Mon..Fri 01:50'
systemd-analyze calendar --iterations=5 '*:0/5'
systemd-analyze calendar --iterations=5 '*:27,57'

We do lose the automatic cron emails, which some would see as a
downside, though there are ways to get them on a service-by-service
basis. [2] [3] In our particular case, anything that is _expected_ to
send an email was already doing so manually (i.e. calling mail(1)) for
other integration reasons. Using systemd units was actually preferable
from an alerting perspective, as we already have an
Icinga check that runs `systemctl list-units --failed` and alerts if any
service (scheduled or daemon style) has failed for any reason.
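
For anyone who does want per-service failure mail, one common pattern is
OnFailure= pointing at a template unit. This is only a sketch: the unit
names and the failure-mail script are hypothetical, and the script would
have to build and send the message itself (e.g. from "systemctl status"
output):

```ini
# Fragment of nightly-report.service (hypothetical)
[Unit]
# %n expands to the full name of this unit.
OnFailure=failure-mail@%n.service

# --- separate file: failure-mail@.service (hypothetical template) ---
[Unit]
Description=Mail a failure notice for %i

[Service]
Type=oneshot
# %i is the instance name, i.e. the name of the unit that failed.
ExecStart=/usr/local/bin/failure-mail %i
```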

The systemd units are "fluffier" (more lines and more characters of
overhead) than crontabs, but that's not a true increase in complexity.
All told, with this change, it feels like things are a bit simpler and
easier to work on while achieving a bit better results. It's a modest
"quality of life" improvement.

This is _not_ my blog, but I read it regularly and comment periodically:
[1] https://utcc.utoronto.ca/~cks/space/blog/linux/SortCronLocaleDanger
[2] https://utcc.utoronto.ca/~cks/space/blog/linux/SystemdTimersAndErrors
[3] https://utcc.utoronto.ca/~cks/space/blog/linux/SystemdTimersMailNotes

This one quotes my comment on article [2]:
[4] https://utcc.utoronto.ca/~cks/space/blog/sysadmin/NotificationsVersusLogs

-- 
Richard
