
Bug#733915: RFP: s6 -- a small suite of programs for UNIX, designed to allow process supervision



Package: wnpp
Severity: wishlist

* Package name    : s6
  Version         : 1.1.1
  Upstream Author : ska <ska@skarnet.org>
* URL             : http://skarnet.org/software/s6/
* License         : ISC
  Programming Lang: C
  Description     : s6 - a small suite of programs for UNIX, designed to allow process supervision

s6 is a small suite of programs for UNIX, designed to allow process
supervision (a.k.a. service supervision), in the tradition of
daemontools and runit.

Why another supervision suite?

Supervision suites are becoming quite common. Today, we already have:

    Good (?) old System V init, which can be made to supervise services
if you perform /etc/inittab voodoo. BSD init can also be used the same
way with the /etc/ttys file, but for some reason, nobody among BSD
developers is using /etc/ttys for this purpose, so I won't consider BSD
init here.
    daemontools, the pioneer
    daemontools-encore, Bruce Guenter's upgrade to daemontools
    runit, Gerrit Pape's suite, well-integrated with Debian
    perp, Wayne Marshall's take on supervision
    and even Upstart, Ubuntu's init system, which performs real
supervision. Fedora's systemd and MacOSX's launchd are very similar in
spirit to Upstart, so the same comments apply to them.

Why is s6 needed? What does it do differently? Here are the criteria I
used.


Supervision suites should not wake up unless notified.

    System V init fails the test: it wakes up every 5 seconds, in case
/dev/initctl has changed.
    daemontools fails the test: it wakes up every 5 seconds to check for
new services.
    daemontools-encore does the same.
    the current version of runit fails the test: it wakes up every 14
seconds. But this is a workaround for a bug in some Linux kernels; there
is no design flaw in runit that prevents it from passing the test.
    perp works.
    Upstart works.
    s6 works. By default, s6-svscan wakes up every 5 seconds, to emulate
svscan behaviour; but it can be told not to do so. (s6-svscan -t0)
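
To make the "do not wake up unless notified" criterion concrete, here is
a minimal sketch of a parent that sleeps in the kernel until a child
exits, instead of waking up on a timer. It is not taken from s6 or any of
the suites above; the function name wait_for_child_exit and the fake
child (which just sleeps and exits) are assumptions made for
illustration. The technique is the classic self-pipe trick: the SIGCHLD
handler writes a byte to a pipe, and the main loop blocks in select()
with no timeout.

```python
import os
import select
import signal
import time


def wait_for_child_exit(delay, exit_code):
    """Fork a child that exits after `delay` seconds; block until it does.

    Returns (child_exit_code, number_of_wakeups). Hypothetical helper,
    for illustration only.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # set_wakeup_fd needs a non-blocking fd

    # A Python-level handler must be installed, otherwise SIGCHLD is
    # ignored by default and nothing is written to the wakeup fd.
    old_handler = signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    old_wakeup = signal.set_wakeup_fd(write_fd)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: stand-in for a supervised daemon.
            try:
                time.sleep(delay)
            finally:
                os._exit(exit_code)

        wakeups = 0
        while True:
            # No timeout: the process only wakes up when a signal byte
            # arrives on the pipe.
            select.select([read_fd], [], [])
            os.read(read_fd, 64)  # drain the pending signal byte(s)
            wakeups += 1
            reaped, status = os.waitpid(pid, os.WNOHANG)
            if reaped == pid:
                if os.WIFEXITED(status):
                    return os.WEXITSTATUS(status), wakeups
                return -os.WTERMSIG(status), wakeups
    finally:
        signal.set_wakeup_fd(old_wakeup)
        if old_handler is not None:
            signal.signal(signal.SIGCHLD, old_handler)
        os.close(read_fd)
        os.close(write_fd)
```

A polling design would instead call select() with a 5-second timeout and
rescan on every expiry, which is the behaviour criticized above.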


Supervision suites should provide a program that can run as process 1.

    System V init and Upstart are process 1, so no problem here.
    daemontools was not designed to take over init, although it can be
made to work with enough hacking skills. Same thing with
daemontools-encore.
    runit provides an init functionality, but the mechanism is separate
from the supervision itself; the runit process, not the runsvdir
process, runs as process 1. This lengthens the supervision chain.
    perp was not designed to run as process 1. It probably could be made
to work too without too much trouble.
    s6-svscan was designed from the start to be run as process 1,
although it does not have to be.


Supervision suites should be bug-free, lightweight and easy to
understand.

    daemontools, daemontools-encore, runit and perp all qualify. All of
them are excellent-quality code, unsurprisingly.
    This is where System V init and Upstart fail, hard. SysVinit is too
big for what it (poorly) does. Upstart is clever, but it's waaaaaay too
complex. Come on people... using ptrace to watch your children fork()?
Linking process 1 against libdbus? This is insanity. Process 1 should be
absolutely stable, it should be guaranteed to never crash, so the whole
of its source code should be under control. At Upstart's level of
complexity, those goals are outright impossible to achieve, so the
Upstart approach is flawed by design.
    Of course, systemd and launchd suffer from the same problem. Guys,
I'm glad you eventually realized that supervision was a good thing, and
that it had to be rooted in process 1, but that does not mean that all
the supervision logic has to go into process 1. No, really.
    s6, which has been designed with embedded environments in mind,
tries harder than anyone to pass this. It tries so hard that s6-svscan
and s6-supervise, the two long-running programs that make the
supervision chain, do not even allocate heap memory, and their main
program source files are less than 500 lines long.


Supervision suites should provide a basis for high-level service
management.

    None of System V init, daemontools, runit or perp provides hooks to
be notified when a service goes up or down. runit does provide a waiting
mechanism, but it's based on polling, and the ./check script has to be
manually written for every service.
    daemontools-encore qualifies: the notify script can be used for
inter-service communication. But it's just a hook: all the real
notification work has to be done by the notify script itself; no
notification framework is provided.
    Upstart already is a service management tool. But, again, it fails
the test of simplicity: it does in process 1 what can and should be done
outside of process 1. Process supervision is not the same as service
management, and Upstart confuses the two. So do systemd and launchd.
    s6 comes with libftrig, an event notification library, and
command-line tools based on this library, thus providing a simple API
for future service management tools to build upon.
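
The difference between polling for a service and being notified by it
can be sketched as follows. This is NOT libftrig's API; it only
illustrates the general idea of blocking on a file descriptor until an
event arrives. The function name start_service_and_wait, the one-byte
"U" event and the fake service are all assumptions made for this
example.

```python
import os
import time


def start_service_and_wait(startup_delay):
    """Fork a hypothetical service and block until it reports readiness.

    Returns b"U" if the service announced that it is up, or b"" if it
    died before doing so (the pipe then reaches end-of-file).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for a daemon that needs some time to start.
        os.close(read_fd)
        try:
            time.sleep(startup_delay)   # pretend to initialize
            os.write(write_fd, b"U")    # hypothetical "up" event
        finally:
            os._exit(0)

    # Parent: close our copy of the write end so that a crash of the
    # child before readiness shows up as end-of-file instead of a hang.
    os.close(write_fd)
    event = os.read(read_fd, 1)         # blocks; no loop, no timeout
    os.close(read_fd)
    os.waitpid(pid, 0)                  # the demo service exits right away
    return event
```

The waiter consumes no CPU while the service starts, and needs no
per-service check script: the service itself announces the event.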


Artistic considerations

    s6-svscan and s6-supervise are entirely asynchronous. Even during
trouble (full process table, for instance), they'll remain reactive and
instantly respond to commands they may receive. s6-supervise has even
been implemented as a full deterministic finite automaton, to ensure it
always does the right thing under any circumstance. Other supervision
suites do not achieve that for now.
    daemontools' svscan maintains an open pipe between a daemon and its
logger, so even if the daemon, the logger, and both supervise processes
die, the pipe is still the same so no logs are lost, ever, unless svscan
itself dies.
    runit has only one supervisor, runsv, for both a daemon and its
logger. The pipe is maintained by runsv. If the runsv process dies, the
pipe disappears and logs are lost. So, runit does not offer as strong a
guarantee as daemontools.
    perp has only one process, perpd, acting both as a "daemon and
logger supervisor" (like runsv) and as a "service directory scanner"
(like runsvdir). It maintains the pipes between the daemons and their
respective loggers. If perpd dies, everything is lost. Since perpd
cannot be run as process 1, this is a possible SPOF for a perp
installation; however, perpd is well-written and has virtually no risk
of dying, especially compared to process 1 behemoths like Upstart,
systemd and launchd.
    Besides, the runsv model, which has to handle both a daemon and its
logger, is more complex than the supervise model (which only has to
handle a daemon). Conversely, the runsvdir model is simpler than the
svscan model, but there is only one svscan instance, whereas there are
several runsv or supervise instances. The perpd model is obviously the
most complex; while very understandable, perpd is inarguably harder to
maintain than the other two.
    So, to achieve maximum simplicity and code reuse, and minimal memory
footprint, s6's design is close to that of daemontools. And when
s6-svscan is run as process 1, pipes between daemons and loggers are
never lost.
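
The pipe-keeping argument can be illustrated with a small sketch. It is
not code from daemontools or s6; the names run_daemon, run_logger and
demo are assumptions made for the example. A long-lived parent (playing
the role of the scanner) holds both ends of the logging pipe, so the
daemon and the logger can each die and be restarted without losing what
was already written.

```python
import os


def run_daemon(log_w, message):
    """Fork a fake daemon that logs one message, then 'crashes'."""
    pid = os.fork()
    if pid == 0:
        try:
            os.write(log_w, message)
        finally:
            os._exit(1)  # simulate a crash right after logging
    os.waitpid(pid, 0)


def run_logger(log_r, nbytes):
    """Fork a fake logger that reads nbytes from the pipe, then exits.

    What it read is handed back to the parent through a second pipe, so
    that the demo can inspect it.
    """
    out_r, out_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(out_r)
        try:
            os.write(out_w, os.read(log_r, nbytes))
        finally:
            os._exit(0)
    os.close(out_w)
    chunks = []
    while True:
        chunk = os.read(out_r, 4096)
        if not chunk:  # end-of-file: the logger has exited
            break
        chunks.append(chunk)
    os.close(out_r)
    os.waitpid(pid, 0)
    return b"".join(chunks)


def demo():
    # The "scanner" (this process) owns the pipe for its whole lifetime.
    log_r, log_w = os.pipe()
    first, second = b"first run\n", b"second run\n"
    run_daemon(log_w, first)
    run_daemon(log_w, second)  # restarted daemon, same pipe
    # No logger was running so far: both lines wait in the pipe buffer.
    got_first = run_logger(log_r, len(first))
    got_second = run_logger(log_r, len(second))  # restarted logger
    os.close(log_r)
    os.close(log_w)
    return got_first, got_second
```

Calling demo() returns both messages even though every daemon and logger
process in it was short-lived. If the pipe were owned by a per-service
supervisor instead, the death of that supervisor would close the pipe
and discard whatever was still buffered.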


Conclusion

All in all, I believe that s6 offers the best overall implementation of
a supervision suite as it should be designed. At worst, it's just
another take on daemontools with a reliable base library and a few nifty
features. 

ska - http://skarnet.org/software/s6/why.html

