Bug#727708: init system other points, and conclusion

To: 727708@bugs.debian.org
Subject: Bug#727708: init system other points, and conclusion
From: Russ Allbery <rra@debian.org>
Date: Sun, 29 Dec 2013 16:10:10 -0800
Message-id: <87r48vnqql.fsf@windlord.stanford.edu>
Reply-to: Russ Allbery <rra@debian.org>, 727708@bugs.debian.org
In-reply-to: <21183.21942.130759.900867@chiark.greenend.org.uk> (Ian Jackson's message of "Sat, 28 Dec 2013 22:50:30 +0000")
References: <21183.21942.130759.900867@chiark.greenend.org.uk>

We seem to be at the point of the process where at least those of us who
did early investigation are stating conclusions.  I think I have enough
information to state mine, so will attempt to do so here.

This is probably going to be rather long, as there were quite a few
factors that concerned me and that I wanted to investigate.

The brief summary is that I believe Debian should adopt systemd as its
default init system on Linux.  There are two separate conceptual areas in
which I think systemd offers substantial advantages over upstart, each of
which I would consider sufficient to choose systemd on its own.  Together,
they make a compelling case for systemd.  This position would have
substantial implications for upgrade paths and for non-Linux ports; I'll
discuss a bit of that below, but most of it in the separate branch of this
bug report that Ian opened on that topic.

Below, I first discuss the other choices before us besides systemd and
upstart.  Then I look at a straight technical comparison between the two
init systems, and finally look at issues of maintenance, community,
ecosystem, and portability.  The three main criteria on which I was
evaluating both systems were technical capabilities, surrounding
ecosystem, and portability.  The latter two turned out to be deeply
entangled, so I discuss them together.


1. Other Choices

First, other choices besides systemd and upstart.

Three replacement init systems were proposed to the Technical Committee,
alongside the status quo of staying with sysvinit.  The third option,
OpenRC, is a more conservative change than either systemd or upstart.  It
continues to use the existing sysvinit init process but replaces the
startup script management with a more robust shell library and additional
features.

I think the OpenRC developers are great people and I wish them all the
success in the world with their project, but I just don't think it's
ambitious enough for Debian's needs.  If we're going to the effort of
replacing init systems and changing our startup scripts, a bare minimum
requirement for me is that we at least address the known weaknesses of the
sysvinit mechanism, namely:

* Lack of integration with kernel-level events to properly order startup.
* No mechanism for process monitoring and restarting beyond inittab.
* Heavy reliance on shell scripting rather than declarative syntax.
* A fork and exit with PID file model for daemon startup.

My impression of OpenRC is that it is not attempting to solve these issues
in the same way that systemd and upstart are.  To the extent that these
issues are on the OpenRC roadmap, it's not as far along as either systemd
or upstart is.  It's difficult to evaluate, since the OpenRC documentation
is rather sparse and lacks the kind of comprehensive manual that both
systemd and upstart have, which is itself a sign of project immaturity.

I don't think that switching to OpenRC offers enough clear benefit over
the status quo.

That raises the other obvious option: sticking with sysvinit.  I've made
my position on this fairly clear in other threads, so I won't reiterate it
here at length.  The short version is that I turned to other tools to
manage daemons years ago because sysvinit was simply inadequate, and my
feeling on that hasn't changed.  The model of fork and exit without clear
synchronization points is inherently racy, the boot model encoded into
sysvinit doesn't reflect a modern system boot, and maintaining large and
complex init scripts as conffiles has been painful for years.  Nearly
every init script, including the ones in my own packages, has various
edge-case bugs or problems because it's very hard to write robust service
startup in shell, even with the excellent helper programs and shell
libraries that Debian has available.  A quick perusal of
/etc/init.d/skeleton and the complex case statements and careful attention
to status codes required for a proper init script makes this case clear.

I think the choice of a default init system for Linux is a choice between
systemd and upstart.  We would be doing ourselves and our users a
disservice to stick with the status quo, or even a moderate update of the
status quo to add a simpler service definition.  The limitations have been
well-known for years, and I think it's telling that most other operating
systems, even fairly conservative ones, have moved away from the System V
init script model.

The last option that was before us was supporting multiple init systems.
I consider this a variation on a transition plan, with a possibly infinite
time horizon, and will discuss this separately when I talk about
transition plans.


2. Core Service Management Functionality

As reported to this bug, I did a fairly extensive evaluation of both
upstart and systemd by converting one of my packages, which provides a
network UDP service, to a native configuration with both systems.  While
doing so, I tried to approach each init system on its own terms and
investigate what full, native support of that init system would look like,
both from a Debian packaging perspective and from an upstream perspective.
I also tried to work through the upgrade path from an existing init script
with an external /etc/default configuration file and how that would be
handled with both systemd and upstart.

I started this process with the expectation that systemd and upstart would
be roughly evenly matched in capabilities.  My expectation was that I
would uncover some minor differences and variations, and some different
philosophical approaches, but no deeply compelling differentiation.

To my surprise, that's not what happened.  Rather, I concluded that
systemd has a substantial technical advantage over upstart, primarily in
terms of available useful features, but also in terms of fundamental
design.

2.1. General Impressions

systemd feels like a software package that has been used and pounded on in
a wide variety of real-world situations, and has grown the flexibility and
adaptability that is required to make a wide variety of use cases work.
upstart, on the other hand, has a minimal design and a ready escape to
shell scripting, which may have discouraged directly tackling a broader
array of use cases.  Regardless, there are a number of cases that systemd
handles cleanly with simple configuration that would require shell script
fragments or other workarounds in upstart, which in turn makes the startup
configurations less reliable and harder to debug.

I was quite impressed throughout the process of developing systemd unit
files.  Every time I realized I needed some piece of functionality to
configure the daemon properly, systemd already had it.

2.2. Major Functionality Gaps

Here are the major pieces of functionality that I think would have to be
added to upstart for rough feature parity:

* Socket activation, by which I don't mean lazy start of daemons, although
  it enables that, but init management of socket setup so that daemons can
  start in parallel.

  This has been discussed elsewhere on the thread, but I want to note here
  that systemd's approach is bold and innovative.  We've had multiple
  discussions in Debian lists in the past where people have felt somewhat
  depressed or discouraged about Debian's lack of innovation or
  unwillingness to tackle sweeping improvements.  After having studied and
  implemented socket activation, I think this is one of those
  opportunities, and we should not pass it by.

  There are a variety of advantages to socket activation that have been
  discussed elsewhere, and I'm not going to repeat them all here.  But one
  I want to call out is the advantage for an enterprise systems
  administration environment.  Right now, in order to configure bind
  addresses or IPv6 behavior for my services, I have to dig into the
  individual configuration syntax or command-line flags of each separate
  daemon, and often there's no easy way to set these parameters without
  making intrusive changes to daemon startup.  Socket activation lets me
  manage all of this through a simple configuration override that I drop
  into /etc via (for example) Puppet, and the syntax is the same for every
  service that uses it.  It would obviously take quite some time to get
  there, but that's a really nice vision of the future, and one that would
  make a real difference for Debian use cases I care about.

  upstart has a socket activation protocol, but it would need an
  almost-complete redesign in order to be used the way that systemd's can
  be used.  It doesn't support passing multiple sockets (required for
  complex daemons, some IPv6 scenarios, and binding to multiple but not
  all interfaces), it doesn't support IPv6 at all, it doesn't support UDP
  sockets, and its configuration syntax is inadequate to represent the
  parameters that would be useful in a real-world case.  It also doesn't
  separate the socket configuration from the daemon configuration, which
  makes it harder for a local systems administrator to control binding
  behavior without changing other properties of daemon initialization.

* Integrated daemon status.  This one caught me by surprise, since the
  systemd journal was functionality that I expected to dislike.  But I was
  surprised at how well-implemented it is, and systemctl status blew me
  away.  I think any systems administrator who has tried to debug a
  running service will be immediately struck by the differences between
  upstart:

  lbcd start/running, process 32294

  and systemd:

    lbcd.service - responder for load balancing
     Loaded: loaded (/lib/systemd/system/lbcd.service; enabled)
     Active: active (running) since Sun 2013-12-29 13:01:24 PST; 1h 11min ago
       Docs: man:lbcd(8)
             http://www.eyrie.org/~eagle/software/lbcd/
   Main PID: 25290 (lbcd)
     CGroup: name=systemd:/system/lbcd.service
             └─25290 /usr/sbin/lbcd -f -l

  Dec 29 13:01:24 wanderer systemd[1]: Starting responder for load balancing...
  Dec 29 13:01:24 wanderer systemd[1]: Started responder for load balancing.
  Dec 29 13:01:24 wanderer lbcd[25290]: ready to accept requests
  Dec 29 13:01:43 wanderer lbcd[25290]: request from ::1 (version 3)

  Both are clearly superior to sysvinit, which bails on the problem
  entirely and forces reimplementation in every init script, but the
  systemd approach takes this to another level.  And this is not an easy
  change for upstart.  While some more data could be added, like the
  command line taken from ps, the most useful addition in systemd is the
  log summary.  And that relies on the journal, which is a fundamental
  design decision of systemd.

  And yes, all of those log messages are also in the syslog files where
  one would expect to find them.  And systemd can also capture standard
  output and standard error from daemons and drop that in the journal and
  from there into syslog, which makes it much easier to uncover daemon
  startup problems that resulted in complaints to standard error instead
  of syslog.  This cannot even be easily replaced with something that
  might parse the syslog files, even given output forwarding to syslog
  (something upstart currently doesn't have), since the journal will
  continue to work properly even if all syslog messages are forwarded off
  the host, stored in some other format, or stored in some other file.
  systemd is agnostic to the underlying syslog implementation.

* Security defense in depth.  Both upstart and systemd support the basics
  (setting the user and group, process limits, and so forth).  However,
  systemd adds a multitude of additional defense in depth features,
  ranging from capability limits to private namespaces or the ability to
  deny a job access to the network.  This is just a simple matter of
  programming on the upstart side, but it still contributes to the general
  feature deficit; the capabilities in systemd exist today.  I'm sure I'm
  not the only systems administrator who is expecting security features
  and this sort of defense in depth to become increasingly important over
  the next few years.

  Here again, I think we have an opportunity for Debian to be more
  innovative and forward-looking in what we attempt to accomplish in the
  archive by adopting frameworks that let us incorporate the principles of
  least privilege and defense in depth into our standard daemon
  configurations.

There are also a plethora of minor features and tuning available in
systemd but not in upstart.  None of this is as significant as the points
mentioned above, and none of it is as difficult to implement, but it's not
currently implemented, and I think it speaks to systemd having been tested
against a broader array of use cases.
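
To make the socket activation point above concrete, here is a minimal
sketch, in Python, of the daemon side of systemd's documented
descriptor-passing convention: the init system sets LISTEN_PID and
LISTEN_FDS in the environment, and the inherited sockets start at file
descriptor 3.  The function names are hypothetical and chosen only for
illustration; this is not how lbcd or any other particular daemon
implements it.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd convention


def inherited_fds(environ=None, pid=None):
    """Return the descriptors handed over by the init system, or [] if none.

    environ and pid are parameters only so the logic can be exercised
    without actually being started by an init system.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the descriptors were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def wrap_inherited(fds):
    """Wrap inherited descriptors as socket objects.

    With fileno given, Python (3.7 or later) detects the address family
    and socket type from the descriptor itself.
    """
    return [socket.socket(fileno=fd) for fd in fds]
```

Because the count can be greater than one, the same code path covers a
daemon that is handed several sockets, for example separate IPv4 and IPv6
UDP sockets, and the daemon never needs to know which addresses the
administrator chose to bind.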

2.3. Event vs. Dependency Model

There is one UI design difference between systemd and upstart that's less
clear-cut, but which I think will surprise people.  systemd is built
around familiar dependencies between services, and starts services in
dependency order.  There are some twists, such as allowing a service to
create a reverse dependency (make another service depend on it), but it's
the basic design that's familiar to any packager, or to users of languages
like Puppet.  upstart, on the other hand, uses a message bus model:
services are started when particular events are received, and dependencies
are expressed by listing the events required to trigger startup (or some
other action).

Conceptually, both of these designs are equivalent.  They both construct a
DAG that's used to order service startup.  However, upstart complicates
matters by having two types of messages on its message bus: signals and
methods (technically, there are also hooks, but the distinction doesn't
matter for this point).  Signals behave like the typical asynchronous
message bus event, or like a dependency: they trigger services to start,
but the service issuing the signal does not care whether anyone listens or
not.  Methods do not; methods are effectively synchronous calls and the
service issuing a method event waits until the method event has been acted
on before continuing.

The UI problem with this approach is that it creates a pitfall with rather
noticeable consequences.  If someone ever confuses a signal event and a
method event and starts a service on a method event instead, it is then
very easy to block startup of some fundamental system service because its
method event never completes due to deadlock.  This is made somewhat more
likely by the fact that method events are the default in initctl emit
commands, whereas signal events require a flag.

Again, this is not a fundamental issue with either system; either
representation is mathematically convertible into the other.  But with
plain dependencies it's difficult to mess up in quite the same way.  One
can create cycles, but unless one is modifying the dependencies of core
services, it's hard to create a cycle that involves a core service.
upstart provides a way to shoot oneself in the foot by blocking startup of
a core service by listening to the wrong type of event.  The event model
doesn't, so far as I could find, offer any clear advantages over a
dependency structure in compensation.
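
The idea that a dependency description reduces to a DAG used to order
startup can be sketched in a few lines of Python.  This is an illustration
only, not how either init system is implemented: it uses the standard
graphlib module (Python 3.9 or later), and apart from lbcd the service
names are made up.

```python
from graphlib import CycleError, TopologicalSorter


def start_order(requires):
    """Return one valid start order.

    requires maps each service to the set of services that must be
    started before it.
    """
    return list(TopologicalSorter(requires).static_order())


def has_cycle(requires):
    """Report whether the dependencies contain a cycle (no valid order)."""
    try:
        start_order(requires)
    except CycleError:
        return True
    return False


if __name__ == "__main__":
    requires = {
        "local-fs": set(),
        "syslog": {"local-fs"},
        "network": {"local-fs"},
        "lbcd": {"network", "syslog"},
    }
    print(start_order(requires))
    print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

A cycle is the main way to break this structure, and it is detected up
front when the order is computed, rather than surfacing as a service that
blocks at boot.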

2.4. Configuration File Model

There is one place where I came into this evaluation preferring the
upstart design over the systemd design, and came away with a continued
preference, but a more mild one: the configuration file model.  systemd
uses an /etc overrides /lib model, where all unit configurations are
installed in /lib and only local overrides and some configuration goes
into /etc.  upstart uses the (more familiar to Debian) model where the
daemon configuration is a conffile in /etc.

Both approaches have real advantages, but I think the upstart approach has
slightly more.  The systemd model means that one no longer has to add
various guards to daemon configurations to allow for the possibility that
the package has been uninstalled but not purged.  Those continue to be
necessary with upstart (and continue to be written in shell; systemd
actually has a nicer language for doing this, even though it's not
needed).  However, the upstart approach makes it easier to preserve and
merge local changes with upstream changes.  In the systemd model, the
local administrator has line-by-line granularity on overrides of systemd
unit configurations, which while solving much of the problem does not help
with the specific case of wanting to change the flags passed to the
daemon.  If the package later changes the flags in some orthogonal way,
it's easy for the system to miss that change.  This is something that,
under systemd, will probably require development of new tools to warn the
administrator of what's happened.  upstart avoids this problem by having
the whole configuration be managed as a conffile.

I think the upstart approach is better, but I think the drawbacks of the
systemd approach could be mostly overcome with some fairly simple Debian
tools.
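
The /etc over /lib lookup described above can be sketched as follows.
This is a simplified illustration in Python, not systemd's actual loader:
it assumes just two directories (/etc/systemd/system and
/lib/systemd/system), ignores drop-in fragments, and uses hypothetical
function names.  The second function hints at the sort of simple tool
mentioned above, one that lists which local units shadow a packaged unit
so that the administrator knows where to look after an upgrade.

```python
from pathlib import Path

# Assumed search path for this sketch, highest precedence first.
DEFAULT_SEARCH_PATH = ("/etc/systemd/system", "/lib/systemd/system")


def resolve_unit(name, search_path=DEFAULT_SEARCH_PATH):
    """Return the first file called name on the search path, or None."""
    for directory in search_path:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def shadowed_units(search_path=DEFAULT_SEARCH_PATH):
    """Map each unit name found in more than one directory to all its copies.

    The copies are listed in precedence order, so the first entry is the
    one that would be used and the rest are hidden by it.
    """
    seen = {}
    for directory in search_path:
        directory = Path(directory)
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.service")):
            seen.setdefault(path.name, []).append(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

A real tool would go further and compare the hidden packaged copy before
and after an upgrade, which is the part that warns about changed flags;
this sketch only identifies where such a comparison would be needed.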

2.5. Summary

I think the technical comparison between upstart and systemd as both
projects exist today substantially favors systemd, at both the feature and
design level.  When picking between both products as they currently exist
on the basis of their current capabilities and future adaptability, I have
no qualms about picking systemd.


3. Ecosystem and Portability

One of the primary concerns from the start of this conversation has been
around portability of any new init system.  One advantage of the extreme
simplicity of sysvinit is that it's extremely portable; this advantage
continues to be shared by OpenRC.  Both of the more-functional init
systems are Linux-specific.  However, upstream attitudes towards
portability differ.  This ties directly into the development models of
both systemd and upstart, the community momentum, and the larger
surrounding ecosystem.

3.1. Ecosystem Reality Check

One of the points that I think may have been obscured in the discussion,
but which is important to highlight, is that basically all parties have
agreed that Debian will adopt large portions of systemd.  systemd is an
umbrella project that includes multiple components, some more significant
than others.  Most of those components are clearly superior to anything we
have available now on Linux platforms and will be used in the distribution
going forward.

In other words, this debate is not actually about systemd vs. upstart in
the most obvious sense.  Rather, the question, assuming one has narrowed
the choices to those two contenders, is between adopting all the major
components of systemd including the init system, or adopting most of the
major components of systemd but replacing the init system with upstart.
Either way, we'll be running udev, logind, some systemd D-Bus services,
and most likely timedated and possibly hostnamed for desktop environments.

I think this changes the nature of the discussion in some key ways.  We're
not really talking about choosing between two competing ecosystems.
Rather, we're talking about whether or not to swap out a core component of
an existing integrated ecosystem with a component that we like better.

Now, I am generally on the side that says loose coupling of ecosystems is
an inherent good.  However, I don't agree that it's such an inherent good
that we should disassemble things just for the sake of having disassembled
things.  At feature parity, and absent any compelling reason to swap
components, I think we should take the path of least resistance and use
the integrations that other people have already developed.  Debian has
more than enough hard integration problems to solve without creating new
ones for ourselves unnecessarily.  But that's the key word: unnecessarily.
If we do have a reason for doing this, we should seriously consider it.

Therefore, I believe the burden of proof is on upstart to show that it is
a clearly superior init system along some axis, whether that be
functionality or portability or flexibility or maintainability, to warrant
going to the effort of disassembling a part of the systemd ecosystem and
swapping in our own component.

3.2. Portability

This is a difficult topic to clearly discuss, since it is, in essence, all
future speculation at this point.

I should state up front that, in making these sorts of decisions around
free software projects, I have a relatively high future discount rate.  In
other words, I give substantially less credit to something that does not
exist now but could exist in the future.  I don't discount it to zero, but
I do discount it relatively strongly.  Others may not.

I do this because free software projects and volunteer projects are
inherently unpredictable.  The free software world is stuffed to the gills
with roadmaps that never actually happened, through no fault of any of the
people involved.  It's easy to agree that something would be a good idea,
and another matter to actually drive it through to completion.

Right now, neither systemd nor upstart works on non-Linux platforms.
Therefore, right now, adopting either of them means that we either
jettison our non-Linux ports or adopt a transition plan that retains
support for sysvinit scripts.  Right now, there is minimal difference
between the two projects in terms of portability; they both make extensive
use of Linux-specific APIs and have hooks for Linux-specific actions.

However, there is a porting effort for upstart to kFreeBSD underway, and
the current upstart maintainers have indicated more interest in
portability than the systemd maintainers.  That's been a point of
significant friction over systemd (and was, in the past, also a point of
friction with the previous upstart upstream, although that's subsequently
changed).  So there is a real advantage for upstart here, but it's one
that has to be discounted because it's potential future work that could
happen, but which is certainly not guaranteed to happen.

Another point worth considering here is that the best way, from Debian's
perspective, of porting either project to kFreeBSD or the Hurd is to
implement the currently Linux-specific interfaces on those platforms in
some fashion.  (An inotify and epoll API that uses kqueue under the hood,
for example.)  To the extent that this is possible, it benefits both
upstart and systemd equally, as well as many other programs in the
archive that are written to currently Linux-specific APIs.  This is an
approach that's been common for years in different porting scenarios; I
use it myself to maintain compatibility with both MIT Kerberos and Heimdal
in the Kerberos-related packages I maintain.
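
As a loosely related illustration of hiding the kernel mechanism behind
one interface: Python's standard selectors module exposes a single
readiness API and picks epoll on Linux or kqueue on the BSDs underneath.
This is an abstraction layer above both mechanisms rather than an epoll
emulation on top of kqueue, so it is only an analogy for the approach
described above, and the helper name is hypothetical.

```python
import selectors
import socket


def wait_readable(sock, timeout=1.0):
    """Return True if sock becomes readable within timeout seconds.

    DefaultSelector chooses the most efficient mechanism the platform
    offers, so the caller's code is identical on Linux and the BSDs.
    """
    with selectors.DefaultSelector() as selector:
        selector.register(sock, selectors.EVENT_READ)
        return bool(selector.select(timeout))


if __name__ == "__main__":
    left, right = socket.socketpair()
    print(type(selectors.DefaultSelector()).__name__)  # mechanism in use
    right.send(b"ping")
    print(wait_readable(left))
    left.close()
    right.close()
```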

Finally, note the ecosystem point above.  To maintain feature parity
across Debian's ports, there already appears to be widespread agreement
that components of systemd will have to be ported, particularly logind and
possibly some of the other services.  Now, that's not quite the same thing
as porting the init system: it's possible those components use fewer
Linux-specific interfaces (I've not checked), it's possible that
alternative implementations of the same functionality can be provided
(which IIRC is what happened with udev in some fashion), and not being
able to run major desktop environments is not the same thing as not being
able to boot.  But I do think it blunts some of the porting argument.  The
non-Linux ports are going to have to port, fork, or replace systemd
components anyway, regardless of the choice of init system, or drop out of
feature parity with the Linux ports.

So, in short, I consider portability to be a possible benefit of upstart,
but I'm inclined to discount that advantage for two reasons.  One, it
has not yet actually materialized and still may not, and two, systemd
porting looks like it's going to be on the table regardless.  I therefore
think that we should deal with this issue through how we structure a
transition plan, rather than taking it as a reason to choose upstart over
systemd.  More on that in another message.

3.3. Project Momentum

One of the reasons why I'm leery of the future portability argument for
upstart, and one of the reasons why I'm leery of upstart in general, is
that I'm quite worried upstart will prove to be a blind alley.

I think there are several reasons to be concerned here.  None of them is
persuasive in isolation, but taken together I think they raise significant
cause for concern:

* Red Hat adopted upstart but never did a wholesale conversion, and then
  abandoned upstart in favor of systemd.  Obviously, one should not put
  too much weight on this; Red Hat is a commercial company that has a
  wealth of reasons for its actions that do not apply to Debian.  But I
  think it's still worth noting that the only non-Ubuntu major adopter of
  upstart backed away from it.

* upstart is older than systemd but has significantly fewer features.
  Now, the danger of this sort of metric is that features can be added as
  "padding" without any real significance or advantage.  But having spent
  serious time with both systems, I don't believe that's the case here.
  systemd is not adding extraneous features; rather, it's adding
  significant, useful functionality and real-world adaptability, and
  upstart is trailing despite being an older project.

* systemd has a broader community.  SuSE and Red Hat are both converting,
  there is significant interest across the general Linux community, major
  upstreams of Debian such as GNOME and KDE are adopting systemd support
  (and in some cases even requiring it), and systemd is tackling
  significant problems, such as logind, that everyone agrees need to be
  solved.  By comparison, upstart is effectively used only by Ubuntu, and
  there isn't the same sort of enthusiasm or attempts to tackle broad
  problems happening at present in the upstart community so far as I can
  see.  This is reasonable if upstart is mature and mostly complete
  software, but that was not my personal experience.

* There appears to be some direct tension between GNOME upstream and
  upstart, due less to upstart itself than to corporate direction at
  Canonical.  Again, this can easily be overstated.  But I do
  think that Debian will want to continue to support GNOME going forward,
  and doing that with upstart will clearly require more work within the
  project than doing that with systemd.  This is another case where we
  shouldn't shy away from the work if it's necessary, but we also
  shouldn't adopt unnecessary work.

Over the past few months, I've also put out some feelers to other
colleagues, and the uniform reaction I got in response is that systemd is
a better technical solution than upstart.  I think this speaks to the
general momentum around systemd, and will directly affect our ease of
integration in the future.  I know that after my personal experience with
both projects, I'm excited to add systemd support to my projects as
upstream, and not particularly enthused about upstart from an upstream
perspective since it doesn't offer me any clear benefits.

3.4. Summary

I'm concerned that, if we adopt upstart, in two or three years we'll end
up wanting to do the same thing that Red Hat did, back out, and switch to
systemd.  That would be a huge amount of wasted effort.  Even worse would
be to end up in that situation and decide that the conversion is too much
work, and then just settle for an init system that is harder to integrate
and provides less functionality.

I remain unconvinced of the long-term growth curve of the upstart project.
I don't think it's going to be abandoned completely, at least unless
Ubuntu decides to switch (which seems unlikely at the moment) or Canonical
dissolves (which also seems unlikely).  I do think there's a significant
danger that it will stagnate and fall behind in terms of desired features,
particularly since this appears to already be happening.  I don't have
faith in the path that takes upstart from where it is now to something
with feature parity with systemd as it is now, let alone something that's
clearly better than systemd.  And I think Debian as a project should be
aiming for better, not merely sufficient.

The portability issues are significant.  However, I don't think they
provide a clear advantage to upstart.  It's possible that they will in the
future, at which point the ecosystem argument becomes much more difficult
and much narrower.  But the fact remains that we'll be using large
components of systemd across the distribution anyway, which means that
swapping out the init system doesn't add as much portability as one might
hope, and increases our integration burden.

I think we should make wise decisions about the areas in which we invest
project effort.  I dislike investing significant project effort in
catch-up efforts that, when complete, merely get us back to where we would
have been if we'd chosen a different solution.  I don't think that's wise
stewardship of project resources.  I want to see Debian focus its efforts
on places where we can make a real difference, where we can be leaders.
That means adopting the best-of-breed existing solutions and building on
top of them, not reinventing wheels and thereby starting from a trailing
position.


4. Conclusion

If I'm correct in my analysis of the community and ecosystem dynamics, I
think upstart needs to show that it is a significantly better technical
choice than systemd in order to warrant the additional project work that
will be required to build on top of upstart.  Given feature parity, I
believe we should adopt systemd so that we can focus our efforts on
interesting new problems rather than on redoing integrations that other
people have already done.

My personal analysis did not show that upstart was significantly better
than systemd, or even at feature parity.  Rather, I believe it is
currently trailing systemd substantially in multiple areas, some of which
will require significant design changes.

Given that, I believe systemd is the clear choice, despite the portability
issues that we will incur by choosing it.  However, I think that means we
need to be very careful about how we handle a transition.  I intend to
comment on that in a separate message (which will probably be tomorrow
given how long writing this message took).

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>