
Bug#727708: init system thoughts



On Mon, Dec 30, 2013 at 09:52:04PM -0800, Russ Allbery wrote:
> Steve Langasek <vorlon@debian.org> writes:

> > Upstart (as implemented in Ubuntu) restores this guarantee (with
> > provisions for failsafe booting in the case of a wrong network config),
> > and it takes advantage of upstart's capability of sending arbitrary
> > signals to do so.  I can see how one could implement the equivalent of
> > upstart's static-network-up event on systemd, using generators.  But
> > what would the equivalent to /etc/init/failsafe.conf look like?  I think
> > this would be very difficult to express in systemd language, yet it's
> > altogether vital for providing a boot that is both reliably ordered, and
> > recoverable in the event of problems.

> I'm not sure I completely understand what failsafe.conf actually does,

The purpose of failsafe.conf is to ensure that services which have not been
converted to the native format, but instead provide initscripts that are
called upon reaching runlevel 2, are started at the right time, so that
they don't fail intermittently by racing the network stack.  This is an
existing bug on sysvinit systems, but the race is hit much less frequently
there because sysvinit is slower.

failsafe.conf strikes a balance between never waiting for networks
(breaking services that assume the network is up) and always waiting for all
networks (breaking systems that have stale network configs in
/etc/network/interfaces): services start as soon as all the static networks
come up *if* they do come up, and after a "reasonable" timed delay if they
don't.
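
As a rough illustration of that policy (not the actual upstart job, which is
event-driven rather than polling), the "wait for the static networks, but
only up to a deadline" logic could be sketched as below.  The function name,
its parameters and the polling approach are all assumptions made for the
example.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_static_networks(
    expected: Iterable[str],
    get_up: Callable[[], Set[str]],
    timeout: float,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: block until every interface in `expected` is up.

    Returns True as soon as all expected interfaces are reported up by
    `get_up`, or False once `timeout` seconds have passed without that
    happening.  Either way the caller then goes on to start the legacy
    services; the return value only says whether the network was ready.
    """
    wanted = set(expected)
    deadline = clock() + timeout
    while True:
        # Success path: every static interface is up, so stop waiting now.
        if wanted <= get_up():
            return True
        # Failsafe path: the config may be stale, so give up at the deadline.
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(poll_interval, remaining))
```

With a stale entry in /etc/network/interfaces the call returns False after
the timeout and boot continues; with a correct config it returns True as soon
as the last interface appears, without waiting out the full delay.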

Without an equivalent to failsafe.conf, server systems converted from
sysvinit to systemd will find some of their (poorly-coded, but nevertheless
common and supported in Debian) services randomly failing to start on boot
where they started reliably before.

> So, in other words, assuming that I understand this correctly, the way
> that you implement this in systemd is that you add a Before= dependency to
> the network.target (hm, which implies that my assumption about things
> meddling with dependencies of core services being less likely is not as
> correct as I thought it was, although I still think it's less likely to be
> done by accident) that waits for the network to be configured, but
> implements a timeout to ensure that you don't stall forever.

From your answer and Tollef's, I'm satisfied that this requirement can be
represented in a reasonable fashion on top of systemd, probably with a
combination of an if-up.d hook (like for upstart) and a systemd unit that
wraps a script much like /etc/init/failsafe.conf to set a timeout.

I am left with the concern that I seem to be the first person to ask this
question, in discussions with the TC, six months after AIUI the systemd
maintainers considered systemd ready to be made the default.  The kinds of
race conditions that I'm highlighting here are ones that Ubuntu identified
and resolved over the course of two years of experience with Upstart in the
wild, at the cost of quite a bit of pain for Ubuntu's users in the meantime.
I fear that switching Debian to systemd by default would inflict the same
kind of pain on Debian's users, because the fixes available in Ubuntu will
not translate directly to systemd and no other distribution that has adopted
systemd has dealt with these issues yet.

Russ, the feature gaps that you rightly point out between systemd and
upstart, while they represent a significant amount of work, are all IMHO
solvable for jessie.  The integration work, in contrast, feels very
open-ended to me; I'm not worried that the work will be insurmountable, but
that we will fail to identify the issues in a timely fashion before the
jessie release.  I'm not saying this to deliberately engage in FUD by
pointing to an unquantifiable risk; I genuinely fear that this *is* a risk,
and the full extent of the risk is only becoming clear to me as a result of
these discussions.  Ifupdown integration was one of the very first things I
addressed after adopting the upstart package in Debian, and I would never
have proposed people run it on their systems without this in place.  So I
fear that switching to systemd by default is going to result in easier
package maintenance for early adopters, but a much buggier experience for
our users.  If we decide for systemd, what do you think is the right way to
mitigate such problems for jessie?  Some of these issues are only going to
be seen once people start making use of systemd in anger with their existing
server configs (e.g., an ec2 VM with a simple disk and network config is not
going to expose these problems), and I don't really think we want to do this
by way of switching the default in unstable and waiting for the bugs to roll
in.

Perhaps you're right that there is such a night and day difference between
systemd and upstart that it warrants us redoing the integration work on top
of systemd that has already been done on top of upstart in Ubuntu.  But in
that case I would still want to know that, while redoing that integration,
we aren't leaving our users in the lurch.

On Tue, Dec 31, 2013 at 09:36:54AM +0100, Tollef Fog Heen wrote:
> There is none, and it would not be ifupdown-specific.  We could easily
> enough add a «wait for a default ipv4 and ipv6 default route to appear»
> unit if somebody desired that, which would be pulled in by
> network-online.target.  It's a pretty trivial piece of code.

> Alternatively, just put systemctl start network-online.target into a
> fragment in if-up.d if you consider ifup considering a network interface
> up to be enough.  (I don't, but as pointed out on the systemd wiki page
> referenced, people have different ideas about what «network online»
> means.)

I believe the correct behavior, for compatibility with legacy sysvinit
scripts, is to call 'systemctl start network-online.target' (or possibly,
'systemctl start network.target') only after all statically configured
network interfaces have been brought up.  The 'all_interfaces_up' handler
in /etc/network/if-up.d/upstart should be directly translatable for
systemd's purposes.
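
A minimal sketch of what such a translated check might look like, written in
Python for illustration rather than as the shell hook itself.  It assumes
that "statically configured" means "listed on an 'auto' line in
/etc/network/interfaces", ignores included files and other stanza types, and
takes the set of currently-up interfaces as an argument; the function names
are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List


def auto_interfaces(interfaces_text: str) -> List[str]:
    """Return interface names listed on 'auto' lines (simplified parser)."""
    names: List[str] = []
    for line in interfaces_text.splitlines():
        stripped = line.strip()
        # Comments in this file must be on a line of their own.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if fields[0] == "auto":
            names.extend(fields[1:])
    return names


def all_interfaces_up(interfaces_text: str, up_now: Iterable[str]) -> bool:
    """True if every 'auto' interface is up (vacuously true if none listed)."""
    return set(auto_interfaces(interfaces_text)) <= set(up_now)


def signal_network_online(run: Callable = subprocess.run) -> None:
    """Start the target that legacy services are ordered after."""
    run(["systemctl", "start", "network-online.target"], check=True)
```

An if-up.d fragment built on this would call signal_network_online() only
when all_interfaces_up() returns true for the interface that has just come
up together with those already up.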

-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@ubuntu.com                                     vorlon@debian.org
