Bug#904558: What should happen when maintscripts fail to restart a service

To: Simon McVittie <smcv@debian.org>
Cc: 904558@bugs.debian.org, Margarita Manterola <marga@debian.org>, Sean Whitton <spwhitton@spwhitton.name>, Ian Jackson <ijackson@chiark.greenend.org.uk>, Tollef Fog Heen <tfheen@err.no>, Wouter Verhelst <wouter@debian.org>, Anthony DeRobertis <anthony@derobert.net>, Gunnar Wolf <gwolf@debian.org>, Stuart Prescott <stuart@debian.org>
Subject: Bug#904558: What should happen when maintscripts fail to restart a service
From: Sam Hartman <hartmans@debian.org>
Date: Sun, 07 Oct 2018 11:23:43 -0400
Message-id: <tsl1s91rda8.fsf@suchdamage.org>
Reply-to: Sam Hartman <hartmans@debian.org>, 904558@bugs.debian.org
In-reply-to: <20181007104909.GA15664@espresso.pseudorandom.co.uk> (Simon McVittie's message of "Sun, 7 Oct 2018 11:49:09 +0100")
References: <877elkdkkg.fsf@silentflame.com> <20181007104909.GA15664@espresso.pseudorandom.co.uk>

>>>>> "Simon" == Simon McVittie <smcv@debian.org> writes:

    Simon> the error path is most important were packages that provide a
    Simon> system-level API to other packages, so their failures are
    Simon> likely to cause other packages to fail to configure (such as
    Simon> local DNS caches and authentication services like LDAP); and
    Simon> packages that provide remote access, so their failures need
    Simon> to be fixed before a potentially remote sysadmin logs out to
    Simon> prevent the sysadmin from being locked out longer-term (like
    Simon> sshd).

As maintainer of some of the more important packages (krb5-kdc and
krb5-admin-server), I'd like to chime in here.  krb5-kdc provides
enterprise-level authentication, and if it fails it may well take out
authentication for an entire environment.

Even so, I've found that causing upgrades to fail does far more harm
than good, even for this package.

Here is my experience based on my own observations and based on bug
reports and helping people diagnose problems in krb5:

* The vast majority of failures happen when krb5-kdc gets installed on a
  system where it is not actually needed, or where it was partially
  configured for a test.  In these cases, breaking an upgrade does
  much more harm than good.  A failed upgrade can leave other packages
  in a half-configured state, so a service that is not critical for a
  given system may end up breaking services that are critical for that
  system.

* When krb5 is a critical service, its failure is going to be quite
  obvious regardless of what the maintainer script does.

* Debugging the situation almost always involves installing some
  package, and the first thing I end up doing is walking a user through
  adding "exit 0" at the top of the postinst in /var/lib/dpkg/info
  before going forward.  Even when I don't need some additional tool,
  I've been burned by other parts of the system being in a
  half-configured state.

* Leaving large chunks of the system in half-configured states is one
  of the worst things you can do for system stability.  It's not
  something we test very often, and the interactions are very difficult
  to predict.
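
The "exit 0" workaround from the third point can be sketched roughly as
follows.  The helper name neutralize_script is hypothetical, the script
is assumed to start with a #! line, and the example path assumes the
package is krb5-kdc; treat this as an illustration of the idea rather
than a recommended tool.

```shell
#!/bin/sh
# Hypothetical helper: turn a maintainer script into a no-op by inserting
# "exit 0" right after its first line (assumed to be the #! line).
set -e

neutralize_script() {
    script="$1"
    tmp=$(mktemp)
    # Copy line 1 unchanged, add "exit 0" after it, then copy the rest.
    awk 'NR == 1 { print; print "exit 0"; next } { print }' "$script" > "$tmp"
    # Write through the existing file so its mode and ownership are kept.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Example (path assumed for a package named krb5-kdc):
#   neutralize_script /var/lib/dpkg/info/krb5-kdc.postinst
```

This only gets dpkg past the failing script so that debugging can
continue; it does nothing about the underlying problem with the service.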

If I understood the cause of an error in a maintainer script and knew
that it indicated a problem that the sysadmin needed to fix (and one
that likely indicated krb5 was important on this system) I would be open
to returning a failure in postinst.
In almost all other situations I'd rather simply let the service fail to
start.
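
A minimal sketch of that default in a postinst, assuming the service is
restarted through invoke-rc.d.  The function name restart_service and
the warning text are made up for illustration and are not taken from the
krb5 packaging; the optional command override exists only so the
fragment can be tried without a real service.

```shell
#!/bin/sh
# Hypothetical postinst fragment: try to restart the service, but do not
# let a failed restart fail the whole configure step.
set -e

# restart_service NAME [COMMAND...]
# COMMAND defaults to "invoke-rc.d NAME restart".
restart_service() {
    name="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- invoke-rc.d "$name" restart
    fi
    if "$@"; then
        return 0
    fi
    # Report the problem, but still return success so the package ends up
    # configured rather than half-configured.
    echo "warning: $name failed to restart; continuing anyway" >&2
    return 0
}

# In a real postinst this might be called as:
#   restart_service krb5-kdc
```

The one case argued for above, a known error that the sysadmin must fix,
would instead return non-zero from that specific branch.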
