Re: System-critical package management

To: Peter Warrington <sothisispeter@gmail.com>, "debian-dpkg@lists.debian.org" <debian-dpkg@lists.debian.org>
Subject: Re: System-critical package management
From: Simon Richter <sjr@debian.org>
Date: Thu, 7 Sep 2023 11:59:47 +0900
Message-id: <1dfdab24-be82-e3da-0c4d-38f43ae29182@hogyros.de>
In-reply-to: <LO0P123MB64630EE1336E8D021CC4DE3BA4EFA@LO0P123MB6463.GBRP123.PROD.OUTLOOK.COM>
References: <LO0P123MB64630EE1336E8D021CC4DE3BA4EFA@LO0P123MB6463.GBRP123.PROD.OUTLOOK.COM>

Hello,

The lack of any system of recognition for packages that are critical to system operation impedes the reliability of Debian-based systems. For example, a reboot during a background package upgrade process on critical system packages unbeknownst to the user may result in the system unable to boot as expected, with little readily-available feedback to the user as to the cause.

Locking out reboots while the package manager is active is a policy that needs to be provided by the policy layer that allows ordinary users to reboot -- so this is the responsibility of the desktop environment.

The base system and package manager require superuser privileges for both reboot and invoking the package manager. For single-user systems, it is the responsibility of the administrator to not issue a reboot command while a package upgrade is in progress, which is not an onerous requirement because the package upgrade must be manually commanded as well.

Packages are often installed in environments where no control over reboots is possible and where system services usually found on desktops are unavailable, such as inside containers during preparation of container images.

There is no appropriate place to implement such a lockout at a low level. The kernel is informed of the intention to reboot only after system shutdown is complete, so this is the wrong place, and above that, users have a choice of different policy layers that fit their use case best, including "none".

But: because background updates on desktop systems are implemented as a system service that is run through a policy layer, it is possible to implement such a lockout on this layer.

Other operating systems like Windows and MacOS manage this by updating system-critical components separately from user-land during shutdown, while clearly giving user-feedback that critical updates are taking place, and that for example the system should not be turned off.

No, these systems make no distinction between system and user components. The reason upgrades are performed through a reboot is a historical shortcoming in the file system implementation: Unix separates the contents of a file from its file name, so if a file is open, its name can still be changed or removed, while the file contents are kept until no more names point to it *and* no more open file handles exist.

On Windows, open files cannot be renamed or deleted unless the program has specifically allowed this, which (for historical reasons) few programs do, so the upgrade process works by unpacking the new files to a temporary name, making a note to rename the files, then rebooting and performing the rename while no users are logged on and no services are running, and then subsequently starting the system.

This process is the same even for user programs, so if you update WinRAR while it is open (so the file cannot be updated), the installation process will ask for a reboot to complete the upgrade.
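The Unix behaviour described above can be seen in a short sketch. It assumes a POSIX system; the file name "program" and the ".new" staging suffix are made up for illustration and are not what any particular installer uses.

```python
import os
import tempfile


def replace_while_open(directory):
    """Replace a file that is still open; return (old handle's view, new contents)."""
    target = os.path.join(directory, "program")   # hypothetical file name
    staged = target + ".new"                      # hypothetical staging suffix

    with open(target, "wb") as f:
        f.write(b"version 1")

    running = open(target, "rb")   # stands in for a program that is still running

    with open(staged, "wb") as f:  # unpack the new version under a temporary name
        f.write(b"version 2")

    # On POSIX the name is switched to the new file even though `running`
    # still holds the old one open.
    os.replace(staged, target)

    old_view = running.read()      # the open handle still sees the old contents
    running.close()

    with open(target, "rb") as f:  # a fresh open sees the new contents
        new_view = f.read()
    return old_view, new_view


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(replace_while_open(tmp))
```

On Windows the same `os.replace` call would typically fail while the target is open, which is the situation that forces the "note the rename and finish after a reboot" approach.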

A potential middle-ground solution to this is to allow packages to be marked as "system-critical" to DPKG by external system components - for example a standard desktop Ubuntu system might mark the Gnome Display Manager, Networking drivers, and others in this way during installation.  These system-critical packages could then be protected by DPKG in the following ways:

	- They are automatically reverted to a known good state on upgrade failure (e.g. previous version)

Generally, packages are expected to go from one functional state to another in a very quick operation after verifying that the operation can be performed.

For example, grub installed into the MBR will check that all components are present, prepare the image to be written in memory, and only in the last step, write the first and second stage bootloaders in one go. Any failure at this stage would be "hardware error", which would also apply to the old version, and until that point, the old version would still work.

It is much more likely for a package to indicate success and subsequently fail on reboot because of a missing check, but this is not something the package manager can help with.

What already exists is automatic revert if a package fails to unpack because of an I/O error (or the disk being full).
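The general shape of such a revert can be sketched as follows. This is an illustration of the idea, not dpkg's actual code: the function name, the ".new" suffix and the injectable `write` callback are all assumptions made for the example.

```python
import errno
import os


def unpack_with_revert(target, new_contents, write=None):
    """Install new_contents at target; on an I/O error keep the old file."""
    staged = target + ".new"   # hypothetical staging suffix

    def default_write(path, data):
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

    try:
        (write or default_write)(staged, new_contents)
    except OSError:
        # Unpacking failed (for example ENOSPC): discard the partial file.
        # The old version was never touched, so there is nothing to restore.
        if os.path.exists(staged):
            os.remove(staged)
        return False

    os.replace(staged, target)   # commit only after the new file is complete
    return True
```

The old file stays in place until the new one has been written completely, so a failure part-way through leaves the previous version usable.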

	- They cannot be removed without being unmarked as "system-critical"

We have "Essential: yes" packages, which dpkg protects, and "Protected: yes" packages, which are protected by apt. The latter category is what bootloaders fall into (it also helps that the main author of apt is also a grub maintainer).

The dpkg program will allow you to remove the bootloader, because that is what allows changing bootloaders easily. The "Essential" set is basically just what is required for dpkg to function -- so dpkg cannot self-destruct.
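To illustrate how the two markers divide the work, here is a small sketch that reads a control stanza and reports which tool would object to a removal, following the split described above. The parser is deliberately simplified, and the package names in the usage example are hypothetical.

```python
def parse_stanza(text):
    """Parse a simplified control stanza into a dict of lower-cased field names."""
    fields = {}
    for line in text.splitlines():
        if not line or line[0] in " \t":
            continue                      # skip blank and continuation lines
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields


def removal_guard(fields):
    """Return which layer refuses removal by default, or None if nothing does."""
    if fields.get("essential") == "yes":
        return "dpkg"
    if fields.get("protected") == "yes":
        return "apt"
    return None


if __name__ == "__main__":
    # Hypothetical stanzas, for illustration only.
    core = parse_stanza("Package: example-core\nEssential: yes\n")
    boot = parse_stanza("Package: example-bootloader\nProtected: yes\n")
    print(removal_guard(core), removal_guard(boot))
```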

	- The system could check during every shutdown that system-critical packages are in a consistent state, reverting to a known good state if not

Again, this would need to be inside the policy layer that defines "shutdown" -- there are many of those, and most of them are outside the Debian system (e.g. if you run Debian in a container under Kubernetes, then Kubernetes is the policy layer that would be responsible for that).

On desktop systems, systemd is the appropriate policy layer to decide about reboots, and (if I remember correctly) packagekit is the policy layer that invokes dpkg, so packagekit would need to inhibit reboots while it is working, and it can do so easily because it can assume systemd to be present and running.
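As a rough illustration of what such an inhibit looks like from the outside, the sketch below wraps a command in the `systemd-inhibit` tool. A service like packagekit would more likely talk to logind over D-Bus directly; the helper name, the "who" and "why" strings and the wrapped command are assumptions made for this example.

```python
import subprocess


def inhibited_command(command, who="package-updater",
                      why="Package upgrade in progress"):
    """Build an argv that runs `command` while holding a shutdown inhibitor."""
    return [
        "systemd-inhibit",
        "--what=shutdown",    # covers power-off and reboot requests
        f"--who={who}",       # hypothetical identifier shown to the user
        f"--why={why}",       # hypothetical reason shown to the user
        "--mode=block",
        *command,
    ]


def run_inhibited(command):
    """Run the command under the inhibitor; needs systemd-logind to be running."""
    return subprocess.run(inhibited_command(command), check=True)


if __name__ == "__main__":
    # Only prints the command line; call run_inhibited() to actually execute it.
    print(inhibited_command(["apt-get", "-y", "upgrade"]))
```

The lock is released when the wrapped command exits, so the inhibit cannot outlive the upgrade it protects.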

I am interested in knowing the communities' thoughts on this, and if these ideas have any merit to them.

On the lower levels, what can be reasonably implemented already is. The lockout you describe belongs in the desktop system, but it would require new UI to be developed to be useful -- rejecting the reboot is easy, but indicating to the user why the reboot was rejected or disabling the option requires a new communication channel, and without that functionality, the user experience would be "I tried to reboot and it didn't do anything."

Breaking the layer separation would be a horrible complicated mess -- adding new low level errors means adding appropriate error handlers to all intermediate layers until the error can bubble up to the user. This is something component systems have historically struggled with -- every time Windows displays some "error code c0312313" type dialog, this is a missing handler chain.
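The handler chain can be pictured with a toy example in which every layer has to know about, and translate, the error from the layer below before the user sees anything meaningful. All class and function names here are invented for the illustration.

```python
class PackageManagerBusy(Exception):
    """Hypothetical low-level error: an upgrade is still running."""


class RebootRefused(Exception):
    """Hypothetical mid-level error carrying a message fit for the user."""


def low_level_reboot(upgrade_running):
    if upgrade_running:
        raise PackageManagerBusy("package database is locked")
    return "rebooting"


def session_reboot(upgrade_running):
    # The intermediate layer needs its own handler for the new error.
    try:
        return low_level_reboot(upgrade_running)
    except PackageManagerBusy as exc:
        raise RebootRefused(
            "Reboot postponed: a package upgrade is still running"
        ) from exc


def desktop_reboot(upgrade_running):
    # Only with a handler at every level does the user get an explanation.
    try:
        return session_reboot(upgrade_running)
    except RebootRefused as exc:
        return str(exc)
```

Remove the handler in any one layer and the top level is left with an error it does not understand, which is the "error code" dialog case.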


   Simon
