Re: System-critical package management
Hello,
The lack of any system of recognition for packages that are critical to system operation impedes the reliability of Debian-based systems. For example, a reboot during a background upgrade of critical system packages, unbeknownst to the user, may leave the system unable to boot as expected, with little readily available feedback to the user as to the cause.
Locking out reboots while the package manager is active is a policy that
needs to be provided by the policy layer that allows ordinary users to
reboot -- so this is the responsibility of the desktop environment.
The base system and package manager require superuser privileges for
both reboot and invoking the package manager. For single-user systems,
it is the responsibility of the administrator to not issue a reboot
command while a package upgrade is in progress, which is not an onerous
requirement because the package upgrade must be manually commanded as well.
Packages are often installed in environments where no control over
reboots is possible and where system services usually found on desktops
are unavailable, such as inside containers during preparation of
container images.
There is no appropriate place to implement such a lockout at a low
level. The kernel is informed of the intention to reboot only after
system shutdown is complete, so this is the wrong place, and above that,
users have a choice of different policy layers that fit their use case
best, including "none".
However, because background updates on desktop systems are implemented
as a system service that is run through a policy layer, it is possible
to implement such a lockout at this layer.
Other operating systems like Windows and macOS manage this by updating system-critical components separately from userland during shutdown, while clearly telling the user that critical updates are taking place and that, for example, the system should not be turned off.
No, these systems make no distinction between system and user
components. The reason upgrades are performed through a reboot is a
historical shortcoming in the file system implementation: Unix separates
the contents of a file from its file name, so if a file is open, its
name can still be changed or removed, while the file contents are kept
until no more names point to it *and* no more open file handles exist.
On Windows, open files cannot be renamed or deleted unless the program
has specifically allowed this, which (for historical reasons) few
programs do, so the upgrade process works by unpacking the new files to
a temporary name, making a note to rename the files, then rebooting and
performing the rename while no users are logged on and no services are
running, and only then starting the rest of the system.
This process is the same even for user programs, so if you update WinRAR
while it is open (so the file cannot be updated), the installation
process will ask for a reboot to complete the upgrade.
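The Unix behaviour described above can be observed directly. The
following is a minimal sketch, assuming a POSIX system; the function
name read_after_unlink is made up for this illustration. It removes the
only name of an open file and then reads the contents back through the
still-open handle. On Windows the unlink step would typically fail
instead, which is the situation the reboot-based upgrade works around.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Open a file, remove its only name, then read it via the open handle.

    Hypothetical helper for illustration; assumes POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                   # the name is gone ...
        assert not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)      # ... but the contents are still there
        return os.read(fd, len(payload))
    finally:
        os.close(fd)                      # last reference dropped, space is freed
```

This is why a running program can keep using the old version of a file
while the package manager has already put a new file under the same name.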
A potential middle-ground solution to this is to allow packages to be marked as "system-critical" to dpkg by external system components - for example a standard desktop Ubuntu system might mark the GNOME Display Manager, networking drivers, and others in this way during installation. These system-critical packages could then be protected by dpkg in the following ways:
- They are automatically reverted to a known good state on upgrade failure (e.g. previous version)
Generally, packages are expected to go from one functional state to
another in a very quick operation after verifying that the operation can
be performed.
For example, grub installed into the MBR will check that all components
are present, prepare the image to be written in memory, and only in the
last step, write the first and second stage bootloaders in one go. Any
failure at this stage would be a hardware error, which would equally
affect the old version, and until that point, the old version would
still work.
It is much more likely for a package to indicate success and
subsequently fail on reboot because of a missing check, but this is not
something the package manager can help with.
What already exists is automatic revert if a package fails to unpack
because of an I/O error (or the disk being full).
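The "verify first, then switch over in one quick step" pattern can be
sketched as follows. This is a generic illustration under stated
assumptions, not the actual code of dpkg or grub: the function name
replace_verified and the check callback are invented for the example,
and it relies on os.replace being atomic when the staged file and the
target are on the same POSIX file system.

```python
import os
import tempfile


def replace_verified(target: str, new_data: bytes, check) -> bool:
    """Stage new_data next to target; swap it in only if check(path) accepts it.

    Hypothetical helper: returns False and leaves target untouched on rejection.
    """
    directory = os.path.dirname(os.path.abspath(target))
    fd, staged = tempfile.mkstemp(dir=directory, prefix=".staged-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_data)
            f.flush()
            os.fsync(f.fileno())      # make sure the staged copy is on disk
        if not check(staged):
            return False              # old version is still in place
        os.replace(staged, target)    # the single, quick switch-over step
        staged = None
        return True
    finally:
        if staged is not None and os.path.exists(staged):
            os.unlink(staged)         # clean up a rejected or failed staging file
```

Until the os.replace call, every failure leaves the old file in place,
so there is nothing to revert.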
- They cannot be removed without being unmarked as "system-critical"
We have "Essential: yes", which dpkg protects, and "Protected: yes",
which is protected by apt. The latter category is what bootloaders fall
into (it also helps that the main author of apt is also a grub maintainer).
The dpkg program will allow you to remove the bootloader, because that
is what allows changing bootloaders easily; the "Essential" set is
basically just what is required for dpkg to function -- so dpkg cannot
self-destruct.
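To see which packages carry these markers, one can scan control-style
stanzas for the two fields. The sketch below is only an illustration:
the function name guarded_packages is invented, the parser handles just
the simple "Field: value" layout, and the package names in SAMPLE are
made up rather than taken from a real system.

```python
def guarded_packages(status_text: str) -> dict:
    """Map package name -> set of guard fields ('essential', 'protected') set to yes."""
    result = {}
    for stanza in status_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") or ":" not in line:
                continue  # continuation line of a multi-line field
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
        guards = {k for k in ("essential", "protected") if fields.get(k) == "yes"}
        if guards and "package" in fields:
            result[fields["package"]] = guards
    return result


# Invented sample data; these are not real package names.
SAMPLE = """\
Package: example-core
Essential: yes
Description: hypothetical core package
 with a continuation line

Package: example-bootloader
Protected: yes

Package: example-app
"""
```

Anything not listed in the result is a package that neither layer would
object to removing.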
- The system could check during every shutdown that system-critical packages are in a consistent state, reverting to a known good state if not
Again, this would need to be inside the policy layer that defines
"shutdown" -- there are many of those, and most of them are outside the
Debian system (e.g. if you run Debian in a container under Kubernetes,
then Kubernetes is the policy layer that would be responsible for that).
On desktop systems, systemd is the appropriate policy layer to decide
about reboots, and (if I remember correctly) packagekit is the policy
layer that invokes dpkg, so packagekit would need to inhibit reboots
while it is working, and it can do so easily because it can assume
systemd to be present and running.
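As a rough sketch of what such an inhibit could look like, the snippet
below wraps a command in systemd-inhibit, which holds a lock for as long
as the command runs. This is not how packagekit is implemented; the
function names and the default "who" string are made up for the example,
and whether a particular reboot request honours the lock depends on how
it is issued and on local policy.

```python
import subprocess


def inhibit_command(command, why, who="package-upgrade-sketch", mode="block"):
    """Build a systemd-inhibit argv that holds a shutdown lock while command runs.

    Hypothetical helper; 'who' is an arbitrary label chosen for this example.
    """
    if mode not in ("block", "delay"):
        raise ValueError("mode must be 'block' or 'delay'")
    return [
        "systemd-inhibit",
        "--what=shutdown",   # covers power-off and reboot requests
        f"--who={who}",
        f"--why={why}",
        f"--mode={mode}",
        *command,
    ]


def run_inhibited(command, why):
    """Run command under the inhibitor lock; needs systemd-logind to be running."""
    return subprocess.run(inhibit_command(command, why), check=False).returncode
```

The "why" text is what shows up in "systemd-inhibit --list", so it is
the one piece of explanation that is available to whatever front end
wants to tell the user why a reboot is being held back.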
I am interested in knowing the community's thoughts on this, and whether these ideas have any merit.
On the lower levels, what can reasonably be implemented already has
been. The lockout you describe belongs in the desktop system, but it would
require new UI to be developed to be useful -- rejecting the reboot is
easy, but indicating to the user why the reboot was rejected or
disabling the option requires a new communication channel, and without
that functionality, the user experience would be "I tried to reboot and
it didn't do anything."
Breaking the layer separation would be a horribly complicated mess --
adding new low level errors means adding appropriate error handlers to
all intermediate layers until the error can bubble up to the user. This
is something component systems have historically struggled with -- every
time Windows displays some "error code c0312313" type dialog, this is a
missing handler chain.
Simon