RFC: "Recommended bloat", and how to possibly fix it



Hello, and thanks for your time.

I've been a Debian user and contributor for a while, and have noticed a
rather frustrating issue that I'm interested in potentially
contributing code to fix. The issue is what I call "Recommended bloat",
which in short is what happens when you install a package with all of
its recommended packages, and end up with a whole lot of stuff installed
that you don't want and that the package you actually wanted probably
didn't even need.

According to the Debian Policy Manual, section 7.2, the Recommends
field in Debian packages "declares a strong, but not absolute,
dependency. The Recommends field should list packages that would be
found together with this one in all but unusual installations." While
this is a very useful definition, the actual way in which Recommends
are used seems to differ substantially from this.

More often than not, it seems packagers treat the Recommends field as a
place to list things that aren't dependencies but that the packager,
for whatever reason, wants installed alongside the package they're
working on. This results in packages being installed that many users
may consider entirely useless or even detrimental, but that nonetheless
end up installed as if they were strong-but-not-absolute dependencies.

My favorite package to "pick on" when illustrating this is
"diffoscope". Its job is to do "deep comparisons" of tons of different
file types. Realistically, a user who's using diffoscope will only want
a subset of its functionality (the person who wants to compare kernel
builds may not care about the ability to compare Android APKs, for
instance). Yet if you try to install diffoscope with a simple `sudo apt
install diffoscope`, it tries to pull in an entire army of
various packages, including `supermin`, `mono-runtime`,
`qemu-system-x86`, `syslinux`, `default-jre`, `r-base-core`, and
several Android-related packages. These are all pulled in because of
diffoscope's rather overkill set of recommends, and IMO it's rather
hard to say that all of these things would be found together with
diffoscope in all but unusual installations.

If this were just a diffoscope problem, it would be easy to file a bug
asking that most of these packages be demoted to Suggests. It is a much
more pervasive issue, though, as evidenced by the fact that the
live-build manual has special instructions for how to disable the
installation of *all* recommended packages when building a live
image[1]. I have built live images that ended up with all sorts of
weird packages installed on them, and disabling the installation of
recommended packages is what resolved that.

The issue with resorting to things such as `apt install
--no-install-recommends` is that the Recommends field is not always
used as a catch-all for packages that the packager wants installed by
default. Sometimes it is used in a spec-compliant manner, and in those
instances the use of `--no-install-recommends` and similar features can
cause serious problems. The aforementioned live-build manual mentions
this, warning that you should probably install `user-setup` and `sudo`
explicitly in the images you build if you want things to work
properly. While there are situations in which one legitimately doesn't
want even those packages, those situations are relatively rare - in the
words of the Debian Policy Manual, these are packages that would be
found in all but unusual installations.

Furthermore, the current (ab)use of Recommends in Debian packages
illustrates something important - there is a real need for packagers to
specify packages that should automatically be installed alongside
another package, but that aren't necessarily strong dependencies. Using
diffoscope again as an example, it's reasonable that the diffoscope
maintainers want *all* of diffoscope's functionality to "just work"
out of the box, even if that means installing over three and a half
gigabytes of packages to do it.[2] This may not be policy-compliant,
but demoting these packages to the "Suggests" field doesn't feel right.
Should a user who just wants to compare things have to figure out the
right combo of packages to make diffoscope work for their particular
use case?[3] There's also the question of logistics - going through and
"fixing" all of the packages with overkill Recommends could be a
massive undertaking, and it's more than likely that there will be some
packagers who won't be thrilled with the changes.

With all this in mind, I'd like to call some attention to a feature
request made some time ago by Patrick Schleizer, whom I've copied on
this email. The feature request suggests the addition of a new field to
Debian's binary dependency relationship fields, "Weak-Depends".[4] In
Patrick's own words, Weak-Depends "[d]eclares a weak dependency. Most
users of this package may benefit from installing packages listed in
this field, but can have reasonable functionality without them." In
practice, Weak-Depends packages would still get installed when using
`apt install --no-install-recommends package`, but any package listed
there could later be removed without also removing the package that
declared the Weak-Depends.

An astute reader will notice that this sounds an awful lot like what
the Recommends field is supposed to be for... because it is! This leads
into the second half of the change I would like to suggest making,
which is that the definition of "Recommends" in the Debian Policy
Manual be changed to reflect how it is actually used. "Recommends" is
used to declare packages which provide useful benefits when installed
alongside another package, and is typically used to make it so that
installing one package results in the installation of another package.
That's what its definition in Debian policy should reflect.
"Weak-Depends" would then basically take over the role "Recommends"
was originally intended to play. The policy manual would include a note
on "Weak-Depends", stating something like "if the package declaring the
dependency is still reasonably useful when the dependency is removed,
carefully consider if Recommends is a better fit here". Recommends
would also have a note such as "if the package declaring the dependency
is rendered significantly impaired when the dependency is removed,
carefully consider if Weak-Depends is a better fit here". This is
conceptually similar to the difference between Breaks and Conflicts,
where both of them will keep two packages from being fully installed at
the same time, but one is "stronger" than the other and the two have
slightly different behavior. Weak-Depends would basically just be a
stronger "Recommends".

Obviously, there are still edge cases where a user may want all
optional packages, even Weak-Depends packages, to stay out of the
picture. For these scenarios, a `--no-install-weak-depends` flag could
be added to apt, and dpkg could be made to gripe but not fail if the
user tries to install a package without its Weak-Depends being
installed alongside.
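
To make the proposed behavior a bit more concrete, here is a minimal
sketch in Python of how the set of packages to install might be chosen.
Everything in it is an assumption made for illustration: the `Package`
class, the `install_set` function, and the `example-*` package names
are all hypothetical, versions and alternatives are ignored, and this
is not how apt's resolver actually works. It also assumes the two
flags are independent of each other, which is something the real
design would still have to settle.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Simplified package record (no versions, no alternatives)."""
    name: str
    depends: list = field(default_factory=list)
    # Proposed field; it does not exist in Debian Policy today.
    weak_depends: list = field(default_factory=list)
    recommends: list = field(default_factory=list)


def install_set(archive, target, install_recommends=True,
                install_weak_depends=True):
    """Return the package names selected when installing `target`."""
    selected = set()
    queue = [target]
    while queue:
        name = queue.pop()
        if name in selected or name not in archive:
            continue
        selected.add(name)
        pkg = archive[name]
        # Depends are always followed.
        queue.extend(pkg.depends)
        # Weak-Depends are skipped only with --no-install-weak-depends.
        if install_weak_depends:
            queue.extend(pkg.weak_depends)
        # Recommends are skipped with --no-install-recommends.
        if install_recommends:
            queue.extend(pkg.recommends)
    return selected


# Hypothetical toy archive, not real package metadata.
archive = {
    "example-tool": Package(
        "example-tool",
        depends=["example-lib"],
        weak_depends=["example-helper"],
        recommends=["example-extras"],
    ),
    "example-lib": Package("example-lib"),
    "example-helper": Package("example-helper"),
    "example-extras": Package("example-extras", depends=["example-runtime"]),
    "example-runtime": Package("example-runtime"),
}

default = install_set(archive, "example-tool")
no_recommends = install_set(archive, "example-tool",
                            install_recommends=False)
bare = install_set(archive, "example-tool", install_recommends=False,
                   install_weak_depends=False)
```

In this toy archive, `default` contains all five packages,
`no_recommends` drops `example-extras` and the runtime it drags in but
keeps `example-helper`, and `bare` is left with only `example-tool` and
`example-lib`.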

Some of the advantages of this solution include:

* It requires comparatively few changes to initially implement. All
  existing packages in the Debian repository will be compliant with a
  Debian Policy Manual update that adds Weak-Depends, without changes.
  Packagers can start using Weak-Depends if they want to or if a bug
  report requests that they do. The packages that would need to change
  to implement this include dpkg, apt, possibly the libraries they
  depend on, and live-build. There are probably others, but it still
  wouldn't require a massive overhaul of the whole archive to make it
  start working, the way a mass-"demote to Suggests" operation would.
* If widely used across the Debian archive, people who want to avoid a
  bloated system without ending up with a severely wonky system can
  use `--no-install-recommends` to install packages, yet still end up
  with packages that they really needed to have but probably didn't
  know they needed.
* The default behavior of apt will still be to install recommended
  packages, meaning that almost nothing will change from a user
  standpoint when running `apt install package`. There won't be
  any weird surprises like "why does installing diffoscope only install
  the core engine but leave out everything that's needed for diffoscope
  to do anything useful?"

So, in summary, I would like to suggest the following changes, to help
resolve the "Recommended bloat" issue.

* Add a "Weak-Depends" field to the list of binary dependency control
  fields in the Debian Policy Manual section 7.2, with a definition
  very similar to the existing definition for "Recommends".
* Change the definition of the "Recommends" field to match the way
  the field is oftentimes used in the Debian archive.
* Add support for the "Weak-Depends" field to apt, dpkg, and other
  packaging-related tools that need it (such as aptitude, live-build,
  and potentially others). This support would include recognition of
  the field in control files, non-fatal warnings in dpkg when a
  Weak-Depends is missing, and the addition of
  `--no-install-weak-depends` options to allow people to skip
  installing them.
* Suggest that all active package maintainers in Debian review the
  packages they maintain and see if they feel there are some packages
  that should be promoted from "Recommends" to "Weak-Depends".
* (Potentially?) Scan the Debian archive and see if there are
  dependencies that should be promoted from "Recommends" to
  "Weak-Depends". This would probably be something that interested
  Debian Developers and other volunteers could slowly chip away at, as
  they had time and the desire to do so.
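
As a rough idea of what a first pass at such a scan might look like,
here is a small Python sketch that counts the entries in each
package's Recommends field. The function names and the sample stanzas
are made up for illustration; a real scan would read the archive's
Packages indices, and a high count is only a hint that a package is
worth a human look, not evidence of misuse.

```python
def parse_stanzas(text):
    """Split Packages-style text into one dict per blank-line-separated stanza."""
    stanzas = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key:
            # Continuation line: append it to the previous field.
            current[last_key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def recommends_counts(text):
    """Map each package name to the number of entries in its Recommends."""
    counts = {}
    for stanza in parse_stanzas(text):
        field = stanza.get("Recommends", "")
        # Entries are comma-separated; "a | b" alternatives count once.
        entries = [e.strip() for e in field.split(",") if e.strip()]
        counts[stanza.get("Package", "?")] = len(entries)
    return counts


# Invented sample data, not taken from the real archive.
SAMPLE = """\
Package: example-tool
Depends: example-lib
Recommends: example-extras, example-runtime (>= 1.0),
 example-viewer | example-viewer-lite

Package: example-lib
"""

counts = recommends_counts(SAMPLE)
```

Sorting `counts` by value would give a crude list of packages to look
at first.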

Thanks for taking the time to read this!

[1]: https://live-team.pages.debian.net/live-manual/html/live-manual/customizing-package-installation.en.html section 8.4.3.
[2]: On my Debian Bookworm XFCE machine, `sudo apt install diffoscope`
     reports "After this operation, 3,624 MB of additional disk space
     will be used."
[3]: This particular situation could probably be fixed with the help of
     some metapackages, but that's beside the point.
[4]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=942303
