
Re: Package dependency versions and consistency



On 19.12.20 01:25, Josh Triplett wrote:
Jonas Smedegaard wrote:
Quoting Raphael Hertzog (2020-12-17 13:16:14)
Even if you package everything, you will never ever have the right
combination of version of the various packages.

What is possible to auto-compute is a coarse view of the work needed.

In reality, most Nodejs modules declare too tight versioning for their
dependencies, and in many cases it is adequate that a module is packaged
even if not at the version declared as required.  A concrete example is
"ansi-styles" which is most likely working just fine in version 4.x.

This is not at all as simple as it sounds, even on a small scale, let
alone when multiplied by a few hundred dependencies.

(Let's please not go on the standard tangent into complaints about
the number of dependencies, because at the end of that tangent, people
will still use fine-grained packages and dependencies per the standard
best-practices of those communities, no matter the number or content of
mails in this thread suggesting otherwise. The extremes of "package for
a one-line function" are not the primary issue here; not every
fine-grained dependency is that small, and the issues raised in this
mail still apply whether you have 200 dependencies or 600. So let's take
it as a given that packages *will* have hundreds of library
dependencies, and try to make that more feasible.)

Figuring out whether those dependencies are actually too specific or if
they're required is a substantial amount of work by itself; the
packaging metadata and dependency versions recorded upstream exist to
declare the required version of dependencies, and there isn't typically
a *second* way that upstream records "no, really, there's a reason for
this dependency version requirement". This is hard enough in a
statically typed language, where you can at least have the verification
of seeing if it compiles with the older version (though the package
might be relying on new semantics); with a dynamically typed language,
you might not know that the older version of the dependency has caused a
problem until runtime. As an upstream developer, the safest assumption
when preparing your own dependencies is "well, it works with the version
of the dependency I tested with, and assuming that component correctly
follows semver, it should work with newer semver-compatible versions".
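
As a rough illustration of that assumption, here is a minimal sketch of
a "caret"-style compatibility check of the kind npm and Cargo apply by
default. The function names are hypothetical, and it assumes plain
MAJOR.MINOR.PATCH strings with no pre-release or build suffixes. It only
says what upstream *declared* compatible; it cannot tell whether an
older version would also have worked.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_compatible(tested: str, candidate: str) -> bool:
    """Return True if `candidate` is at least `tested` and stays within
    the same semver-compatible series (caret-range semantics)."""
    t, c = parse(tested), parse(candidate)
    if c < t:
        # Older than what upstream tested with: not covered by the claim.
        return False
    if t[0] > 0:
        # 1.x and later: the major number is the compatibility boundary.
        return c[0] == t[0]
    if t[1] > 0:
        # 0.y.z: the minor number acts as the compatibility boundary.
        return c[:2] == t[:2]
    # 0.0.z: every release is treated as potentially breaking.
    return c == t


if __name__ == "__main__":
    for candidate in ("4.1.0", "4.3.0", "5.0.0"):
        print(candidate, caret_compatible("4.2.1", candidate))
```

Note that the check is deliberately one-directional: a packaged 4.1.0
fails it against a tested 4.2.1, even though it may work in practice,
which is exactly the case that needs manual investigation.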

To clarify something: I *don't* believe Debian should compromise on
network access at build time. Debian package dependencies should be
completely self-contained within the Debian archive. The aspect I'm
concerned about here is that Debian pushes hard to force every single
package to use *the same version* of a given dependency, even if the
dependency has multiple incompatible versions (properly declared with
different semver major numbers, equivalent to libraries with different
SONAMEs). I'm not suggesting there should be 50 versions of a given
library in the archive, but allowing 2-4 versions would greatly simplify
packaging, and would allow such unification efforts to take place
incrementally, via transitions *in the archive* and *in collaboration
with upstream*, rather than *all at once before a new package can be
uploaded*.

(I also *completely* understand pushing back on having 2-4 versions of
something like OpenSSL; that'd be a huge maintenance and security
burden. That doesn't mean we couldn't have 2-4 semver-major versions of
a library to emit ANSI color codes, and handle reducing that number via
incremental porting in the archive rather than via prohibition in
advance.)

I think much of our resistance to allowing 2-4 distinct semver-major
versions of a given library comes down to ELF shared libraries making it
painful to have two versions of a library with distinct SONAMEs loaded
at once, and while that can be worked around with symbol versioning,
we've collectively experienced enough pain in such cases that we're
hesitant to encourage it. Our policies have done a fair bit to mitigate
that pain. But much of that pain is specific to ELF shared libraries and
similar. And some of our packaging limitations are built around this
(e.g. "one version of a given package at a time"), which in turn forces
some of those same limitations onto ecosystems that don't share the
problems that motivated those limitations in the first place. The
dependency and library mechanisms of some other ecosystems are designed
to support having multiple distinct versions of libraries in the same
address space, with fully automatic equivalents of symbol versioning.

In Debian packaging, this issue typically results in one of three
scenarios for every dependency (recursively):

- Trying to port the package to work with older versions of
   dependencies. This incurs all of the burden mentioned above for
   determining if the older dependency actually suffices. On top of that,
   this may involve actual porting of code to not rely on the
   functionality of newer versions, which is very much wasted effort
   (that functionality was added so that it could be used, and avoiding
   it often entails duplicating that functionality). In some cases, such
   porting may render the package incompatible with newer versions
   (especially if porting to an older semver-major version). In most
   cases, such changes are something upstream will generally not care
   about at all, for all of these reasons. Going backwards is not the
   ideal direction, but people sometimes do it anyway because the
   alternative can be even more painful:

- Trying to package a newer version of the dependency in Debian. This
   will often cascade recursively into multiples of the same set of
   problems over again, both downwards through the dependency tree for
   the dependencies of your dependencies, and upwards through other
   packages' dependency trees. Packaging distinct semver-major
   incompatible versions in separate packages would make this much easier
   and avoid recursively forward-porting all the packages depending on
   the same dependency, but as mentioned above, there's a noticeable
   resistance to packaging multiple incompatible versions of a library.
   And in addition, every round of such work often entails substantial
   archive delays, trips through NEW (which can be relatively fast with
   the impressive work that ftpmasters do to stay on top of it, but it
   still may mean repeatedly pausing your packaging work), and the risk
   of inconsistent pushback on incompatible requirements like "don't
   bundle things" versus "bundle these things together because they're
   tiny".

- Just bundle it, skip all that pain, cross your fingers, and upload.
   This *is* unfortunate, and I'm not arguing that bundling is the ideal
   solution. Bundling results in multiple semver-compatible versions of
   the same library in the archive, rather than a few semver-incompatible
   major versions of the library. But one major reason people bundle
   dependencies is to skip all of the above problems.

So, even assuming every package involved uses semantic versioning
*perfectly*, there's a great deal of work to do. And 100% of that work
has to happen *before* the first upload of the package.

Right now, Debian pushes back heavily on bundling, and *also* pushes
back heavily on all of the things that would solve the problems with
unbundled dependencies. That isn't sustainable. If we continue to push
back on bundling, we need to improve our tools and processes and
policies to make it feasible to maintain unbundled packages. Otherwise,
we need to build tools and processes and policies around bundled
dependencies. (Those processes could still include occasional
requirements for unbundling, such as for security-sensitive libraries.)

I've never seen an ELF shared library package rejected on the basis of
"this is a tiny library, you must bundle it together with other tiny
libraries"; on the contrary, for reasons such as multiarch it's often
*necessary* to split out such libraries into separate binary packages.
On top of that, it's possible to do shared library transitions in
unstable in several different ways, aided by the testing migration
process. You can upload libfoo5, port packages individually over from
libfoo4 to libfoo5, and it's potentially *acceptable* for the
intermediate state of libfoo4 and libfoo5 coexisting to persist for a
while, as long as you take some care to avoid linking both into the same
binary (or carefully use symbol versioning, which most libraries don't).

Debian Policy provides a *huge* amount of value in some of the ways it
constrains software builds: requiring that all dependencies (including
build dependencies) be Free Software, restricting network access at
build time, and other similar ways we maintain a self-contained archive
of Free Software. However, I think there are a few specific ways we
could make it easier and more common for people to *not* bundle
dependencies:

- End the practice of pushing back on small packages that package each
   dependency in one source package and one binary package. It's hard
   enough to solve all of these problems without also needing to throw a
   pile of upstream packages into one Debian package. If we have issues
   with the size of Packages files, let's introduce the idea of archive
   sections solely for self-contained build dependencies that most people
   don't need to have in their sources.list. But let's allow packages
   from *all* ecosystems, regardless of size, to be able to take
   advantage of at least the level of support we provide for ELF shared
   libraries.

- Allow packages to have multiple semver-major versions in the archive
   simultaneously (e.g. lang-modname-4, lang-modname-5, lang-modname-6),
   as long as the type of package supports such coexistence. This may
   also require some package tooling work to allow coexistence among
   binaries that are commonly used as build dependencies, but a
   prerequisite for such work is knowing that the resulting packages will
   not get rejected from the archive.  We can have a "should"-level
   policy that suggests working with various upstreams across an
   ecosystem to reduce the number of versions needed simultaneously, and
   we could have tooling to help with such transitions, but those are
   things we can handle incrementally in the archive. We can also have a
   policy about pushing back on proliferating versions of
   security-sensitive packages, but that should be for crypto packages or
   packages with a history of regular security advisories, not for the
   majority of packages.

- Simplify the process of uploading new semver-major versions of
   packages, without having to wait for NEW. This is especially true if
   the package has already been through NEW at least once. But we need to
   solve the case of a new source package for the new semver-major
   version, as well. (For instance, perhaps if the same maintainer of the
   source package lang-modname-5 is uploading a source package for
   lang-modname-6, that package can skip NEW.) We can always file RC bugs
   on packages in the archive, and even remove packages later.
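
To make the lang-modname-N idea concrete, here is a small sketch of how
tooling might group the versions required across an ecosystem into one
package per semver-major series. Everything in it is an assumption for
illustration: the function names, the naming scheme taken literally from
the example above, and the version list, which is made-up input rather
than data from the archive. It also assumes versions at 1.0.0 or later,
where the major number is the compatibility boundary.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Numeric sort key for a plain 'MAJOR.MINOR.PATCH' string."""
    return tuple(int(part) for part in version.split("."))


def package_name(lang: str, module: str, version: str) -> str:
    """Hypothetical naming scheme: one package per semver-major series."""
    return f"{lang}-{module}-{version_key(version)[0]}"


def packages_needed(
    lang: str, module: str, required_versions: Iterable[str]
) -> Dict[str, List[str]]:
    """Group the versions required by various dependants into the
    coexisting per-major packages that would satisfy them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for version in required_versions:
        grouped[package_name(lang, module, version)].append(version)
    # Within one major series, a single package at the highest version
    # should satisfy every dependant, assuming upstream follows semver.
    return {
        name: sorted(versions, key=version_key)
        for name, versions in grouped.items()
    }


if __name__ == "__main__":
    # Illustrative requirements collected from several dependants.
    wanted = ["3.2.1", "4.3.0", "4.1.0", "6.2.1"]
    for name, versions in packages_needed("lang", "ansi-styles", wanted).items():
        print(f"{name}: package >= {versions[-1]} covers {versions}")
```

The output of such a grouping is also the "coarse view of the work
needed": the number of distinct per-major packages is the number of
versions that would have to coexist until dependants are ported.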

Given all of the above improvements, it'd be much more feasible for
tooling to help systematically unbundle and package dependencies, and to
help manage and transition those dependencies in the archive.

Very thoughtful and thorough exposé! +1 thank you!
*t

