
Split Packages files based on new section "buildlibs"



Hi everybody,

Short Reason: Too many packages of no use to our users.

Longer reason: Many packages get added to Debian that are of no (direct)
use to our users. Each package adds metadata to the indices, which has
to be downloaded and processed by tools, and it clutters up the whole
package list for no practical benefit. A split-out Packages file will
allow us to minimise the effect on users.

More and more packages are being uploaded into the Debian archive which
are only ever used for building packages. These are not only never
intended to be installed onto an end-user's system, users are even
actively discouraged from using them directly. The two currently most
notable examples are the library packages of the Go and Rust programming
language ecosystems, but there may well be others[1]. While we need
these library packages to build the applications that use them, those
applications are entirely statically compiled, so none of the libraries
will ever be installed on a normal user's system.

Moreover, the language ecosystem in Debian actively discourages users
from installing them for anything other than rebuilding a Debian
package. If you do general (non-Debian-specific) development using Go or
Rust on your machine, the expectation is that you will use the
language-specific tools to install your dependencies [2].

Currently, however, all of those packages end up in the indices we
generate, which users have to download and package managers have to read
and deal with. Each of those packages therefore slightly increases the
size of these indices for little reason. While many users have access
to high-bandwidth connections and fast CPUs, many others do not, and the
wasted transfer and processing is no help against global warming
either.

For the Rust ecosystem, those sizes increase even more, as each of their
libraries can provide multiple features. For example, a TLS library can
link against GnuTLS or OpenSSL or some other TLS implementation. Such
features may even be combined in various ways, resulting in an excessive
number of possible feature combinations for one Rust package, called a
"crate". Those are "mapped" to the Debian package world by creating
something we call *feature packages*, with one such feature package per
feature or combination thereof (usually grouped by common dependencies).
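
To get a feel for how quickly the number of possible feature packages
can grow, here is a minimal Python sketch. The crate name, the feature
names and the package naming pattern are all made up for illustration.
Since features are usually grouped by common dependencies, the real
number of feature packages will be lower than this upper bound.

```python
from itertools import combinations


def feature_combinations(features):
    """Return every non-empty combination of the given features."""
    combos = []
    for size in range(1, len(features) + 1):
        combos.extend(combinations(sorted(features), size))
    return combos


# Hypothetical crate and features, for illustration only.
crate = "exampletls"
features = ["gnutls", "openssl", "rustls"]

combos = feature_combinations(features)
# Upper bound: n features have 2**n - 1 non-empty subsets.
assert len(combos) == 2 ** len(features) - 1

for combo in combos:
    # Assumed naming pattern, not necessarily the one used in the archive.
    print("librust-%s+%s-dev" % (crate, "+".join(combo)))
```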

Those feature packages are empty packages, only containing a symlink for
their /usr/share/doc/… directory. Their size is smaller than the
metadata they will produce. Adding new features means one more trip
through the NEW queue each time such new binary packages are introduced.

The FTPTeam disagrees with the feature-package solution[3], so currently
there is a workaround. By collapsing the features into the main library
package and declaring the features using the Provides header, similar
functionality is achieved. However, this doesn't work in all situations,
for example:

   Tools can generate really long Provides: lines, with the current
   record being around 250kb. That's long enough that a tool (not dak
   itself) broke on it already. And those lines may grow larger in
   future.

   Some features may need different (build-)dependencies (say, GnuTLS
   vs OpenSSL). If those conflict with each other, you cannot combine
   them into one package and must fall back to the feature package
   solution.

   Generally, the workaround involves changing upstream's dependency
   structure in order to fit it into the aforementioned Debian
   constraints, and so of course this may not always play nicely with
   other packages that expect the unchanged upstream dependency
   structure. The feature-package solution is a 1-to-1 mapping.
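
As a rough way to see how large a Provides: line has become for a given
package, the following Python sketch parses a single Packages-style
stanza and reports the size of its Provides value. The parser is
deliberately simplistic and the example stanza is invented; it is not
how dak or apt read these files.

```python
def parse_stanza(text):
    """Parse one Packages-style stanza into a dict of field -> value.

    Continuation lines (starting with a space or tab) are appended to
    the value of the preceding field.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            fields[current] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def provides_size(stanza_text):
    """Return the size in bytes of the Provides value (0 if absent)."""
    value = parse_stanza(stanza_text).get("Provides", "")
    return len(value.encode("utf-8"))


# Hypothetical stanza; the package and feature names are invented.
example = """Package: librust-example-dev
Provides: librust-example+gnutls-dev (= 1.0-1),
 librust-example+openssl-dev (= 1.0-1)
"""
print(provides_size(example))
```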

There have been multiple discussions between the FTPTeam and the Rust
package maintainers. The FTPTeam does not want those feature packages in
the part of main downloaded by users and currently rejects them from
NEW, while the Rust maintainers see them as needed and the workaround as
just that. Both sides agree that this is not a productive and
sustainable solution and that we need to agree on something better.

The current proposal is to reduce the size of the main Packages.xz files
by splitting[4] out all of the packages that are not intended for users
and writing those into a file of their own. Those packages would have a
section of "buildlibs", independent of their other properties. That
section should only be activated on buildds and in situations that need
build-dependencies available (say, an archive rebuild, or a user
rebuilding packages that need Build-Dependencies from there), but not by
default anywhere else. This section will allow feature packages and
*may* even let them bypass binary-NEW if they only add new (empty)
feature packages.
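
The basic idea of the split can be sketched in a few lines of Python.
This is only an illustration under simple assumptions (stanzas separated
by blank lines, a plain Section: field, invented package names); the
actual implementation in dak and apt is not described here and may look
quite different.

```python
def section_of(stanza):
    """Return the Section value of one stanza, or '' if it has none."""
    for line in stanza.splitlines():
        if line.lower().startswith("section:"):
            return line.split(":", 1)[1].strip()
    return ""


def split_packages(packages_text, section="buildlibs"):
    """Split Packages-style text into (main_stanzas, buildlibs_stanzas)."""
    main, buildlibs = [], []
    for stanza in packages_text.strip().split("\n\n"):
        if not stanza.strip():
            continue
        # Also accept a component prefix such as "contrib/buildlibs".
        if section_of(stanza).split("/")[-1] == section:
            buildlibs.append(stanza)
        else:
            main.append(stanza)
    return main, buildlibs


# Hypothetical input; package names and sections are invented.
example = """Package: some-application
Section: utils

Package: librust-example+openssl-dev
Section: buildlibs
"""
main, buildlibs = split_packages(example)
print(len(main), "stanza(s) stay in the main Packages file")
print(len(buildlibs), "stanza(s) go to the buildlibs file")
```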

Exactly how this gets implemented, both in dak and in apt, is still
being discussed between the ftpteam and the apt maintainers. We have
ideas ranging from writing out section-based Packages files to
presenting them as a subcomponent of main, and we think we will have
something finalized pretty soon. It may need small changes on the side
of release managers, wanna-build admins or other tools that need to read
the full Packages information. We will provide more information on that
when we are sure about the changes.

An advantage of this approach is that the mechanism by which we assign
packages to the buildlibs file instead of the main file is flexible on
the archive side. Whilst we intend to use the package section
initially, this is a policy decision which can be altered without
clients needing to update. We also have the ability, should it ever be
necessary, to add other index files where it makes sense.

As for the timeline for this change: we hope that this will be ready
before bullseye (especially if we end up needing a patch in apt), so
that after the release we can gradually switch to split Packages files.

Footnotes
-------------
[1] The focus currently is on Rust, as it has the most pressing need to
   resolve the issue. We know that the new section may also be useful
   for Golang, and we know something of how that is currently handled.
   This is, however, definitely not limited to just those. If you think
   that your package set is a good candidate to move here, please get
   in contact with us.

[2] That is, the "go get" way for Golang and the "cargo build" way for
   Rust.

[3] While the trip through NEW for basically nothing is annoying, the
   real problem is the metadata size.

[4] We first thought about an entirely new archive, but that would be
   much more separate and create a higher maintenance workload.
   Additionally, it would create problems complying with the licenses
   of packages. Then we thought about a new component besides
   main/contrib/non-free, and while that works better, it still has
   many negative side effects, including requiring extra package
   uploads, extra tracking for the release team and requiring multiple
   components if we later decide that we need to support this for all
   of main, contrib and non-free.

--
bye, Joerg
