
Re: Second take at DEP17 - consensus call on /usr-merge matters



Hi,

mmdebstrap author here. This is the other bootstrapping tool which is currently
sitting at ~17% of the popcon value of debootstrap.

Quoting Helmut Grohne (2023-06-28 21:37:44)
> Once that is settled, the next big question is how to handle bootstrapping.
> We had a number of people arguing in favour of changing the bootstrap
> protocol. Such changes can be classified into generic changes and
> release-dependent changes. A release-dependent change enhances bootstrapping
> tools with knowledge of available releases and adapts behaviour in
> release-specific ways that are encoded into the bootstrapping tool. As it
> stands, the only bootstrapping tool that has been enhanced in this way is
> debootstrap and that support is limited to Debian and two derivatives. The
> category of generic changes includes imposing an ordering on initial unpacks
> (e.g. base-files first). Others are in favour of not changing the bootstrap
> protocol. In that view, some data.tar will have to ship the symbolic links
> and all other essential packages need to have their files canonicalized.
> 
> Among these options, the first has a working prototype (debootstrap),
> but it is unclear how that extends to use of snapshot.d.o and how to
> make it work with debootsnap and debbisect as those tend to use a suite
> name rather than a codename. The last option has a prototype and relies
> on uploading a number of essential packages in a short window of time.
> (What could possibly go wrong?)
> 
> It is not clear to me how we can get to a consensus on these, so the
> best I can do here is summarize options.
> 
> Option #3a:
> 
>     The bootstrap protocol shall be changed to contain a task for
>     merging /usr as is done in debootstrap in a release-dependent way.
> 
> This is an instance of M16 from DEP17.
> 
> Option #3b:
> 
>     The bootstrap protocol shall be changed in unspecified ways to
>     support the /usr-merged systems in a way that does not depend on
>     matching the codename or suitename.
> 
> This is an instance of M16 from DEP17.
> 
> Option #3c:
> 
>     The bootstrap protocol shall remain unchanged. Therefore, all
>     essential packages need to move their files out of aliased locations
>     and the aliasing symlinks are to be installed from a data.tar of a
>     binary package such as base-files.
> 
> This is M2+M11 from DEP17.
> 
> While a few people including Marco d'Itri and Sam Hartman have argued in
> favour of exploring the space of #3b, no proposals have emerged in the
> interim. The proposal in #3a has three significant limitations:
>  * It creates compatibility issues when combining old and new suites
>    unless changes to bootstrapping tools are backported to older
>    releases.
>  * It becomes a whack-a-mole, since we need to add codenames of every
>    derivative to every bootstrapping implementation.
>  * It breaks bootstrapping from snapshot.d.o and therefore breaks tools
>    such as debbisect and debootsnap.
> 
> While the first of these limitations is shared with #3b, the others are
> not and that would make #3b more attractive to me if there was a
> concrete proposal to evaluate. The one about unpacking base-files first
> seemed the most concrete to me, but it has the downside of imposing a
> permanent cost on bootstrapping tools even though we only need that
> behaviour temporarily, which seems like too bad of a trade-off to start
> with in my opinion. Did I miss a relevant proposal for modifying the
> bootstrap protocol?
> 
> On the flip side, there is a demo for #3c showing that we can move most
> of the things except for a handful of packages and then flip the
> switch (for bootstrapping) in unstable by uploading those packages
> simultaneously. The biggest downside of this probably is the inherent
> fragility of this approach. Even if this is extensively tested before
> uploading chances are good that we still break something unforeseen in
> the process.
> 
> Can I get more feedback from those who would rather not have #3c implemented as to
> how you see things moving forward?

There was some feedback from people who would rather not have #3c implemented.
I agree with them that changing bootstrapping tools is a quick and easy fix that
only requires a few lines of easily testable code, can be deployed quickly and
is unlikely to break unrelated things in a serious way.
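To make that trade-off concrete, here is a hypothetical sketch of what a release-dependent check in the spirit of option #3a could look like. It is not debootstrap's actual code; the codename set and the function name are illustrative assumptions.

```python
# Hypothetical sketch of a release-dependent check (option #3a).
# The codename set is illustrative only: a real tool would need one entry
# for every Debian release and every derivative release expecting merged-/usr.
MERGED_USR_CODENAMES = {"bookworm", "trixie", "sid"}


def wants_merged_usr(dist: str) -> bool:
    """Decide the filesystem layout purely from the name given on the command line."""
    return dist in MERGED_USR_CODENAMES
```

The check is indeed tiny, but `wants_merged_usr("unstable")` returns False even though unstable and sid are the same suite, every derivative codename has to be added by hand, and a name alone cannot distinguish today's unstable from an old unstable on snapshot.d.o.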

In this email I'd like to explain why I think that #3a and #3b are not good
ways forward for the distribution. In summary, what I fear is that choosing
either #3a or #3b would mean choosing a simple fix just because it's simple,
ignoring the fact that those solutions impose a permanent cost on the
distribution (even beyond debootstrap) for many years to come. What I'd like
Debian to choose is the technically and architecturally proper solution, even
if it means more work, because it results in multiple long-term benefits. To
explain why I think this way, let me go back in time a bit.

About seven years ago I became intrigued by how multistrap was able to
create a chroot using apt. This had many benefits over using debootstrap, first
and foremost access to more powerful dependency resolution and the ability to
use multiple mirrors at once. My thanks here go to Neil Williams, who wrote the
tool, and to the apt developers, who make using apt in that way possible. I
became the multistrap maintainer in 2016 and tried very hard to remove its
limitations:
https://gitlab.mister-muffin.de/josch/multistrap/commit/ff96767b2f6a574e2651768225ad61557880e12f

When I realized that I couldn't fix multistrap without breaking its interface,
I started a new project in 2018 which I called mmdebstrap. In comparison to
debootstrap it's faster, supports multiple mirrors, produces bit-by-bit
identical output if SOURCE_DATE_EPOCH is set, allows for chroots containing
only the Essential:yes packages or even less than that, supports foreign
architectures out-of-the-box, does not require superuser privileges to run and
has an interface very familiar to debootstrap users. Those are all nice
properties, but my main motivation to write this tool was a different one. I
wanted to show that it's possible to create a Debian chroot without all the
special-casing that debootstrap uses. The only special-casing I added to
mmdebstrap was where it was clear that it could be moved to either apt or dpkg
instead. For me, mmdebstrap is a testing ground to improve the bootstrapping
protocol in a direction that moves all the magic from the bootstrapping tool
into either dpkg, apt or the packages themselves.

This was a success. The mmdebstrap tool is proof that, starting with Debian
Stretch, no special-casing such as a special unpack order is needed to create a
working Debian chroot. For distributions older than Stretch, mmdebstrap ships
the maybe-jessie-or-older hook, which adds all the hacks that were still
required back then.

This property was broken when it became generally agreed that setting up
merged-/usr by having debootstrap create the symlinks before unpacking any
packages was the way forward. Only later did we realize that having debootstrap
do this setup was not enough: we actually want to ship the functionality
describing what the filesystem should look like as part of the Essential:yes
set. This happened in 2022 when init-system-helpers gained a dependency on
"usrmerge | usr-is-merged" to automatically do the conversion upon upgrades.

I'd like to draw a parallel between the situation back then and the situation
we are in today. Back then it was decided that just using debootstrap to set up
merged-/usr was the right way forward. Only a few lines of code were added to
debootstrap to do this, because that got the job done. Later we realized that
it did not get the whole job done and that it is actually necessary to ship the
functionality as part of the (transitive) essential package set.

This gets me to the core of my argument: what a Debian chroot should look like
should be described by the packages that get installed into the chroot and not
by an outside tool. Yes, an outside tool is needed to do the installation, but
that tool should rely on the package metadata to make choices instead of
encoding timestamp- or release-name-dependent code paths.
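The following toy model illustrates why option #3c makes the unpack order irrelevant: if base-files ships the aliasing symlinks in its data.tar and every other package ships its files only under /usr, any order yields the same tree. This is a hypothetical sketch, not how dpkg or any bootstrapping tool actually works; the package file lists are illustrative assumptions and the alias set is simplified.

```python
from itertools import permutations

# Simplified, hypothetical set of aliased top-level directories. Real
# merged-/usr systems can have more (e.g. architecture-specific lib dirs).
ALIASES = {"/bin": "usr/bin", "/sbin": "usr/sbin", "/lib": "usr/lib"}


def parents(path):
    """Yield every proper parent directory of an absolute path, outermost first."""
    parts = path.strip("/").split("/")[:-1]
    for i in range(1, len(parts) + 1):
        yield "/" + "/".join(parts[:i])


def resolve(fs, path):
    """Follow an aliasing symlink if the toy filesystem already contains it."""
    for alias, target in ALIASES.items():
        if path.startswith(alias + "/") and fs.get(alias) == ("symlink", target):
            return "/" + target + path[len(alias):]
    return path


def unpack(fs, entries):
    """Toy unpack of one data.tar given as (path, kind, symlink_target) tuples."""
    for path, kind, target in entries:
        if kind == "symlink":
            # In this model a symlink may not replace an existing directory.
            if fs.get(path) == "dir":
                raise RuntimeError(f"{path} is already a directory")
            fs[path] = ("symlink", target)
        else:
            real = resolve(fs, path)
            for d in parents(real):
                fs.setdefault(d, "dir")
            fs[real] = "file"


def bootstrap(packages):
    """Unpack (name, entries) pairs in the given order and return the layout."""
    fs = {}
    for _name, entries in packages:
        unpack(fs, entries)
    return fs


def order_independent(packages):
    """True if every unpack order succeeds and produces the same layout."""
    results = []
    for order in permutations(packages):
        try:
            results.append(bootstrap(order))
        except RuntimeError:
            return False
    return all(r == results[0] for r in results)


# Hypothetical packages: base-files ships the symlinks (as in #3c).
BASE_FILES = ("base-files", [("/bin", "symlink", "usr/bin"),
                             ("/sbin", "symlink", "usr/sbin"),
                             ("/lib", "symlink", "usr/lib")])
DASH_CANONICAL = ("dash", [("/usr/bin/dash", "file", None)])
DASH_ALIASED = ("dash", [("/bin/dash", "file", None)])
```

With `DASH_ALIASED`, unpacking dash before base-files creates a real /bin directory that the symlink from base-files then collides with, so `order_independent` returns False; with `DASH_CANONICAL` every order gives the same tree. Moving files out of aliased locations is what removes the need for a special unpack order.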

I agree that the bootstrapping use-case is a very special one and we would not
have this argument if there were no way to implement this other than
hard-coding special cases. But there is. We could have all the nice properties
that come with encoding the layout in the packages that get installed instead
of shipping that information in one or multiple tools.

To illustrate this, let me draw a comparison: if the bootstrap protocol were to
encode timestamp- or release-specific code paths, this would be like apt or
dpkg encoding timestamp- or release-specific code paths. Of course nobody
argues that apt should hard-code dependencies of packages if they match certain
versions. That information should be included in the packages themselves.
Encoding the information about how packages work together in the packages
themselves, as part of their metadata, is at the core of developing a
component-based operating system like Debian. The mmdebstrap tool proves that
it is possible to do bootstrapping like that. The way merged-/usr expected
things to work broke this property, but we can have it again.

Additionally, I think we are generally in agreement that maintainer scripts
should become more declarative instead of being Turing-complete shell scripts.
I am also a big fan of the fact that systemd replaced init shell scripts with
declarative metadata. I'm sure the systemd maintainers would also not like
systemd to encode any special-casing for the sbuild/buildd service, because
that information should go into sbuild/buildd instead. If we are moving in that
direction in all other parts of Debian, can we also choose a declarative
approach for the bootstrapping scenario? It worked before merged-/usr (as shown
by mmdebstrap) and we can have this again by choosing #3c.

Choosing #3c gets us more than just a simple and clean design. Encoding the
information about what a chroot should look like in the packages instead of the
bootstrapping tool allows creating chroots for Debian unstable all the way back
to 2006-08-10 using debbisect or debootsnap. Yes, creating old chroots via
intermediary chroots is possible, but it wastes processor cycles, adds
complexity and requires hardcoding timestamps in the tools that do the job
automatically. Letting the Essential:yes packages and their dependencies decide
what a chroot is supposed to look like is also friendly to our derivatives, as
they then no longer need to maintain their custom setup in a tool like
debootstrap. Choosing a component-based view of the bootstrapping problem not
only gives us a clean design but also desirable properties for creating either
old chroots from snapshot.d.o or chroots for derivatives, without requiring
things to be hardcoded over and over again in several tools.

I fear that we are sacrificing the benefits we get from using the
component-based approach to software engineering. We are tempted by a
quick-to-implement solution to get things done now without having to think much
more about it, silently accepting the long-term costs for all tools in the
bootstrapping space.

As this mail hopefully has made clear, I'm highly in favour of option #3c. I'm
also absolutely prepared to do the work of setting up all the CI environments
required to get the NMU patches done that are needed to make this happen. My
work on mmdebstrap as well as my work on the CI environment and patches
surrounding chrootless dpkg should prove that I'm capable of doing this. Yes,
this will be a lot more work *now*, but I think it can be done and I am
absolutely willing to do it as long as the project does not block that work.
The extra work I'm doing now will prevent more work from being necessary
later, and it will get us a clean architectural design for bootstrapping a
Debian chroot for many years and releases to come.

I think this now comes down to what we as a project care about. Which use-cases
are important to us? Which properties do we want to preserve?

What do you think?

Thanks!

cheers, josch
