Bootstrappable Debian - proposal of needed changes
The following is an email written by Wookey and myself.
The Debian bootstrap build ordering tool Google Summer of Code project
was continued even after the summer ended and recently reached a new
milestone by being able to create a final build order from a dependency
graph for Debian Sid.
By now, all important tools and algorithms have been written to
solve the following problems:
- find source packages to which build profiles (reduced build
dependencies) should be added
- given enough source packages annotated with build profiles, generate
a final build order which produces a full Debian archive from zero,
starting from cross compiling a minimal system and natively compiling
the rest, breaking build dependencies as necessary (and as possible)
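To make the second problem concrete, here is a toy sketch in Python. It is not the algorithm implemented by the actual tools. It assumes every source package builds exactly one binary package of the same name, and it does not model the cross-compiled minimal system. The names build_order, deps and droppable are hypothetical and exist only for this illustration.

```python
def build_order(deps, droppable):
    """Toy build ordering with cycle breaking.

    deps:      {package: set of build dependencies}
    droppable: {package: set of build dependencies that a reduced
                (profile) build of that package can omit}
    Returns a list of (package, kind) steps, kind being "full" or "reduced".
    """
    available = set()          # packages built so far, in any form
    pending_full = set(deps)   # packages still lacking a full build
    order = []
    while pending_full:
        # Packages whose complete build dependencies are already available.
        ready = sorted(p for p in pending_full if deps[p] <= available)
        if ready:
            for p in ready:
                order.append((p, "full"))
                available.add(p)
                pending_full.discard(p)
            continue
        # Stuck in a cycle: look for a package that can be built with
        # reduced build dependencies and has not been built at all yet.
        reducible = sorted(
            p for p in pending_full
            if p not in available
            and deps[p] - droppable.get(p, set()) <= available
        )
        if not reducible:
            raise ValueError("cycle cannot be broken with the given profiles")
        p = reducible[0]
        order.append((p, "reduced"))
        available.add(p)       # usable now, but still needs a full rebuild
    return order


if __name__ == "__main__":
    deps = {"a": set(), "b": {"a", "c"}, "c": {"b"}}
    droppable = {"b": {"c"}}
    for step in build_order(deps, droppable):
        print(step)
```

In this example "b" and "c" build-depend on each other. The sketch builds "b" once without "c", then builds "c", then rebuilds "b" fully.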
Since Debian source packages do not (yet) contain enough meta data to
decide whether or not a build dependency can be dropped, USE flags of
Gentoo source packages were harvested. On top of that,
suggestions from Thorsten Glaser, Patrick McDermott and Daniel Schepler
were used. This way, our current results are hopefully not too far away
While the theoretical results do look consistent, this has so far not
been completed in practice due to the following open issues:
1. missing multiarch annotations prevent the multiarch cross build
dependencies of some source packages from being resolved correctly
2. not all source packages of the minimal build system are cross
compilable in practice yet
3. no decision has been made on the syntax of the new control fields
(build profiles) which are required for automated bootstrapping
4. not enough source packages implement build profiles (this depends on
3 being solved)
More details on this scheme are given at the DebianBootstrap wiki page.
Work has been going on for a couple of years on this, evolving as
practical experience was gained, and input taken from more people.
We therefore make the following proposals (field names not set in stone)
in descending order of importance for us:
1. Build profiles
The build profile format was proposed by Guillem Jover together with
other solutions he presented in this document as part of bug#661538.
Build profiles extend the Build-Depends format with a syntax similar to
architecture restrictions but using < and > instead.
Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !stage1>, tiny
The build dependency "huge" would not be required by the source package
if it is built in the "embedded" or "stage1" profile. This mechanism
neatly allows for removed build-deps, replaced build-deps and added
build-deps, and an arbitrary number of possible 'profiles'.
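As an illustration only, the following Python sketch evaluates a Build-Depends value in the proposed syntax against a set of active profiles. It is not the dpkg or dose3 patch. It handles only the forms shown in the example (no alternatives with "|") and ignores the architecture list. The treatment of non-negated labels is an assumption modelled on architecture restrictions, and parse_entry and required_build_deps are hypothetical names.

```python
import re

# One comma-separated entry: name, optional (version), [arches], <profiles>.
_ENTRY = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.-]*)"
    r"(?:\s*\((?P<version>[^)]*)\))?"
    r"(?:\s*\[(?P<arches>[^\]]*)\])?"
    r"(?:\s*<(?P<profiles>[^>]*)>)?\s*$"
)


def parse_entry(entry):
    """Split one build dependency into its parts."""
    m = _ENTRY.match(entry)
    if m is None:
        raise ValueError("cannot parse build dependency: %r" % entry)
    return {
        "name": m.group("name"),
        "version": m.group("version"),
        "arches": (m.group("arches") or "").split(),
        "profiles": (m.group("profiles") or "").split(),
    }


def restriction_applies(labels, active):
    """Decide whether a dependency with these profile labels is required."""
    negated = {l[1:] for l in labels if l.startswith("!")}
    positive = {l for l in labels if not l.startswith("!")}
    if negated & active:
        return False                    # e.g. <!stage1> while stage1 is active
    if positive:
        return bool(positive & active)  # assumption: needs a listed profile
    return True                         # no restriction, or none triggered


def required_build_deps(field, active_profiles):
    """Names of the build dependencies required under the active profiles."""
    active = set(active_profiles)
    entries = [parse_entry(e) for e in field.split(",")]
    return [e["name"] for e in entries
            if restriction_applies(e["profiles"], active)]


if __name__ == "__main__":
    field = "huge (>= 1.0) [i386 arm] <!embedded !stage1>, tiny"
    print(required_build_deps(field, set()))
    print(required_build_deps(field, {"stage1"}))
```

With no profile active both "huge" and "tiny" are required; with "stage1" or "embedded" active only "tiny" remains.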
Besides bootstrapping, these build profiles could also be used for
embedded builds, and to allow for changed build-deps when cross-building.
One could also imagine that DEB_BUILD_OPTIONS=nodocs could be replaced
by a profile called "nodocs". Patches for dpkg (bug#661538) and dose3
implementing this syntax already exist.
This scheme supersedes an earlier version (referred to as 'staged'
builds), which used repeated Build-depends-StageN: lines. See the dpkg
bug#661538 for the evolution of this.
The profile labels are arbitrary but agreement on label usage is
necessary. For bootstrap automation we have been using 'stage1',
'stage2', etc which fits with existing custom in packages which already
have such internal mechanisms using DEB_STAGE (currently gcc, eglibc,
libselinux, gcj, gnat, gdc, linux). These seem like sensible names so
we propose to stick with them. Other useful profiles can be defined in
The drawback of this syntax is that Build-Dep parsing tools need to be
updated to read/accept it, so uploads of source containing these
annotations cannot be done until the dpkg in buildds at least parses it.
2. Build-Profiles (extension 1)
When a source package is built with fewer build dependencies (cross,
embedded, stage1, nodocs...), then it often happens that it does not
build one or more of its binary packages at all (e.g. foo-gtk, foo-java,
foo-doc). While this is a minor nuisance during a half automated
bootstrap, a fully automated bootstrapping process needs to know which
binary packages a source package does not build if it is compiled in one
of its profiles. We therefore propose a new field for binary packages in
their control file which indicates for which profiles it builds.
Builds-With-Profile: !stage1 !embedded
Different profile names are separated by spaces similar to the
Architecture field. A binary package with the above field would not be
built during the profile builds "stage1" or "embedded". Binary packages
which do not have this field would default to being built by every
profile. This field would mean a minor change to dpkg-gencontrol.
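A minimal Python sketch of how a bootstrap tool might use such a field to predict which binary packages a profile build produces. The field name is the one proposed above and is not set in stone. The handling of non-negated labels is an assumption by analogy with the Architecture field, and the function names and sample package names are hypothetical.

```python
def is_built(builds_with_profile, active_profiles):
    """Return True if a binary package is built under the active profiles.

    builds_with_profile is the raw field value, or None if the field
    is absent (the package is then built by every profile).
    """
    if builds_with_profile is None:
        return True
    active = set(active_profiles)
    labels = builds_with_profile.split()
    negated = {l[1:] for l in labels if l.startswith("!")}
    positive = {l for l in labels if not l.startswith("!")}
    if negated & active:
        return False                    # excluded by an active profile
    if positive:
        return bool(positive & active)  # assumption: needs a listed profile
    return True


def binaries_built(binary_packages, active_profiles):
    """Filter {package name: field value or None} to the packages built."""
    return [name for name, field in binary_packages.items()
            if is_built(field, active_profiles)]


if __name__ == "__main__":
    packages = {
        "foo": None,
        "foo-gtk": "!stage1 !embedded",
        "foo-doc": "!stage1 !embedded",
    }
    print(binaries_built(packages, {"stage1"}))
    print(binaries_built(packages, set()))
```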
3. Build-profiles (extension 2)
A build profile is set either using a DEB_ environment variable or a
command-line option. DEB_STAGE has been used historically in a few
packages with staged build support, but that is specific to the
staged-builds purpose. For the more generic build-profiles
DEB_BUILD_PROFILE=<label> is proposed instead - (only 7 existing
packages would need to be changed - patches exist for some already).
Setting the build profile causes dpkg-checkbuilddeps to use the modified
deps, and dpkg-gencontrol to mark the built package with a new field:
Built-With-Profile: stage1 cross
This new field is optional and just meant to mark binary packages such
that they cannot accidentally make it to the archive. Another idea is
to encode this information in the package version by adding a ~stage1.
Using the field is more powerful as source packages can also be built
with multiple profiles activated at once and the field can store a list
of profile names. In the above example, the binary package was built with
the cross profile activated for cross compilation and the stage1 profile
activated to break a build dependency cycle.
While this field is meant to make sure not to allow any profile built
binary package to be uploaded to the archive, it can also be abused to
only allow "some" build profiles to be uploaded. For example Ubuntu
might generally forbid profile built binary packages to be uploaded
except for packages built with the "ubuntu" profile only. Or Emdebian
might allow binary packages using the "embedded" profile. This would
allow unified binary packages which are able to build for different
targets. As only one unified binary package can satisfy the needs of
different purposes this can improve the quality of the package as only
one codebase has to be maintained. We already have this (using
dpkg-vendor) where changes only affect the rules file, but as soon as
build-dependency changes are needed that mechanism is insufficient. This
usage of build profiles is not part of this proposal but one of the
possibilities they offer besides allowing automated bootstrapping.
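The archive-side check described in this section could look roughly like the following Python sketch. It assumes the control fields of the built binary package are already available as a dictionary. upload_allowed and allowed_profiles are hypothetical names, not part of any existing archive software.

```python
def upload_allowed(control_fields, allowed_profiles=frozenset()):
    """Accept a binary package only if every profile it was built with
    is explicitly allowed. With the default empty set, any package
    carrying a non-empty Built-With-Profile field is rejected.
    """
    built_with = control_fields.get("Built-With-Profile", "").split()
    return all(profile in allowed_profiles for profile in built_with)


if __name__ == "__main__":
    normal = {"Package": "foo", "Version": "1.0-1"}
    staged = {"Package": "foo", "Version": "1.0-1",
              "Built-With-Profile": "stage1 cross"}
    print(upload_allowed(normal))                    # no profile was used
    print(upload_allowed(staged))                    # profile build
    print(upload_allowed(staged, {"stage1", "cross"}))
```

A distribution that wants to accept one specific profile would pass it in allowed_profiles, for example {"ubuntu"} or {"embedded"}.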
4. Unified field for extensions 1 & 2
The Architecture field contains different information depending on its
context. The syntax of profiles behaves similarly to that of
architecture specifiers. An alternative name for the field names of the
last two items would therefore be a unified "Profile" field whose
meaning depends on its context:
Profile: !stage1 !embedded
Profile: stage1 cross
The first one would appear in binary packages in debian/control and
indicate which binary packages do or do not build with a specific
profile. The second one would appear in DEBIAN/control (the built binary
package) and indicate with which profiles the binary package was built.
This is our favoured option as Build-with-profile/Built-with-profile
will only be confused anyway, and if it works for 'Architecture' there
seems no reason why it's not sensible for 'Profile'.
5. Cross-Builds field
For even further automation and also for quality assurance, we propose
another new field for source packages which indicates whether or not
this source package is supposed to be cross compilable.
If Debian wants to incorporate the ability to be bootstrapped in its
policy, then there *must* be some packages which are cross compiled for
a minimal build system. Adding this header to those source packages
would make it a bug if they do not actually cross compile. Without this
header, cross compilation would be wishlist as usual.
Furthermore, cross compilation is one of the methods a porter can use to
break build dependency cycles. If some packages carry this new field
then not only could a porter decide quicker whether or not a source
package is cross compilable, an algorithm could also incorporate this
information to automatically break build dependency cycles for cross
Some naming ideas:
If more automated bootstrapping of Debian is desired, then at least build
profiles (1.) should be introduced. For a fully automated bootstrapping of
Debian, the second item (extension 1) is needed as well. The third item
(extension 2) prevents accidental upload of binary packages that have
not been built fully. The last item (5.) adds further convenience to the
process but is not strictly needed.
We will now make an argument for how Debian will benefit from allowing a fully
automated bootstrapping process:
- obvious: it's the simplest possible way to bootstrap Debian for new architectures
- no need for other distributions in the bootstrapping process (make Debian
- better update lagging architectures
- build packages for architectures that cannot build themselves
- allow easy sub-arch builds, optimized for a specific CPU
- continuously check the archive for bootstrappability as a QA measure
This mechanism also covers cross-compiler bootstrapping. The eglibc, gcc,
and kernel packages already have the necessary staged-build info, but
the build profiles (1.) part is also needed to specify the reduced
build-deps. The cross-toolchain bootstrap ceases to be a special case if
treated this way and just becomes packages to be built in stages using
the profiles mechanism, like many others in the base system (but for
build arch targeting host arch, rather than built for host-arch). See
the wiki article at .
The question remains of how many source packages would have to have the
proposed new fields added to them to make a full bootstrap from zero
possible. If the Gentoo USE flags were not too far off and assuming our
tools do the correct thing so far, then:
- the number of source packages that have to be modified with the new
fields is at most 83 (there might be a smaller set).
Another argument why a fully automated bootstrap of Debian might be
necessary is the growing problem size over the years. If this trend
continues it will only become harder to implement the necessary meta
data in the future. If enough meta data is introduced now to make a
fully automated bootstrap possible, then any subsequent work will only
have to be incremental to that.
The main questions to this list are:
- should Debian be bootstrappable in a fully automated fashion? We
created the algorithms that can allow this to happen, we just need
more meta data and a way to encode it
- do the proposals for the needed fields sound convincing? Can they be
improved? Do they have fundamental flaws?
cheers, josch and Wookey
Thanks to Thorsten Glaser and Patrick McDermott for feedback, and
numerous others along the way for help developing this scheme.