How to make port/bootstrapping work easier
- To: firstname.lastname@example.org
- Subject: How to make port/bootstrapping work easier
- From: Johannes Schauer <email@example.com>
- Date: Thu, 15 Nov 2012 05:26:03 +0100
- Message-id: <20121115042603.GA31238@hoothoot>
- In-reply-to: <CADf0C45igv-SnQcEz8v_A0vH=VKRJ+eFc7cwt2j5U=8rcYfEvw@mail.gmail.com>
- References: <CADf0C45igv-SnQcEz8v_A0vH=VKRJ+eFc7cwt2j5U=8rcYfEvw@mail.gmail.com>
thanks for your detailed report!
You also commented a lot on your actual practice (thanks!) so I changed
the subject to reflect the slight topic change of my reply.
On Wed, Nov 14, 2012 at 05:54:06PM -0800, Daniel Schepler wrote:
> I read your recent post to debian-devel with great interest, as I've
> done some bootstrapping efforts in the past, and I'm currently in the
> middle of a "port" for the x32 ABI. In the past, what I've done
> (mostly privately) was to develop a script I called "pbuildd" which
> essentially just runs through the list of currently unbuilt packages
> and tries running pbuilder on them all, then installs anything that
> succeeds into a local repository and starts up the loop again.
A small function does the same thing for me in the very early steps, but
only in theory (nothing is actually built). Naturally, it quickly gets
stuck because of dependency cycles.
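A minimal sketch of such a loop on paper, assuming (as a simplification)
that every source package produces one binary package of the same name.
The function name and the toy data are hypothetical and are not taken
from pbuildd or from my tools:

```python
def build_rounds(build_deps, available):
    """Simulate repeated build passes over the unbuilt source packages.

    build_deps: dict mapping a source package to the set of packages it needs.
    available:  packages assumed to exist before the first pass.
    Returns (rounds, stuck): the packages built in each pass, and the set of
    packages that never became buildable.
    """
    available = set(available)
    unbuilt = set(build_deps)
    rounds = []
    while unbuilt:
        ready = sorted(p for p in unbuilt if build_deps[p] <= available)
        if not ready:
            break  # no progress: the rest is blocked by cycles or missing deps
        rounds.append(ready)
        available.update(ready)
        unbuilt.difference_update(ready)
    return rounds, unbuilt


# Hypothetical toy data: c and d need each other and can never be built.
deps = {"a": set(), "b": {"a"}, "c": {"d"}, "d": {"c"}}
rounds, stuck = build_rounds(deps, set())
print(rounds)         # [['a'], ['b']]
print(sorted(stuck))  # ['c', 'd']
```

The loop stops as soon as a pass builds nothing, which is exactly the
point where a cycle has to be broken by hand.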
> Then, when things got stuck, I just did a manual inspection of the
> unsatisfied dependencies to find the cycles, and chose one to break.
> In fact, I've just started uploading my current iteration of this to
> http://126.96.36.199/debian/ -- you might want to especially look at
> scripts/pbuildd which is the central script to run this loop. (And
> over time, it's gathered various optimizations to speed up the
> "installation into local repository" step, try to avoid invoking
> pbuilder if it can easily determine that certain Build-Depends aren't
> present at all, etc.)
What my tools try to do is to figure out a build order for
bootstrapping Debian from nothing. This order can then be given to a
tool that does the actual compilation in that order. The "figuring out
the order" part is purely theoretical. I only look at the Packages and
Sources files and the dependency relationships stored within to generate
a dependency graph which I then evaluate.
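To illustrate the idea only: the sketch below reads Sources-style stanzas,
reduces Build-Depends to bare package names and asks Python's graphlib
(3.9 or later) for an order. It assumes that source and binary package
names coincide, takes only the first alternative of each clause and ignores
versions and architecture lists. The real work is done with dose3, which
this does not attempt to reproduce; all names here are hypothetical.

```python
import re
from graphlib import TopologicalSorter


def parse_stanzas(text):
    """Split control-file text into one dict per blank-line-separated stanza."""
    stanzas = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, key = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key] += " " + line.strip()  # continuation line
            elif ":" in line:
                key, value = line.split(":", 1)
                fields[key] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def dep_names(field):
    """Reduce a dependency field to bare names (first alternative only)."""
    names = set()
    for clause in field.split(","):
        first = clause.split("|")[0]
        first = re.sub(r"\(.*?\)|\[.*?\]", "", first).strip()
        if first:
            names.add(first)
    return names


def build_order(sources_text):
    """Return the packages in an order where dependencies come first.

    Raises graphlib.CycleError if the graph contains a dependency cycle.
    """
    graph = {}
    for stanza in parse_stanzas(sources_text):
        graph[stanza["Package"]] = dep_names(stanza.get("Build-Depends", ""))
    return list(TopologicalSorter(graph).static_order())


SAMPLE = """\
Package: a

Package: b
Build-Depends: a (>= 1.0)

Package: c
Build-Depends: b [i386], a
"""

print(build_order(SAMPLE))  # ['a', 'b', 'c']
```

On real archive data the call fails with a CycleError almost immediately,
which is why the cycles have to be dealt with first.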
My tool doesn't know or care whether a package can actually
be compiled on the new architecture. It does no compilation itself
and can therefore not figure this out. Dealing with compilation
problems is (as of now) still something the user has to take care of
(from the point of view of my tools).
I call it "my tools" because there is no name for the project yet. The
git repository  and mailing list  just run under the name
"debian-bootstrap". At this point I should also mention that everything
heavily depends on dose3. Pietro Abate is a great help with this
project, and I certainly wouldn't be where I am without dose3 and his
continuous help and additions to the project.
> Initially, when I needed to break a cycle, I would just build
> something by hand and stick it into the "partial" directory, but over
> time I started developing automated cycle-breaker scripts, which are
> currently under scripts/cb.inactive (the pbuildd script looks for them
> under scripts/cb).
I had a look at the files in scripts/cb.inactive and they seem to store
lots of information about which build dependencies can be dropped for a
huge number of source packages. Am I correct in reading lines like
inst_pkgs "`get_control_re $PBUILDD_ROOT/build/a/antlr/*.dsc 'build-(depends|depends-indep)' |
sed -e '/\<gcj-native-helper\>/d' -e '/\<nant\>/d' \
-e '/\<cli-common-dev\>/d' -e '/\<mono-devel\>/d' \
as meaning that gcj-native-helper, nant, cli-common-dev etc. can
be dropped?
Sadly, that information is stored in a Turing-complete format (bash
scripts), which makes it hard to process by machine. But if the
pattern '/\<package\>/d' is used most of the time, then I guess a regex
can extract much of the information with tolerable uncertainty. I will
try to hack up a script that harvests the droppable build dependencies
from the files you have in scripts/cb.inactive. This information might be
immensely useful, thanks!
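A rough sketch of what such a harvesting regex could look like. It only
recognizes the '/\<package\>/d' sed pattern quoted above and knows nothing
else about the cb scripts, so anything expressed differently is silently
missed. The function name is hypothetical and the sample text is just the
fragment quoted above:

```python
import re

# Matches sed delete expressions of the form /\<name\>/d, where name looks
# like a Debian package name (lowercase letters, digits, '+', '-', '.').
SED_DELETE = re.compile(r"/\\<([a-z0-9][a-z0-9+.-]+)\\>/d")


def droppable_deps(script_text):
    """Return the sorted package names deleted by '/\\<name\\>/d' patterns."""
    return sorted(set(SED_DELETE.findall(script_text)))


SAMPLE = r"""
inst_pkgs "`get_control_re $PBUILDD_ROOT/build/a/antlr/*.dsc 'build-(depends|depends-indep)' |
sed -e '/\<gcj-native-helper\>/d' -e '/\<nant\>/d' \
-e '/\<cli-common-dev\>/d' -e '/\<mono-devel\>/d' \
"""

print(droppable_deps(SAMPLE))
# ['cli-common-dev', 'gcj-native-helper', 'mono-devel', 'nant']
```

Which source package a match belongs to would still have to be taken from
the surrounding script, for example from the .dsc path, which this sketch
does not try to do.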
Since a porter has to come up with those droppable build dependencies for
each new port, a new syntax called "build profiles" has been proposed
in  by Guillem Jover. If this information were included in the build
dependencies of some core packages, bootstrapping would already become
much easier for a porter. An example of how the proposed format works:
Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny
The < and > "brackets" are used in the same way [ and ] are used for
architectures to denote the profile for which that dependency can be
dropped (or is exclusively required). Besides bootstrapping, such
profiles could also be used for embedded builds or for bootstrapping
compilers that need themselves to be built. The latter topic also
recently came up on debian-devel .
There exist trivial patches for dpkg  and dose3  to implement this.
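To make the example concrete, here is a small sketch of how such a field
could be evaluated. It is not the dpkg or dose3 patch. It assumes that a
<...> list behaves exactly like an architecture list (negated terms
exclude, plain terms include), that at most one profile is active at a
time, and that alternatives ("|") and architecture wildcards do not occur.
All function names are hypothetical:

```python
import re

# name, optional (version), optional [arch list], optional <profile list>
CLAUSE = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.-]+)"
    r"(?:\s*\((?P<version>[^)]*)\))?"
    r"(?:\s*\[(?P<arches>[^\]]*)\])?"
    r"(?:\s*<(?P<profiles>[^>]*)>)?\s*$"
)


def restriction_allows(spec, active):
    """Evaluate a restriction list: '!x' terms exclude, plain terms include."""
    if spec is None:
        return True  # no restriction at all
    terms = spec.split()
    negated = [t[1:] for t in terms if t.startswith("!")]
    positive = [t for t in terms if not t.startswith("!")]
    if active in negated:
        return False
    if positive:
        return active in positive
    return True


def active_build_deps(field, arch, profile=None):
    """Return the dependency names that apply for this arch and profile."""
    deps = []
    for clause in field.split(","):
        m = CLAUSE.match(clause)
        if m is None:
            raise ValueError(f"cannot parse clause: {clause!r}")
        if restriction_allows(m["arches"], arch) and \
                restriction_allows(m["profiles"], profile):
            deps.append(m["name"])
    return deps


FIELD = "huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny"
print(active_build_deps(FIELD, "i386"))               # ['huge', 'tiny']
print(active_build_deps(FIELD, "i386", "bootstrap"))  # ['tiny']
print(active_build_deps(FIELD, "amd64"))              # ['tiny']
```

With the bootstrap profile active, "huge" simply disappears from the build
dependencies, which is the information a porter otherwise has to rediscover
for every port.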
> The scripts tend to become outdated over time, though, with a moving
> target, and I'm sure the current state is no exception. My personal
> heuristics for what I preferred were: first, prefer cycle-breaking
> which just removes Build-Depends which are there to build
> documentation. Then, prefer cycle-breaking which ignores
> Build-Depends on one or a few libraries which provide purely optional
> features. If I couldn't find anything of this sort, I'd just try to
> find the cycle-breaking point which would be (fuzzily) "least
> invasive" and "least likely to break the resulting packages, at least
> as far as packages that Build-Depend on them".
In current Debian Sid, the dependency graph contains a central strongly
connected component (SCC) of about 930 nodes, made up of dependency
cycles. Breaking this into a directed acyclic graph (DAG) is not trivial
because the edges (dependencies) carry no weights. Such weights would
express how hard or how undesirable it is to drop a build dependency. In
a post at  I argue that the problem of breaking this SCC and turning it
into a DAG through reduced build dependencies even becomes harder over
the years, as the SCC grows in size.
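For readers not familiar with the term: an SCC is a set of nodes that can
all reach each other. The sketch below finds the SCCs of a small graph with
Kosaraju's algorithm. It is a generic textbook method and the graph is a
hypothetical toy, not the algorithm or the data used in dose3 or in my
tools:

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. graph: dict node -> iterable of successors."""
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    reverse = {n: [] for n in nodes}
    for n, succs in graph.items():
        for s in succs:
            reverse[s].append(n)

    # Pass 1: iterative DFS on the graph, recording nodes as they finish.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                order.append(node)  # all successors done
                stack.pop()

    # Pass 2: DFS on the reversed graph, latest-finished nodes first.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        component, todo = set(), [root]
        assigned.add(root)
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


# Hypothetical toy graph: an edge x -> y means "x needs y".
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"]}
components = strongly_connected_components(graph)
print(sorted(max(components, key=len)))  # ['a', 'b', 'c']
```

Everything outside the largest component can be ordered mechanically; only
the packages inside it need build dependencies to be dropped.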
Part of my algorithm implements heuristics that present to the user
those edges which would make "sense" to break from a theoretical point
of view. An example would be a build dependency that is needed by only a
single source package but itself draws in dozens of further
dependencies, which in turn draw in more. Dropping this build
dependency would immediately reduce the size of the SCC greatly.
Naturally, the importance of a build dependency can only be judged by a
human, as only a human can figure out how essential package X is for
compiling source package Y, or how hard it would be to change source
package Y to build without X.
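One simple way to express such a heuristic is to remove each edge in turn
and see how large the biggest SCC is afterwards. The sketch below does this
by brute force on a hypothetical toy graph. It is far too slow for the real
graph and it is not the heuristic my tools implement, just an illustration
of the kind of ranking meant here:

```python
def reachable(graph, start):
    """Return all nodes reachable from start (including start itself)."""
    seen, todo = {start}, [start]
    while todo:
        for nxt in graph.get(todo.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def largest_scc_size(graph):
    """Size of the largest SCC, computed naively from mutual reachability."""
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    reach = {n: reachable(graph, n) for n in nodes}
    best = 0
    for n in nodes:
        size = sum(1 for m in reach[n] if n in reach[m])
        best = max(best, size)
    return best


def rank_edges(graph):
    """Return (largest SCC size after removal, edge) pairs, best first."""
    results = []
    for src, succs in graph.items():
        for dst in succs:
            trimmed = {
                n: [s for s in ss if (n, s) != (src, dst)]
                for n, ss in graph.items()
            }
            results.append((largest_scc_size(trimmed), (src, dst)))
    return sorted(results)


# Hypothetical toy graph: only "src" needs "doc-tool", but "doc-tool" draws
# in two libraries which both need "src" again.
graph = {
    "src": ["doc-tool", "core"],
    "core": ["src"],
    "doc-tool": ["lib1", "lib2"],
    "lib1": ["src"],
    "lib2": ["src"],
}
print(largest_scc_size(graph))  # 5
print(rank_edges(graph)[0])     # (2, ('src', 'doc-tool'))
```

The ranking only says which edges are worth looking at. Whether "src" can
really be built without "doc-tool" remains a question for a human.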
An exception to this is what I call "weak build dependencies". Those
are the "documentation building" packages you mention above. Since they
can mostly be dropped from any package that has them as a dependency
without doing harm, they are the first thing my algorithms remove as
well. A current list can be found here . Can it be extended?
At this point let me also mention that the Build-Depends-Indep field is
immensely helpful when looking at the bootstrapping problem from a
theoretical point of view (as I do), because arch:all packages of course
do not have to be rebuilt. In fact, lots of the "weak" documentation
building dependencies could just be moved to Build-Depends-Indep, which
would remove the need for me to keep this list.
P. J. McDermott ("Bootstrappable Debian" GSoC project) managed to find
many source packages with Build-Depends entries that could be moved to
Build-Depends-Indep, making the bootstrapping process easier. In  I
supply a list of core packages that build arch:all packages and have no
Build-Depends-Indep field but do have a binary-indep or build-indep
target in their debian/rules, and combinations thereof.
> In the past, pbuildd was mainly geared towards trying to build all of
> Debian (including the binary-indep packages) starting from a minimal
> chroot and with minimal extra package downloads, but on an established
> architecture. It was only recently that I started applying it to
> bootstrapping x32. The way I started that was actually: I started off
> mainly following the instructions from Linux From Scratch, though of
> course adjusting it to "cross-building" to x32 as necessary. I also
> inserted dpkg into the process as soon as possible after the first LFS
> stage creating the chroot with /tools, and from then on ran installs
> into temporary directories, and built dummy dpkg packages with no
> dependencies. Then, after the LFS builds were over, I started
> building real Debian packages from the actual .dsc source packages,
> and eventually had enough packages built in this way that I was able
> to do a debootstrap, and start the pbuildd process.
So you used LFS to build the initial chroot. That's new to me; from
what I've heard so far, people used Gentoo or OpenEmbedded in the
past to avoid having to cross-compile a core of Debian.
With multiarch, cross-compiling becomes much nicer in Debian. Sadly,
many packages have not been multiarched yet, which still prevents a
Debian base system from being cross-compiled via multiarch. Wookey is
currently attempting this with Ubuntu (as they have more multiarch).
From a dependency-theoretic point of view, very few of the core
packages (a few dozen) have their cross build dependencies satisfied.