
A new metric for source package importance in ports



Hi,

the following is a report on a successful implementation of what I discussed
with Niels Thykier during DebConf13. The question was how important it is for a
source package to be compilable, or to exist in the first place, in an
incomplete port which is still being bootstrapped. This work serves a different
purpose than the identification of "key packages" by Lucas Nussbaum [1].
Instead of attaching a binary value to each source package, this method
associates an integer value with each of them. Once bootstrapping of the whole
archive becomes more important, or even possible in real life through an
implementation of build profiles, this heuristic could also be used to extend
the meaning of "key packages".

This heuristic attaches to each source package A the number of source packages
which need A to be compilable so that they become compilable themselves. The
dependency graph which is needed to extract this information is conveniently
created by the service I run as http://bootstrap.debian.net - I'm using a
simple Python script to walk this graph to extract the information.
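
As a rough illustration of such a walk (a sketch only, not the actual script):
assume the graph is given as a dict `needs` which maps each source package to
the set of source packages it needs to be compilable. The function name, the
dict layout and the choice not to count a package as depending on itself are
all assumptions made for this example.

```python
from collections import deque


def count_dependents(needs):
    """For each source package, count the other source packages which
    (directly or transitively) need it to be compilable.

    `needs` is a hypothetical input format: source -> set of sources it needs.
    """
    # Invert the edges so we can walk from a package to its dependents.
    needed_by = {pkg: set() for pkg in needs}
    for pkg, deps in needs.items():
        for dep in deps:
            needed_by.setdefault(dep, set()).add(pkg)

    counts = {}
    for start in needed_by:
        # Breadth-first search over the inverted graph.
        seen = {start}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in needed_by.get(cur, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Assumption: the package itself is not counted.
        counts[start] = len(seen) - 1
    return counts


# Toy graph with made-up package names.
toy = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"a"}}
toy_counts = count_dependents(toy)
```

In the toy graph, `b`, `c` and `d` all need `a` (directly or through `b`), so
`a` gets the highest count. Running the same walk once on the strong graph and
once on the closure graph would give the two numbers per package.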

In fact, that Python script uses two different graphs. Since dependencies
contain disjunctions, there exist different choices of packages which have to
be available for something to be compilable or installable. To avoid making
this choice arbitrarily, I calculate the minimum number of dependencies that
have to be available (strong dependencies) and the maximum number that has to
be available (dependency closure). Therefore each source package A is
associated with two numbers: the minimum number of source packages which depend
on A being compilable and the maximum number of source packages which depend on
A being compilable.

To give the numbers more than purely syntactic meaning, I also added popcon
information to the output. With each source package A I associate the sum of
the popcon values of all source packages which depend on A being compilable.
Again, this is done for the minimum as well as the maximum.
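
The popcon sum can be sketched in a few lines. This assumes that the set of
dependents of each source package has already been computed and that a popcon
value per source package is available as a dict; both input formats and all
names and numbers below are made up for illustration.

```python
def popcon_sums(dependents, popcon):
    """Sum the popcon values of all source packages depending on each package.

    `dependents`: source -> set of sources which depend on it being compilable
    `popcon`:     source -> popcon value (missing entries count as 0,
                  which is an assumption of this sketch)
    """
    return {
        pkg: sum(popcon.get(dep, 0) for dep in deps)
        for pkg, deps in dependents.items()
    }


# Made-up example data, not real popcon numbers.
example_dependents = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
example_popcon = {"a": 100, "b": 20, "c": 5}
example_sums = popcon_sums(example_dependents, example_popcon)
```

Calling this once with the dependents from the strong graph and once with the
dependents from the closure graph yields the minimum and maximum sums.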

So here is the (tab-delimited) data in no particular order:

http://mister-muffin.de/p/pVxb.txt

1st column: the name of the source package
2nd column: minimum number of source packages which need this source package to be compilable
3rd column: maximum number of source packages which need this source package to be compilable
4th column: minimum sum of popcon values
5th column: maximum sum of popcon values
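
For anyone who wants to play with the data, here is a small sketch of how the
five columns could be read in Python. The `Row` type, its field names and the
assumption that columns two to five are integers are mine; the sample lines in
the example are invented and not taken from the file.

```python
import csv
from typing import NamedTuple


class Row(NamedTuple):
    source: str          # 1st column: source package name
    min_dependents: int  # 2nd column
    max_dependents: int  # 3rd column
    min_popcon: int      # 4th column
    max_popcon: int      # 5th column


def parse_rows(lines):
    """Parse tab-delimited lines into Row tuples, skipping empty lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        Row(r[0], int(r[1]), int(r[2]), int(r[3]), int(r[4]))
        for r in reader
        if r
    ]


# Invented sample lines in the same five-column layout.
sample = ["foo\t1\t2\t10\t20", "bar\t3\t4\t30\t40"]
rows = parse_rows(sample)
# Sort by the second column, highest value first.
by_min = sorted(rows, key=lambda row: row.min_dependents, reverse=True)
```

With a downloaded copy of the file, `parse_rows(open(path))` should work the
same way, since `csv.reader` accepts any iterable of lines.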

Do you see any obvious error?

When sorting the data by the second column, you will see that there are 1194
source packages with the same value: 19554. This value corresponds to the total
number of source packages. It means that everything else depends on these 1194
source packages being compilable. If those 1194 source packages are not
compilable, then neither is the rest. Remember that this is only true in a
bootstrapping scenario. These 1194 source packages are also all part of the
same strongly connected component of the strong srcgraph and roughly correspond
to the smallest set of packages needed for a self-hosting Debian system. We
call a set of binary and source packages self-hosting if all binary packages
can be created from the source packages and all source packages can be compiled
with just the available binary packages. In my opinion it would make sense to
add all packages which are minimally required to make Debian self-hosting to
the set of "key packages" by extending the definition by Lucas Nussbaum at [1].
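
To make the definition of self-hosting concrete, here is a simplified check in
Python. It is only a sketch: it assumes that for each source package we
already know which binary packages it builds and which binary packages it
needs for compilation (with disjunctions already resolved one way or the
other), and it ignores the installation dependencies of the binary packages
themselves. All names are hypothetical.

```python
def is_self_hosting(sources, binaries, builds, needs):
    """Check the two conditions of the self-hosting definition.

    `sources`, `binaries`: sets of package names
    `builds`: source -> set of binary packages it produces
    `needs`:  source -> set of binary packages required to compile it
    """
    # Condition 1: every binary package can be created from the sources.
    produced = set()
    for src in sources:
        produced |= builds.get(src, set())
    all_binaries_buildable = binaries <= produced

    # Condition 2: every source package compiles with the available binaries.
    all_sources_compilable = all(
        needs.get(src, set()) <= binaries for src in sources
    )
    return all_binaries_buildable and all_sources_compilable


# Made-up two-package example.
ex_builds = {"src-a": {"bin-a"}, "src-b": {"bin-b"}}
ex_needs = {"src-a": {"bin-a", "bin-b"}, "src-b": {"bin-a"}}
complete = is_self_hosting({"src-a", "src-b"}, {"bin-a", "bin-b"},
                           ex_builds, ex_needs)
incomplete = is_self_hosting({"src-a"}, {"bin-a", "bin-b"},
                             ex_builds, ex_needs)
```

In the example, dropping `src-b` breaks the first condition because nothing in
the set builds `bin-b` anymore.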

The number of source packages which are needed to bootstrap themselves and all
the rest of Debian is that high because it includes source packages which are
only included because of the arch:all binary packages, the essential:yes
packages or the build-essential packages they build. While it is important to
include these for rebuilds of the whole archive, they are not important in a
real bootstrap situation: arch:all binary packages already exist and do not
need to be bootstrapped, and a minimal build system (essential:yes +
build-essential) is required in the first place before packages can be compiled
natively. Therefore I created a different graph which takes into account that
arch:all packages as well as the packages of the minimal build system do not
need to be rebuilt:

http://mister-muffin.de/p/Gid8.txt

One can see that the number of source packages needed to build the rest of the
archive is now only 383. It is important that these source packages remain
compilable (in addition to essential:yes + build-essential being cross-able)
because otherwise a bootstrap of any new architecture cannot be done. If that
happens, the service at http://bootstrap.debian.net will indicate that the
architecture is not bootstrappable at all.

Does anybody see enough value in these numbers for source package importance in
the light of bootstrapping Debian (either for a new port or for rebuilding the
archive from scratch)? If so, then I can generate these numbers for all source
packages on a daily basis and publish them with the rest of the data on
http://bootstrap.debian.net

cheers, josch

[1] https://lists.debian.org/debian-devel/2013/05/msg00496.html

