[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

SAT based britney

Hi Release-Team,

in this mail, I’d like to introduce an idea for an improved testing
migration system that has been floating in my mind since over a year. I
originally wanted to propose and implement it for last DebConf, but did
not get sponsored. This year I will attend DebCamp and -Conf, so it is
about time for a RFC.

It is going to be a slighter long mail and if you are not interested in the
testing migration software aka britney, you might want to skip the mail.
If you do read it, though, I am especially interested in the answers to
the following questions:

 * Do you get it? (i.e., did I explain it well enough or are there
questions left)
 * Do you think it can handle all current and anticipatable
requirements? (I’m not involved in release team work and might missed
some conditions.)
 * What do you think of the advantages and improvements that I point
 * What other benefits that I do not see yet would come from this
 * Is it worth the hassle?

And before going into the details, here the high-level goals that I want
to achieve with SAT-britney (stupid working title):

 * Delegate as much logic to a general purpose tool (a MAX-SAT-solver),
for less specific code to maintain and to benefit from improvements in
that area.
 * Completeness: If some set of packages have to migrate together,
SAT-britney will migrate that together without any manual hinting
 * Correctness: Packages in testing are always installable. (See below
for a caveat with regard to Conflicts)
 * Intelligible output explaining why a package has not migrated. I want
something that is much better than update_output.txt
 * Easy, comprehensive and powerful way of removing or adding
constraints, e.g. force_hints, package removals.

The system is three-layered. On the lowest level is a general purpose
MAX-SAT-solver. It takes as input a propositional formula in conjunctive
normal form with some variables, and a list of “desired” variables. It
will then output 
 * a satisfying assignment which is maximal, i.e. there is no satisfying
assignment where a strict superset of desired variables is true.
 * for each desired variable which is not true, a minimal set of clauses
which prevent this variable from being set to true.
A MAX-SAT-solver can be implemented using a SAT-solver such as picosat
and some iterations, but there might be room for improvements. For our
purpose, this is a black box.

The next layer adds some bells and whistles around the SAT-solver, but
is still not Debian-specific. Now the variables are arbitrary strings
without whitespace, with will, in our case, have suggestive names such
as ghc_7.0.3-1_src/testing or ghc_7.0.3-1+b1_amd64/testing. The clauses
can be a bit more expressive than CNF and, very important for some of
the goals, can carry a comments (here separated by "by" or "because").
The layer would process a list of clauses such as:

ghc_7.0.3-1+b1_amd64/testing implies ghc_7.0.3-1_src/testing by policy.
ghc_7.0.3-1+b1_amd64/testing implies any of gcc_4:4.5.2-5_amd64/testing gcc_4:4.4.5-2_amd64/testing by ghc's dependency "gcc (>= 4:4.2)"
ghc_7.0.3-1+b1_amd64/testing implies not ghc6_6.12.1-1_amd64/testing by ghc's conflict "ghc6 (<< 7)"
ghc_7.0.3-1_src/testing implies all of ghc_7.0.3-1+b1_amd64/testing ghc_7.0.3-1+b1_i386/testing ghc_7.0.3-1_powerpc/testing by release architecture sync
at most one of gcc_4:4.4.5-2_amd64/testing gcc_4:4.5.2-5_amd64/testing because one version per package per distribution
not gcc_4:4.5.2-5_src/testing because gcc is too young (4 out of 10 days)
not gcc_4:4.5.2-5_src/testing because gcc is RC-buggy: #12345, #67789

And a list of variables that we would like to be true (i.e. packages
that should eventually migrate to testing):

The exact syntax is of course not fixed yet, but I think you can see at
what I am aiming at: A language that is somewhat easy to understand and
that allows us to express all the conditions we impose upon packages in
testing :
 * binary package always with the source package
 * all release architectures are in sync
 * all dependencies fulfilled
 * all build-dependencies fulfilled (currently not enforced, but easily
enabled in this system if desired)
 * if packages exists in testing in old version and is not removed in
unstable, then it should not be removed from testing
 * no too young packages should migrate
 * no RC-buggy packages should migrate
 * what else?

The syntax is also human-writable, for the override lists. Besides the
input file containing the full set of clauses representing the usual
migration policies, there would be one manually edited file of clauses
to be ignored and to be added. It could look like this:

ignore not gcc_4:4.5.2-5_src/testing because we like rc bugs in gcc in testing 
add udev_167-2_src/testing implies lvm2_2.02.84-2_src/testing because other combinations break, but the packages do not actually imply that

The third layer is a layer that looks at the actual Packages and Sources
files and generates all the clauses suggested above. It creates
variables for each package/arch/version triples, resolves virtual
packages and Provides and does all the version arithmetic in
dependencies etc. It also needs to get the age and buggyness data. The
result of the solver will be used to create the new set of packages for
testing (how does this work, by the way – does britney create a new
Packages file, does she create a .changes file processed by dak or even
another way?). Initially, it could just create a large hint that can be
fed to the existing britney, of course. The solver also gives the
reasons for non-migrations, this data needs also be published.

It remains to be seen if the rules that create the clauses can be
defined in a declarative and high-level manner such that modifcations
thereof (e.g. introduction of CUT, or ensuring build-depends) can be
done without changing actual code. But I guess such changes will be
rare, so this is not a priority goal.

Caveat: Conflicts

You might have noticed already: This system will not be able to hande
Conflicts relationships between packages in their full generality.
Conflicts, just as Depends, state something about co-installability and
_not_ co-existence in a distribution. For Depends this is not a problem:
If A depends on B to be installed, then A also depends on B to exist in
the distribution (assuming self-contained distributions). For conflicts
this is not true: Just because exim conflicts with postfix does not mean
that both should not be present in the same distribution. A complete
solution would require running something like edos-distcheck for each
possible combination of migrated packages – not feasible.

But in reality, things are simpler: There are two uses of Conflicts. One
is the exim/postfix conflict, a statement that is purely about
co-installability and of no interested to britney. These conflicts can
be safely ignored. The other one is when a package actually breaks
things in another package, and which are fixed in later versions of the
other package. udev has a long list of such conflicts in the Breaks
field. All of them have a upper version constraint. Only these are
relevant to britney.

So here is my heuristic to handle this: If a Conflicts or Breaks has an
upper version constraint, then add a corresponding clause. If not,
ignore it.

This might lead to uninstallable packages in testing in corner cases
(imagine some package depending on both exim4 and postfix), but such a
case is clearly an RC bug that ought to be filed by someone and which
would be detected by a run of edos-distcheck on unstable already. 

Caveat: Performance

Of course I have not implemented the hole thing and cannot even guess
how fast such a run would be. Ideally so quick that the release-team
could also play with it (assume all package are old enough to migrate,
what would migrate. Assume this RC bug is fixed, what would change). But
maybe so slow that it still can be used for britney, but only with one
run per migration (how often do packages migrate per day?).

The list of all clauses might not have to be actually written to a file
and parsed again if the second and third layer are implemented in the
same language and combined in the same process. Same for interpreting
the results. In that case, one would have an option to get the list of
clauses manually (for debugging, or to find out what rules to override).

So – what do you think?


PS: Please give honest feedback. I’d rather been told that this is
rubbish and not spend time on it than spending time on it and afterwards
noticing that it will not be used.

PPS: I am still traveling India, so my replies might be a bit slower than usual.

Joachim Breitner
  e-Mail: mail@joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  ICQ#: 74513189
  Jabber-ID: nomeata@joachim-breitner.de

Joachim "nomeata" Breitner
Debian Developer
  nomeata@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nomeata@joachim-breitner.de | http://people.debian.org/~nomeata

Attachment: signature.asc
Description: This is a digitally signed message part

Reply to: