Hi Release-Team, in this mail, I’d like to introduce an idea for an improved testing migration system that has been floating in my mind since over a year. I originally wanted to propose and implement it for last DebConf, but did not get sponsored. This year I will attend DebCamp and -Conf, so it is about time for a RFC. It is going to be a slighter long mail and if you are not interested in the testing migration software aka britney, you might want to skip the mail. If you do read it, though, I am especially interested in the answers to the following questions: * Do you get it? (i.e., did I explain it well enough or are there questions left) * Do you think it can handle all current and anticipatable requirements? (I’m not involved in release team work and might missed some conditions.) * What do you think of the advantages and improvements that I point out? * What other benefits that I do not see yet would come from this system? * Is it worth the hassle? And before going into the details, here the high-level goals that I want to achieve with SAT-britney (stupid working title): * Delegate as much logic to a general purpose tool (a MAX-SAT-solver), for less specific code to maintain and to benefit from improvements in that area. * Completeness: If some set of packages have to migrate together, SAT-britney will migrate that together without any manual hinting required. * Correctness: Packages in testing are always installable. (See below for a caveat with regard to Conflicts) * Intelligible output explaining why a package has not migrated. I want something that is much better than update_output.txt * Easy, comprehensive and powerful way of removing or adding constraints, e.g. force_hints, package removals. The system is three-layered. On the lowest level is a general purpose MAX-SAT-solver. It takes as input a propositional formula in conjunctive normal form with some variables, and a list of “desired” variables. It will then output * a satisfying assignment which is maximal, i.e. there is no satisfying assignment where a strict superset of desired variables is true. * for each desired variable which is not true, a minimal set of clauses which prevent this variable from being set to true. A MAX-SAT-solver can be implemented using a SAT-solver such as picosat and some iterations, but there might be room for improvements. For our purpose, this is a black box. The next layer adds some bells and whistles around the SAT-solver, but is still not Debian-specific. Now the variables are arbitrary strings without whitespace, with will, in our case, have suggestive names such as ghc_7.0.3-1_src/testing or ghc_7.0.3-1+b1_amd64/testing. The clauses can be a bit more expressive than CNF and, very important for some of the goals, can carry a comments (here separated by "by" or "because"). The layer would process a list of clauses such as: ghc_7.0.3-1+b1_amd64/testing implies ghc_7.0.3-1_src/testing by policy. ghc_7.0.3-1+b1_amd64/testing implies any of gcc_4:4.5.2-5_amd64/testing gcc_4:4.4.5-2_amd64/testing by ghc's dependency "gcc (>= 4:4.2)" ghc_7.0.3-1+b1_amd64/testing implies not ghc6_6.12.1-1_amd64/testing by ghc's conflict "ghc6 (<< 7)" ghc_7.0.3-1_src/testing implies all of ghc_7.0.3-1+b1_amd64/testing ghc_7.0.3-1+b1_i386/testing ghc_7.0.3-1_powerpc/testing by release architecture sync at most one of gcc_4:4.4.5-2_amd64/testing gcc_4:4.5.2-5_amd64/testing because one version per package per distribution not gcc_4:4.5.2-5_src/testing because gcc is too young (4 out of 10 days) not gcc_4:4.5.2-5_src/testing because gcc is RC-buggy: #12345, #67789 And a list of variables that we would like to be true (i.e. packages that should eventually migrate to testing): ghc_7.0.3-1_amd64/testing ghc_7.0.3-1_powerpc/testing gcc_4:4.5.2-5_amd64/testing The exact syntax is of course not fixed yet, but I think you can see at what I am aiming at: A language that is somewhat easy to understand and that allows us to express all the conditions we impose upon packages in testing : * binary package always with the source package * all release architectures are in sync * all dependencies fulfilled * all build-dependencies fulfilled (currently not enforced, but easily enabled in this system if desired) * if packages exists in testing in old version and is not removed in unstable, then it should not be removed from testing * no too young packages should migrate * no RC-buggy packages should migrate * what else? The syntax is also human-writable, for the override lists. Besides the input file containing the full set of clauses representing the usual migration policies, there would be one manually edited file of clauses to be ignored and to be added. It could look like this: ignore not gcc_4:4.5.2-5_src/testing because we like rc bugs in gcc in testing add udev_167-2_src/testing implies lvm2_2.02.84-2_src/testing because other combinations break, but the packages do not actually imply that The third layer is a layer that looks at the actual Packages and Sources files and generates all the clauses suggested above. It creates variables for each package/arch/version triples, resolves virtual packages and Provides and does all the version arithmetic in dependencies etc. It also needs to get the age and buggyness data. The result of the solver will be used to create the new set of packages for testing (how does this work, by the way – does britney create a new Packages file, does she create a .changes file processed by dak or even another way?). Initially, it could just create a large hint that can be fed to the existing britney, of course. The solver also gives the reasons for non-migrations, this data needs also be published. It remains to be seen if the rules that create the clauses can be defined in a declarative and high-level manner such that modifcations thereof (e.g. introduction of CUT, or ensuring build-depends) can be done without changing actual code. But I guess such changes will be rare, so this is not a priority goal. Caveat: Conflicts You might have noticed already: This system will not be able to hande Conflicts relationships between packages in their full generality. Conflicts, just as Depends, state something about co-installability and _not_ co-existence in a distribution. For Depends this is not a problem: If A depends on B to be installed, then A also depends on B to exist in the distribution (assuming self-contained distributions). For conflicts this is not true: Just because exim conflicts with postfix does not mean that both should not be present in the same distribution. A complete solution would require running something like edos-distcheck for each possible combination of migrated packages – not feasible. But in reality, things are simpler: There are two uses of Conflicts. One is the exim/postfix conflict, a statement that is purely about co-installability and of no interested to britney. These conflicts can be safely ignored. The other one is when a package actually breaks things in another package, and which are fixed in later versions of the other package. udev has a long list of such conflicts in the Breaks field. All of them have a upper version constraint. Only these are relevant to britney. So here is my heuristic to handle this: If a Conflicts or Breaks has an upper version constraint, then add a corresponding clause. If not, ignore it. This might lead to uninstallable packages in testing in corner cases (imagine some package depending on both exim4 and postfix), but such a case is clearly an RC bug that ought to be filed by someone and which would be detected by a run of edos-distcheck on unstable already. Caveat: Performance Of course I have not implemented the hole thing and cannot even guess how fast such a run would be. Ideally so quick that the release-team could also play with it (assume all package are old enough to migrate, what would migrate. Assume this RC bug is fixed, what would change). But maybe so slow that it still can be used for britney, but only with one run per migration (how often do packages migrate per day?). The list of all clauses might not have to be actually written to a file and parsed again if the second and third layer are implemented in the same language and combined in the same process. Same for interpreting the results. In that case, one would have an option to get the list of clauses manually (for debugging, or to find out what rules to override). So – what do you think? Greetings, Joachim PS: Please give honest feedback. I’d rather been told that this is rubbish and not spend time on it than spending time on it and afterwards noticing that it will not be used. PPS: I am still traveling India, so my replies might be a bit slower than usual. -- Joachim Breitner e-Mail: mail@joachim-breitner.de Homepage: http://www.joachim-breitner.de ICQ#: 74513189 Jabber-ID: nomeata@joachim-breitner.de -- Joachim "nomeata" Breitner Debian Developer nomeata@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C JID: nomeata@joachim-breitner.de | http://people.debian.org/~nomeata
Attachment:
signature.asc
Description: This is a digitally signed message part