
ITM: performance and refactor patches for Britney



Hi,

I intend to merge the following 11 patches into the Britney master
branch no later than Monday the 4th of August (provided there are no
objections) and I kindly ask you to review them before then.


Please pay extra attention to the following patches:

 * 0006-installability-Exploit-equvialency-to-reduce-choices.patch
 * 0010-Exploit-equivalency-to-skip-unneeded-computation.patch


A short note on each of the patches (in order).

 * 0001-inst-builder.py-Move-a-comment-and-write-doc-for-met.patch
   - Comment/Doc changes only
 * 0002-britney.py-Handle-version-ranged-dependencies-a-bit-.patch
   - This patch makes Britney merge "version range" dependencies into
     a single clause rather than keeping them as two clauses.
   - Despite how it might look, the original handling was still correct.
 * 0003-britney.py-Split-iter_packages-into-two.patch
   - This patch makes Britney much more efficient at processing hints
     by avoiding re-computing the installability of the same packages
     multiple times.
   - Same as the one mentioned in the "Kali"-thread
 * 0004-doop_source-Remove-redundant-part-of-the-return-valu.patch
   0005-doop_source-Remove-unncessary-table-loop-up.patch
   - Both are just minor code clean-ups
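
To illustrate the idea behind patch 0002, here is a small stand-alone
sketch (Python 3, with hypothetical names; this is not Britney's actual
code): when a dependency clause names only a single package, the clauses
for that package can be intersected, so a "version range" collapses to
the versions satisfying both bounds.

```python
# Sketch of the clause merging in patch 0002 (hypothetical helper, not
# Britney's actual code).  A dependency like "pkg-a (>= 2), pkg-a (<< 3~)"
# produces two OR-clauses; when a clause names only one package, the
# clauses for that package can be intersected, leaving exactly the
# versions that satisfy the whole range.

def merge_range_clauses(clauses):
    merged = {}
    result = []
    for clause in clauses:
        names = {name for (name, _, _) in clause}
        if len(names) != 1:
            # Real alternatives (e.g. "foo | bar"): keep the clause as-is.
            result.append(set(clause))
            continue
        key = names.pop()
        if key in merged:
            merged[key] &= clause       # intersect: narrow the range
        else:
            merged[key] = set(clause)
    result.extend(merged.values())
    return result

# "pkg-a (>= 2)" is satisfied by versions 2 and 3; "pkg-a (<< 3~)" by 1 and 2.
ge2 = {("pkg-a", "2", "amd64"), ("pkg-a", "3", "amd64")}
lt3 = {("pkg-a", "1", "amd64"), ("pkg-a", "2", "amd64")}
merged_clauses = merge_range_clauses([ge2, lt3])
```

Only (pkg-a, 2, amd64) survives the intersection, matching the example
in the commit message of 0002.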

 ! 0006-installability-Exploit-equvialency-to-reduce-choices.patch
   - Introduces the term "equivalent packages": packages that have a
     "similar" signature in the package graph.
   - Such packages have the property that you can swap one for the
     other without consequences for (co-)installability.
   - This patch mitigates some cases of O(2^n) runtime by reducing the
     problem size (i.e. the size of "n").
   - The patch introduces a memory overhead of about 50% (~400-500MB)
     in the live-data set.  Thus with it, Britney now uses ~1.3GB for
     live-2011-12-13
     - If need be, I suspect I can reduce that to a "peak" usage and
       reduce the sustained memory usage.
   - NB: This patch *has been changed* since I mentioned it in the
     "Kali" thread.

 * 0007-inst-tester-Exploit-eqv.-table-in-compute_testing_in.patch
   - Minor start-up optimisation by using "equivalent packages".
 * 0008-inst-tester-Attempt-to-avoid-needing-to-backtrack.patch
   - Avoid some cases of backtracking completely
   - Same as the one mentioned in the "Kali"-thread
 * 0009-britney.py-Refactor-doop_source.patch
   - Refactoring of doop_source that renames variables and avoids
     repeated table look-ups

 ! 0010-Exploit-equivalency-to-skip-unneeded-computation.patch
   - Use "equivalent packages" to reduce the number of packages
     that Britney needs to check during migration.

 * 0011-britney.py-Use-defaultdict-instead-of-.setdefault.patch
   - Minor refactoring/optimisation
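
For reference, the change in 0011 is the standard idiom swap (a generic
Python 3 illustration, not the actual Britney code):

```python
# Generic illustration of the setdefault -> defaultdict refactoring in
# patch 0011; not the actual Britney code.
from collections import defaultdict

pairs = [("src", "bin1"), ("src", "bin2"), ("other", "bin3")]

# Before: every iteration pays for the setdefault call and for
# constructing the default value, even when the key already exists.
by_source = {}
for src, binpkg in pairs:
    by_source.setdefault(src, []).append(binpkg)

# After: the default list is created lazily, only for missing keys.
by_source2 = defaultdict(list)
for src, binpkg in pairs:
    by_source2[src].append(binpkg)

assert by_source == dict(by_source2)
```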



Testing:
========

There are no changes to the expected output of any of the tests.  But I
was inspired to add a new test case (basic-dep-range).  Prior to my
revision of patch 0006 (and the introduction of 0002), patch 0010 could
trigger a failure of this test and of live-2011-12-13 (allowing "alex"
to migrate despite it breaking haskell-platform).

I have already tested some of them on Kali and I am currently in the
process of doing a re-run with all these patches.  Though honestly, I
doubt it will make a major difference compared to my previous run
(reported in the "Kali" thread).

I suspect further performance improvements should come from reducing the
overhead of testing migration items and/or a better scheduling order to
avoid re-trying items that will just fail.  After that, the original
auto hinter could use some love as well.


~Niels

From 4009d1b24407c8b79a32a79023e5a11df1c3b22d Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Sat, 19 Jul 2014 09:51:36 +0200
Subject: [PATCH 01/11] inst/builder.py: Move a comment and write doc for
 method

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 installability/builder.py | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/installability/builder.py b/installability/builder.py
index 10b78eb..c3ba7c8 100644
--- a/installability/builder.py
+++ b/installability/builder.py
@@ -198,10 +198,12 @@ class InstallabilityTesterBuilder(object):
         return rel
 
     def build(self):
-        # Merge reverse conflicts with conflicts - this saves some
-        # operations in _check_loop since we only have to check one
-        # set (instead of two) and we remove a few duplicates here
-        # and there.
+        """Compile the installability tester
+
+        This method will compile an installability tester from the
+        information given and (where possible) try to optimise a
+        few things.
+        """
         package_table = self._package_table
         reverse_package_table = self._reverse_package_table
         intern_set = self._intern_set
@@ -220,6 +222,11 @@ class InstallabilityTesterBuilder(object):
                     return False
             return True
 
+
+        # Merge reverse conflicts with conflicts - this saves some
+        # operations in _check_loop since we only have to check one
+        # set (instead of two) and we remove a few duplicates here
+        # and there.
         for pkg in reverse_package_table:
             if pkg not in package_table:
                 raise RuntimeError("%s/%s/%s referenced but not added!" % pkg)
-- 
2.0.1

From 7161a3eff24d2a073911c3d132df623ba499c927 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Sun, 27 Jul 2014 16:56:37 +0200
Subject: [PATCH 02/11] britney.py: Handle version-ranged dependencies a bit
 smarter

Avoid creating two dependency clauses for dependencies emulating a
"version range" a la:

  Depends: pkg-a (>= 2), pkg-a (<< 3~)

Previously this would create two clauses a la:

 - (pkg-a, 2, arch), (pkg-a, 3, arch)
 - (pkg-a, 1, arch), (pkg-a, 2, arch)

However, it is plain to see that only (pkg-a, 2, arch) is a valid
solution and the other options are just noise.  This patch makes
Britney merge these two clauses into a single clause containing exactly
(pkg-a, 2, arch).

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/britney.py b/britney.py
index d0fcdd0..038578e 100755
--- a/britney.py
+++ b/britney.py
@@ -431,10 +431,12 @@ class Britney(object):
 
                 depends = []
                 conflicts = []
+                possible_dep_ranges = {}
 
                 # We do not differ between depends and pre-depends
                 if pkgdata[DEPENDS]:
                     depends.extend(apt_pkg.parse_depends(pkgdata[DEPENDS], False))
+
                 if pkgdata[CONFLICTS]:
                     conflicts = apt_pkg.parse_depends(pkgdata[CONFLICTS], False)
 
@@ -442,8 +444,10 @@ class Britney(object):
 
                     for (al, dep) in [(depends, True), \
                                       (conflicts, False)]:
+
                         for block in al:
                             sat = set()
+
                             for dep_dist in binaries:
                                 (_, pkgs) = solvers(block, arch, dep_dist)
                                 for p in pkgs:
@@ -460,7 +464,37 @@ class Britney(object):
                                         # is using §7.6.2
                                         relations.add_breaks(pt)
                             if dep:
-                                relations.add_dependency_clause(sat)
+                                if len(block) != 1:
+                                    relations.add_dependency_clause(sat)
+                                else:
+                                    # This dependency might be a part
+                                    # of a version-range a la:
+                                    #
+                                    #   Depends: pkg-a (>= 1),
+                                    #            pkg-a (<< 2~)
+                                    #
+                                    # In such a case we want to reduce
+                                    # that to a single clause for
+                                    # efficiency.
+                                    #
+                                    # In theory, it could also happen
+                                    # with "non-minimal" dependencies
+                                    # a la:
+                                    #
+                                    #   Depends: pkg-a, pkg-a (>= 1)
+                                    #
+                                    # But dpkg is known to fix that up
+                                    # at build time, so we will
+                                    # probably only see "ranges" here.
+                                    key = block[0][0]
+                                    if key in possible_dep_ranges:
+                                        possible_dep_ranges[key] &= sat
+                                    else:
+                                        possible_dep_ranges[key] = sat
+
+                        if dep:
+                            for clause in possible_dep_ranges.itervalues():
+                                relations.add_dependency_clause(clause)
 
         self._inst_tester = builder.build()
 
-- 
2.0.1

From 14f5d4ad40e5576b8127bd5138d1a14967400ba6 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Thu, 24 Jul 2014 22:38:44 +0200
Subject: [PATCH 03/11] britney.py: Split iter_packages into two

Extract a specialised iter_packages_hint from iter_packages that only
deals with applying hints.  This simplifies iter_packages AND avoids
having to re-compute the uninstallability counters after each single
item in the hint.

This means that a hint can now avoid triggering exponential runtime
provided only that the "post-hint" stage does not trigger exponential
runtime.  Previously the hint had to be ordered such that none of the
items in the hint caused such behaviour (if at all possible).

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 99 +++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 59 insertions(+), 40 deletions(-)

diff --git a/britney.py b/britney.py
index 038578e..2836093 100755
--- a/britney.py
+++ b/britney.py
@@ -2103,7 +2103,51 @@ class Britney(object):
                 self._installability_test(p, version, arch, broken, to_check, nuninst_arch)
 
 
-    def iter_packages(self, packages, selected, hint=False, nuninst=None, lundo=None):
+    def iter_packages_hint(self, hinted_packages, lundo=None):
+        """Iter on hinted list of actions and apply them in one go
+
+        This method applies the changes from "hinted_packages" to
+        testing and computes the uninstallability counters after the
+        actions are performed.
+
+        The method returns the new uninstallability counters.
+        """
+
+        removals = set()
+        all_affected = set()
+        nobreakall_arches = self.options.nobreakall_arches.split()
+        binaries_t = self.binaries['testing']
+        check_packages = partial(self._check_packages, binaries_t)
+        # Deep copy nuninst (in case the hint is undone)
+        nuninst = {k:v.copy() for k,v in self.nuninst_orig.iteritems()}
+
+
+        for item in hinted_packages:
+            _, rms, _ = self._compute_groups(item.package, item.suite,
+                                             item.architecture,
+                                             item.is_removal,
+                                             allow_smooth_updates=False)
+            removals.update(rms)
+
+        for item in hinted_packages:
+            _, affected, undo = self.doop_source(item,
+                                                 removals=removals)
+            all_affected.update(affected)
+            if lundo is not None:
+                lundo.append((undo,item))
+
+        for arch in self.options.architectures:
+            if arch not in nobreakall_arches:
+                skip_archall = True
+            else:
+                skip_archall = False
+
+            check_packages(arch, all_affected, skip_archall, nuninst)
+
+        return nuninst
+
+
+    def iter_packages(self, packages, selected, nuninst=None, lundo=None):
         """Iter on the list of actions and apply them one-by-one
 
         This method applies the changes from `packages` to testing, checking the uninstallability
@@ -2132,25 +2176,10 @@ class Britney(object):
         dependencies = self.dependencies
         check_packages = partial(self._check_packages, binaries)
 
-        # pre-process a hint batch
-        pre_process = {}
-        if selected and hint:
-            removals = set()
-            for item in selected:
-                _, rms, _ = self._compute_groups(item.package, item.suite,
-                                                 item.architecture,
-                                                 item.is_removal,
-                                                 allow_smooth_updates=False)
-                removals.update(rms)
-            for package in selected:
-                pkg, affected, undo = self.doop_source(package,
-                                                       removals=removals)
-                pre_process[package] = (pkg, affected, undo)
-
         if lundo is None:
             lundo = []
-        if not hint:
-            self.output_write("recur: [%s] %s %d/%d\n" % ("", ",".join(x.uvname for x in selected), len(packages), len(extra)))
+
+        self.output_write("recur: [%s] %s %d/%d\n" % ("", ",".join(x.uvname for x in selected), len(packages), len(extra)))
 
         # loop on the packages (or better, actions)
         while packages:
@@ -2174,45 +2203,32 @@ class Britney(object):
                         break
                 if defer: continue
 
-            if not hint:
-                self.output_write("trying: %s\n" % (pkg.uvname))
+            self.output_write("trying: %s\n" % (pkg.uvname))
 
             better = True
             nuninst = {}
 
             # apply the changes
-            if pkg in pre_process:
-                item, affected, undo = pre_process[pkg]
-            else:
-                item, affected, undo = self.doop_source(pkg, lundo)
-            if hint:
-                lundo.append((undo, item))
+            item, affected, undo = self.doop_source(pkg, lundo)
 
             # check the affected packages on all the architectures
             for arch in (item.architecture == 'source' and architectures or (item.architecture,)):
                 if arch not in nobreakall_arches:
                     skip_archall = True
-                else: skip_archall = False
+                else:
+                    skip_archall = False
 
                 nuninst[arch] = set(x for x in nuninst_comp[arch] if x in binaries[arch][0])
                 nuninst[arch + "+all"] = set(x for x in nuninst_comp[arch + "+all"] if x in binaries[arch][0])
 
                 check_packages(arch, affected, skip_archall, nuninst)
 
-                # if we are processing hints, go ahead
-                if hint:
-                    nuninst_comp[arch] = nuninst[arch]
-                    nuninst_comp[arch + "+all"] = nuninst[arch + "+all"]
-                    continue
-
                 # if the uninstallability counter is worse than before, break the loop
                 if ((item.architecture != 'source' and arch not in new_arches) or \
                     (arch not in break_arches)) and len(nuninst[arch]) > len(nuninst_comp[arch]):
                     better = False
                     break
 
-            # if we are processing hints or the package is already accepted, go ahead
-            if hint or item in selected: continue
 
             # check if the action improved the uninstallability counters
             if better:
@@ -2242,9 +2258,6 @@ class Britney(object):
                 # (local-scope) binaries is actually self.binaries["testing"] so we cannot use it here.
                 undo_changes(single_undo, self._inst_tester, sources, self.binaries)
 
-        # if we are processing hints, return now
-        if hint:
-            return (nuninst_comp, [])
 
         self.output_write(" finish: [%s]\n" % ",".join( x.uvname for x in selected ))
         self.output_write("endloop: %s\n" % (self.eval_nuninst(self.nuninst_orig)))
@@ -2255,6 +2268,7 @@ class Britney(object):
 
         return (nuninst_comp, extra)
 
+
     def do_all(self, hinttype=None, init=None, actions=None):
         """Testing update runner
 
@@ -2274,6 +2288,7 @@ class Britney(object):
         recurse = True
         lundo = None
         nuninst_end = None
+        extra = () # empty tuple
 
         if hinttype == "easy" or hinttype == "force-hint":
             force = hinttype == "force-hint"
@@ -2298,7 +2313,11 @@ class Britney(object):
 
         if init:
             # init => a hint (e.g. "easy") - so do the hint run
-            (nuninst_end, extra) = self.iter_packages(init, selected, hint=True, lundo=lundo)
+            nuninst_end = self.iter_packages_hint(selected, lundo=lundo)
+            if recurse:
+                # Ensure upgrade_me and selected do not overlap, if we
+                # follow-up with a recurse ("hint"-hint).
+                upgrade_me = [x for x in upgrade_me if x not in set(selected)]
 
         if recurse:
             # Either the main run or the recursive run of a "hint"-hint.
@@ -2338,7 +2357,7 @@ class Britney(object):
                 if recurse:
                     self.upgrade_me = sorted(extra)
                 else:
-                    self.upgrade_me = [x for x in self.upgrade_me if x not in selected]
+                    self.upgrade_me = [x for x in self.upgrade_me if x not in set(selected)]
                 self.sort_actions()
         else:
             self.output_write("FAILED\n")
-- 
2.0.1

From 93132067dfc4f6673d0559db2bab1d4329a9ca42 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Thu, 24 Jul 2014 23:00:56 +0200
Subject: [PATCH 04/11] doop_source: Remove redundant part of the return value

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/britney.py b/britney.py
index 2836093..07d5ce4 100755
--- a/britney.py
+++ b/britney.py
@@ -1938,9 +1938,9 @@ class Britney(object):
         This method applies the changes required by the action `item` tracking
         them so it will be possible to revert them.
 
-        The method returns a list of the package name, the suite where the
-        package comes from, the set of packages affected by the change and
-        the dictionary undo which can be used to rollback the changes.
+        The method returns a tuple containing a set of packages
+        affected by the change (as (name, arch)-tuples) and the
+        dictionary undo which can be used to rollback the changes.
         """
         undo = {'binaries': {}, 'sources': {}, 'virtual': {}, 'nvirtual': []}
 
@@ -2068,7 +2068,7 @@ class Britney(object):
                 sources['testing'][item.package] = sources[item.suite][item.package]
 
         # return the package name, the suite, the list of affected packages and the undo dictionary
-        return (item, affected, undo)
+        return (affected, undo)
 
 
     def _check_packages(self, binaries, arch, affected, skip_archall, nuninst):
@@ -2130,8 +2130,8 @@ class Britney(object):
             removals.update(rms)
 
         for item in hinted_packages:
-            _, affected, undo = self.doop_source(item,
-                                                 removals=removals)
+            affected, undo = self.doop_source(item,
+                                              removals=removals)
             all_affected.update(affected)
             if lundo is not None:
                 lundo.append((undo,item))
@@ -2183,7 +2183,7 @@ class Britney(object):
 
         # loop on the packages (or better, actions)
         while packages:
-            pkg = packages.pop(0)
+            item = packages.pop(0)
 
             # this is the marker for the first loop
             if not mark_passed and position < 0:
@@ -2195,21 +2195,21 @@ class Britney(object):
             # defer packages if their dependency has been already skipped
             if not mark_passed:
                 defer = False
-                for p in dependencies.get(pkg, []):
+                for p in dependencies.get(item, []):
                     if p in skipped:
-                        deferred.append(make_migrationitem(pkg, self.sources))
-                        skipped.append(make_migrationitem(pkg, self.sources))
+                        deferred.append(make_migrationitem(item, self.sources))
+                        skipped.append(make_migrationitem(item, self.sources))
                         defer = True
                         break
                 if defer: continue
 
-            self.output_write("trying: %s\n" % (pkg.uvname))
+            self.output_write("trying: %s\n" % (item.uvname))
 
             better = True
             nuninst = {}
 
             # apply the changes
-            item, affected, undo = self.doop_source(pkg, lundo)
+            affected, undo = self.doop_source(item, lundo)
 
             # check the affected packages on all the architectures
             for arch in (item.architecture == 'source' and architectures or (item.architecture,)):
@@ -2233,10 +2233,10 @@ class Britney(object):
             # check if the action improved the uninstallability counters
             if better:
                 lundo.append((undo, item))
-                selected.append(pkg)
+                selected.append(item)
                 packages.extend(extra)
                 extra = []
-                self.output_write("accepted: %s\n" % (pkg.uvname))
+                self.output_write("accepted: %s\n" % (item.uvname))
                 self.output_write("   ori: %s\n" % (self.eval_nuninst(self.nuninst_orig)))
                 self.output_write("   pre: %s\n" % (self.eval_nuninst(nuninst_comp)))
                 self.output_write("   now: %s\n" % (self.eval_nuninst(nuninst, nuninst_comp)))
@@ -2247,8 +2247,8 @@ class Britney(object):
                 for k in nuninst:
                     nuninst_comp[k] = nuninst[k]
             else:
-                self.output_write("skipped: %s (%d <- %d)\n" % (pkg.uvname, len(extra), len(packages)))
-                self.output_write("    got: %s\n" % (self.eval_nuninst(nuninst, pkg.architecture != 'source' and nuninst_comp or None)))
+                self.output_write("skipped: %s (%d <- %d)\n" % (item.uvname, len(extra), len(packages)))
+                self.output_write("    got: %s\n" % (self.eval_nuninst(nuninst, item.architecture != 'source' and nuninst_comp or None)))
                 self.output_write("    * %s: %s\n" % (arch, ", ".join(sorted(b for b in nuninst[arch] if b not in nuninst_comp[arch]))))
 
                 extra.append(item)
-- 
2.0.1

From 87b5b38c45b8c219ed95c087ce3965f5c3075af4 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Thu, 24 Jul 2014 23:05:32 +0200
Subject: [PATCH 05/11] doop_source: Remove unnecessary table look-up

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/britney.py b/britney.py
index 07d5ce4..182711d 100755
--- a/britney.py
+++ b/britney.py
@@ -1963,7 +1963,7 @@ class Britney(object):
 
                 # remove all the binaries which aren't being smooth updated
                 for bin_data in bins:
-                    binary, _, parch = bin_data
+                    binary, version, parch = bin_data
                     p = binary + "/" + parch
                     # save the old binary for undo
                     undo['binaries'][p] = binaries[parch][0][binary]
@@ -1978,7 +1978,6 @@ class Britney(object):
                         if len(binaries[parch][1][j]) == 0:
                             del binaries[parch][1][j]
                     # finally, remove the binary package
-                    version = binaries[parch][0][binary][VERSION]
                     del binaries[parch][0][binary]
                     self._inst_tester.remove_testing_binary(binary, version, parch)
                 # remove the source package
-- 
2.0.1

From 922d3fc01cbee8417ec7bad5bb566ad7e1709819 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Sat, 19 Jul 2014 20:05:23 +0200
Subject: [PATCH 06/11] installability: Exploit equvialency to reduce choices

For some cases, like aspell-dictionary, a number of packages can
satisfy the dependency (e.g. all aspell-*).  In this particular
example, most (all?) of the aspell-* packages look so similar to the
extent that reverse dependencies cannot tell two aspell-* packages
apart (with regard to installability and co-installability).

This patch attempts to help the installability tester by detecting
such cases and reducing the number of candidates for a given choice.

Reported-In: <20140716134823.GA11795@x230-buxy.home.ouaza.com>
Signed-off-by: Niels Thykier <niels@thykier.net>
---
 installability/builder.py | 119 ++++++++++++++++++++++++++++++++++++++++------
 installability/solver.py  |   4 +-
 installability/tester.py  |  48 ++++++++++++++-----
 3 files changed, 143 insertions(+), 28 deletions(-)

diff --git a/installability/builder.py b/installability/builder.py
index c3ba7c8..75adf63 100644
--- a/installability/builder.py
+++ b/installability/builder.py
@@ -12,6 +12,7 @@
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 # GNU General Public License for more details.
 
+from collections import defaultdict
 from contextlib import contextmanager
 
 from britney_util import ifilter_except, iter_except
@@ -28,7 +29,7 @@ class _RelationBuilder(object):
         self._new_breaks = set(binary_data[1])
 
 
-    def add_dependency_clause(self, or_clause):
+    def add_dependency_clause(self, or_clause, frozenset=frozenset):
         """Add a dependency clause
 
         The clause must be a sequence of (name, version, architecture)
@@ -48,12 +49,12 @@ class _RelationBuilder(object):
         binary = self._binary
         itbuilder = self._itbuilder
         package_table = itbuilder._package_table
-        reverse_package_table = itbuilder._reverse_package_table
         okay = False
         for dep_tuple in clause:
             okay = True
-            reverse_relations = itbuilder._reverse_relations(dep_tuple)
-            reverse_relations[0].add(binary)
+            rdeps, _, rdep_relations = itbuilder._reverse_relations(dep_tuple)
+            rdeps.add(binary)
+            rdep_relations.add(clause)
 
         self._new_deps.add(clause)
         if not okay:
@@ -193,7 +194,7 @@ class InstallabilityTesterBuilder(object):
 
         if binary in self._reverse_package_table:
             return self._reverse_package_table[binary]
-        rel = [set(), set()]
+        rel = [set(), set(), set()]
         self._reverse_package_table[binary] = rel
         return rel
 
@@ -227,18 +228,21 @@ class InstallabilityTesterBuilder(object):
         # operations in _check_loop since we only have to check one
         # set (instead of two) and we remove a few duplicates here
         # and there.
+        #
+        # At the same time, intern the rdep sets
         for pkg in reverse_package_table:
             if pkg not in package_table:
                 raise RuntimeError("%s/%s/%s referenced but not added!" % pkg)
-            if not reverse_package_table[pkg][1]:
-                # no rconflicts - ignore
-                continue
             deps, con = package_table[pkg]
-            if not con:
-                con = intern_set(reverse_package_table[pkg][1])
-            else:
-                con = intern_set(con | reverse_package_table[pkg][1])
-            package_table[pkg] = (deps, con)
+            rdeps, rcon, rdep_relations = reverse_package_table[pkg]
+            if rcon:
+                if not con:
+                    con = intern_set(rcon)
+                else:
+                    con = intern_set(con | rcon)
+                package_table[pkg] = (deps, con)
+            reverse_package_table[pkg] = (intern_set(rdeps), con,
+                                          intern_set(rdep_relations))
 
         # Check if we can expand broken.
         for t in not_broken(iter_except(check.pop, KeyError)):
@@ -308,8 +312,95 @@ class InstallabilityTesterBuilder(object):
                     # add all rdeps (except those already in the safe_set)
                     check.update(reverse_package_table[pkg][0] - safe_set)
 
+        eqv_table = self._build_eqv_packages_table(package_table,
+                                       reverse_package_table)
 
         return InstallabilitySolver(package_table,
                                     reverse_package_table,
                                     self._testing, self._broken,
-                                    self._essentials, safe_set)
+                                    self._essentials, safe_set,
+                                    eqv_table)
+
+
+    def _build_eqv_packages_table(self, package_table,
+                                  reverse_package_table,
+                                  frozenset=frozenset):
+        """Attempt to build a table of equivalent packages
+
+        This method attempts to create a table of packages that are
+        equivalent (in terms of installability).  If two packages (A
+        and B) are equivalent then testing the installability of A is
+        the same as testing the installability of B.  This equivalency
+        also applies to co-installability.
+
+        The example cases:
+        * aspell-*
+        * ispell-*
+
+        Cases that do *not* apply:
+        * MTA's
+
+        The theory:
+
+        The packages A and B are equivalent iff:
+
+          reverse_depends(A) == reverse_depends(B) AND
+                conflicts(A) == conflicts(B)       AND
+                  depends(A) == depends(B)
+
+        Where "reverse_depends(X)" is the set of reverse dependencies
+        of X, "conflicts(X)" is the set of negative dependencies of X
+        (Breaks and Conflicts plus the reverse ones of those combined)
+        and "depends(X)" is the set of strong dependencies of X
+        (Depends and Pre-Depends combined).
+
+        To be honest, we are actually equally interested in another
+        property as well, namely substitutability.  The package A can
+        always be used instead of B, iff:
+
+          reverse_depends(A) >= reverse_depends(B) AND
+                conflicts(A) <= conflicts(B)       AND
+                  depends(A) == depends(B)
+
+        (With the same definitions as above).  Note that equivalency
+        is just a special-case of substitutability, where A and B can
+        substitute each other (i.e. a two-way substitution).
+
+        Finally, note that the "depends(A) == depends(B)" for
+        substitutability is actually not a strict requirement.  There
+        are cases where those sets are different without affecting the
+        property.
+        """
+        # Despite talking about substitutability, the method currently
+        # only finds the equivalence cases.  Let's leave
+        # substitutability for a future version.
+
+        find_eqv_table = defaultdict(list)
+        eqv_table = {}
+
+        for pkg in reverse_package_table:
+            rdeps = reverse_package_table[pkg][2]
+            if not rdeps:
+                # we don't care for things without rdeps (because
+                # it is not worth it)
+                continue
+            deps, con = package_table[pkg]
+            ekey = (deps, con, rdeps)
+            find_eqv_table[ekey].append(pkg)
+
+        for pkg_list in find_eqv_table.itervalues():
+            if len(pkg_list) < 2:
+                continue
+            if (len(pkg_list) == 2 and pkg_list[0][0] == pkg_list[1][0]
+               and pkg_list[0][2] == pkg_list[1][2]):
+                # This is a (most likely) common and boring case.  It
+                # is when pkgA depends on pkgB and is satisfied with
+                # any version available.  However, at most one version
+                # of pkgB will be available in testing, so other
+                # filters will make this case redundant.
+                continue
+            eqv_set = frozenset(pkg_list)
+            for pkg in pkg_list:
+                eqv_table[pkg] = eqv_set
+
+        return eqv_table
diff --git a/installability/solver.py b/installability/solver.py
index cc5acee..318d5f9 100644
--- a/installability/solver.py
+++ b/installability/solver.py
@@ -24,7 +24,7 @@ from britney_util import (ifilter_only, iter_except)
 class InstallabilitySolver(InstallabilityTester):
 
     def __init__(self, universe, revuniverse, testing, broken, essentials,
-                 safe_set):
+                 safe_set, eqv_table):
         """Create a new installability solver
 
         universe is a dict mapping package tuples to their
@@ -44,7 +44,7 @@ class InstallabilitySolver(InstallabilityTester):
             (simplifies caches and dependency checking)
         """
         InstallabilityTester.__init__(self, universe, revuniverse, testing,
-                                      broken, essentials, safe_set)
+                                      broken, essentials, safe_set, eqv_table)
 
 
     def solve_groups(self, groups):
diff --git a/installability/tester.py b/installability/tester.py
index 2b98b1b..d409c4a 100644
--- a/installability/tester.py
+++ b/installability/tester.py
@@ -20,7 +20,7 @@ from britney_util import iter_except
 class InstallabilityTester(object):
 
     def __init__(self, universe, revuniverse, testing, broken, essentials,
-                 safe_set):
+                 safe_set, eqv_table):
         """Create a new installability tester
 
         universe is a dict mapping package tuples to their
@@ -51,6 +51,7 @@ class InstallabilityTester(object):
         self._essentials = essentials
         self._revuniverse = revuniverse
         self._safe_set = safe_set
+        self._eqv_table = eqv_table
 
         # Cache of packages known to be broken - we deliberately do not
         # include "broken" in it.  See _optimize for more info.
@@ -235,8 +236,9 @@ class InstallabilityTester(object):
             never.update(ess_never)
 
         # curry check_loop
-        check_loop = partial(self._check_loop, universe, testing, musts,
-                             never, choices, cbroken)
+        check_loop = partial(self._check_loop, universe, testing,
+                             self._eqv_table, musts, never, choices,
+                             cbroken)
 
 
         # Useful things to remember:
@@ -359,8 +361,9 @@ class InstallabilityTester(object):
         return verdict
 
 
-    def _check_loop(self, universe, testing, musts, never,
-                    choices, cbroken, check):
+    def _check_loop(self, universe, testing, eqv_table, musts, never,
+                    choices, cbroken, check, len=len,
+                    frozenset=frozenset):
         """Finds all guaranteed dependencies via "check".
 
         If it returns False, t is not installable.  If it returns True
@@ -368,8 +371,6 @@ class InstallabilityTester(object):
         returns True, then t is installable.
         """
         # Local variables for faster access...
-        l = len
-        fset = frozenset
         not_satisfied = partial(ifilter, musts.isdisjoint)
 
         # While we have guaranteed dependencies (in check), examine all
@@ -401,9 +402,9 @@ class InstallabilityTester(object):
                 #  - not in testing
                 #  - known to be broken (by cache)
                 #  - in never
-                candidates = fset((depgroup & testing) - never)
+                candidates = frozenset((depgroup & testing) - never)
 
-                if l(candidates) == 0:
+                if len(candidates) == 0:
                     # We got no candidates to satisfy it - this
                     # package cannot be installed with the current
                     # testing
@@ -413,21 +414,43 @@ class InstallabilityTester(object):
                         cbroken.add(cur)
                         testing.remove(cur)
                     return False
-                if l(candidates) == 1:
+                if len(candidates) == 1:
                     # only one possible solution to this choice and we
                     # haven't seen it before
                     check.update(candidates)
                     musts.update(candidates)
                 else:
+                    possible_eqv = set(x for x in candidates if x in eqv_table)
+                    if len(possible_eqv) > 1:
+                        # Exploit equvialency to reduce the number of
+                        # candidates if possible.  Basically, this
+                        # code maps "similar" candidates into a single
+                        # candidate that will give an identical result
+                        # to any other candidate it eliminates.
+                        #
+                        # See InstallabilityTesterBuilder's
+                        # _build_eqv_packages_table method for more
+                        # information on how this works.
+                        new_cand = set(x for x in candidates if x not in possible_eqv)
+                        for chosen in iter_except(possible_eqv.pop, KeyError):
+                            new_cand.add(chosen)
+                            possible_eqv -= eqv_table[chosen]
+                        if len(new_cand) == 1:
+                            check.update(new_cand)
+                            musts.update(new_cand)
+                            continue
+                        candidates = frozenset(new_cand)
                     # defer this choice till later
                     choices.add(candidates)
         return True
 
+
     def _get_min_pseudo_ess_set(self, arch):
         if arch not in self._cache_ess:
             # The minimal essential set cache is not present -
             # compute it now.
             testing = self._testing
+            eqv_table = self._eqv_table
             cbroken = self._cache_broken
             universe = self._universe
             safe_set = self._safe_set
@@ -439,8 +462,9 @@ class InstallabilityTester(object):
             not_satisified = partial(ifilter, start.isdisjoint)
 
             while ess_base:
-                self._check_loop(universe, testing, start, ess_never,\
-                                     ess_choices, cbroken, ess_base)
+                self._check_loop(universe, testing, eqv_table,
+                                 start, ess_never, ess_choices,
+                                 cbroken, ess_base)
                 if ess_choices:
                     # Try to break choices where possible
                     nchoice = set()
-- 
2.0.1

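For reviewers unfamiliar with the equivalence-table construction in patch 06, the following is a simplified stand-alone model of the idea (the function name `build_eqv_table` and the table layouts are illustrative, not Britney's actual API): packages sharing an identical (depends, conflicts, reverse-depends) signature are grouped into a frozenset, and every member maps to its class.

```python
from collections import defaultdict

def build_eqv_table(package_table, reverse_package_table):
    """Group packages with identical (deps, conflicts, rdeps) signatures.

    Simplified model: package_table maps pkg -> (deps, conflicts) and
    reverse_package_table maps pkg -> rdeps, all as hashable frozensets.
    """
    find_eqv_table = defaultdict(list)
    eqv_table = {}
    for pkg, rdeps in reverse_package_table.items():
        if not rdeps:
            # packages without reverse dependencies are not worth tracking
            continue
        deps, con = package_table[pkg]
        find_eqv_table[(deps, con, rdeps)].append(pkg)
    for pkg_list in find_eqv_table.values():
        if len(pkg_list) < 2:
            continue  # a class of one is useless for pruning choices
        eqv_set = frozenset(pkg_list)
        for pkg in pkg_list:
            eqv_table[pkg] = eqv_set
    return eqv_table

# Example: a1 and a2 share a signature; b has different dependencies.
deps = frozenset([("libc", "1")])
con = frozenset()
pkgs = {"a1": (deps, con), "a2": (deps, con), "b": (frozenset(), con)}
rdeps = {"a1": frozenset(["x"]), "a2": frozenset(["x"]), "b": frozenset(["x"])}
table = build_eqv_table(pkgs, rdeps)
```

With this grouping, any choice set containing both a1 and a2 can be reduced to one of them, which is exactly how patch 06 shrinks the "n" in the O(2^n) cases.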
From 762fca389740ab42d149c41219893097b1e68c57 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Tue, 22 Jul 2014 20:39:48 +0200
Subject: [PATCH 07/11] inst-tester: Exploit eqv. table in
 compute_testing_installability

Use the equivalence table to skip some calls to _check_inst.

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 installability/tester.py | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/installability/tester.py b/installability/tester.py
index d409c4a..e64ee4b 100644
--- a/installability/tester.py
+++ b/installability/tester.py
@@ -81,11 +81,22 @@ class InstallabilityTester(object):
         check_inst = self._check_inst
         cbroken = self._cache_broken
         cache_inst = self._cache_inst
-        tcopy = [x for x in self._testing]
+        eqv_table = self._eqv_table
+        testing = self._testing
+        tcopy = [x for x in testing]
         for t in ifilterfalse(cache_inst.__contains__, tcopy):
             if t in cbroken:
                 continue
-            check_inst(t)
+            res = check_inst(t)
+            if t in eqv_table:
+                eqv = (x for x in eqv_table[t] if x in testing)
+                if res:
+                    cache_inst.update(eqv)
+                else:
+                    eqv_set = frozenset(eqv)
+                    testing -= eqv_set
+                    cbroken |= eqv_set
+
 
     def add_testing_binary(self, pkg_name, pkg_version, pkg_arch):
         """Add a binary package to "testing"
@@ -422,7 +433,7 @@ class InstallabilityTester(object):
                 else:
                     possible_eqv = set(x for x in candidates if x in eqv_table)
                     if len(possible_eqv) > 1:
-                        # Exploit equvialency to reduce the number of
+                        # Exploit equivalency to reduce the number of
                         # candidates if possible.  Basically, this
                         # code maps "similar" candidates into a single
                         # candidate that will give an identical result
-- 
2.0.1

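The caching idea in patch 07 can be sketched in isolation as well. This is a rough model, not Britney's real internals (`check_all` and the callback shape are illustrative): one expensive installability test per equivalence class, with the verdict shared across all members still in testing.

```python
def check_all(testing, eqv_table, check_inst):
    """Test installability, reusing one verdict per equivalence class.

    check_inst(pkg) -> bool stands in for the expensive test; a result
    for one package is propagated to its whole equivalence class.
    Returns (installable, broken) as sets.
    """
    cache_inst = set()
    cbroken = set()
    for t in sorted(testing):
        if t in cache_inst or t in cbroken:
            continue  # verdict already known via an equivalent package
        res = check_inst(t)
        members = {t}
        if t in eqv_table:
            members |= {x for x in eqv_table[t] if x in testing}
        if res:
            cache_inst |= members
        else:
            cbroken |= members
    return cache_inst, cbroken

# Example: a1/a2 are equivalent, so only one of them is actually tested.
calls = []
def fake_check(pkg):
    calls.append(pkg)
    return not pkg.startswith("bad")

eqv = {"a1": frozenset(["a1", "a2"]), "a2": frozenset(["a1", "a2"])}
ok, broken = check_all({"a1", "a2", "bad1"}, eqv, fake_check)
```

Note the asymmetry in the real patch: a positive verdict only updates the cache, while a negative one also removes the class from testing and records it as broken, matching the existing `cbroken` semantics.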
From f9570187e0daffe9888868dcbc50aa77f3352bff Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Thu, 24 Jul 2014 19:54:33 +0200
Subject: [PATCH 08/11] inst-tester: Attempt to avoid needing to backtrack

When trying to break a choice, try the candidate out and see if we can
pick it without any consequences.  Basically, if the candidate causes
no new conflicts or choices, we can safely pick it.

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 installability/tester.py | 60 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 11 deletions(-)

diff --git a/installability/tester.py b/installability/tester.py
index e64ee4b..ba58267 100644
--- a/installability/tester.py
+++ b/installability/tester.py
@@ -207,6 +207,7 @@ class InstallabilityTester(object):
         testing = self._testing
         cbroken = self._cache_broken
         safe_set = self._safe_set
+        eqv_table = self._eqv_table
 
         # Our installability verdict - start with "yes" and change if
         # prove otherwise.
@@ -248,7 +249,7 @@ class InstallabilityTester(object):
 
         # curry check_loop
         check_loop = partial(self._check_loop, universe, testing,
-                             self._eqv_table, musts, never, choices,
+                             eqv_table, musts, never, choices,
                              cbroken)
 
 
@@ -271,7 +272,7 @@ class InstallabilityTester(object):
         #     of t via recursion (calls _check_inst).  In this case
         #     check and choices are not (always) empty.
 
-        def _pick_choice(rebuild):
+        def _pick_choice(rebuild, set=set, len=len):
             """Picks a choice from choices and updates rebuild.
 
             Prunes the choices and updates "rebuild" to reflect the
@@ -330,18 +331,55 @@ class InstallabilityTester(object):
             last = next(choice) # pick one to go last
             for p in choice:
                 musts_copy = musts.copy()
-                never_copy = never.copy()
-                choices_copy = choices.copy()
-                if self._check_inst(p, musts_copy, never_copy, choices_copy):
+                never_tmp = set()
+                choices_tmp = set()
+                check_tmp = set([p])
+                if not self._check_loop(universe, testing, eqv_table,
+                                        musts_copy, never_tmp,
+                                        choices_tmp, cbroken,
+                                        check_tmp):
+                    # p cannot be chosen/is broken (unlikely, but ...)
+                    continue
+
+                # Test if we can pick p without any consequences.
+                # - when we can, we avoid a backtrack point.
+                if never_tmp <= never and choices_tmp <= rebuild:
+                    # we can pick p without picking up new conflicts
+                    # or unresolved choices.  Therefore we commit to
+                    # using p.
+                    #
+                    # NB: Optimally, we would go to the start of this
+                    # routine, but to conserve stack-space, we return
+                    # and expect to be called again later.
+                    musts.update(musts_copy)
+                    return False
+
+                if not musts.isdisjoint(never_tmp):
+                    # If we pick p, we will definitely end up making
+                    # t uninstallable, so p is a no-go.
+                    continue
+
+                # We are not sure that p is safe, setup a backtrack
+                # point and recurse.
+                never_tmp |= never
+                choices_tmp |= rebuild
+                if self._check_inst(p, musts_copy, never_tmp,
+                                    choices_tmp):
+                    # Success, p was a valid choice and made it all
+                    # installable
                     return True
-                # If we get here, we failed to find something that would satisfy choice (without breaking
-                # the installability of t).  This means p cannot be used to satisfy the dependencies, so
-                # pretend to conflict with it - hopefully it will reduce future choices.
+
+                # If we get here, we failed to find something that
+                # would satisfy choice (without breaking the
+                # installability of t).  This means p cannot be used
+                # to satisfy the dependencies, so pretend to conflict
+                # with it - hopefully it will reduce future choices.
                 never.add(p)
 
-            # Optimization for the last case; avoid the recursive call and just
-            # assume the last will lead to a solution.  If it doesn't there is
-            # no solution and if it does, we don't have to back-track anyway.
+            # Optimization for the last case; avoid the recursive call
+            # and just assume the last will lead to a solution.  If it
+            # doesn't there is no solution and if it does, we don't
+            # have to back-track anyway.
             check.add(last)
             musts.add(last)
             return False
-- 
2.0.1

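The core test in patch 08 ("can we pick this candidate without consequences?") boils down to two subset checks. The sketch below is a deliberately simplified model (the names `pick_without_backtrack` and the `effects` mapping are illustrative): a candidate whose extra conflicts and extra unresolved choices are already implied by the current state can be committed outright, so no backtrack point is needed for it.

```python
def pick_without_backtrack(candidates, effects, never, rebuild):
    """Return the first candidate whose consequences are already implied.

    effects maps candidate -> (extra_never, extra_choices): the new
    conflicts and new unresolved choices that picking it would add.
    A candidate adding neither can be chosen without a backtrack point.
    """
    for p in candidates:
        extra_never, extra_choices = effects[p]
        if extra_never <= never and extra_choices <= rebuild:
            return p  # safe: no new conflicts, no new pending choices
    return None  # every candidate has new consequences; must backtrack

# Example: p1 would introduce a new conflict, p2 has no consequences.
never = {("old-mta", "1", "amd64")}
rebuild = set()
effects = {
    "p1": ({("libfoo", "2", "amd64")}, set()),  # adds a new conflict
    "p2": (set(), set()),                       # no new consequences
}
chosen = pick_without_backtrack(["p1", "p2"], effects, never, rebuild)
```

In the real patch the "effects" are computed by running `_check_loop` on temporary sets, and only when no candidate passes this test does the code fall back to setting up a backtrack point and recursing via `_check_inst`.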
From 8e9e26245141e47ae229c886c4c48a805428764a Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Thu, 24 Jul 2014 23:52:50 +0200
Subject: [PATCH 09/11] britney.py: Refactor doop_source

Rename local variables and avoid repeated chained lookups.  In
particular, avoid confusing cases like:

   [...]
         version = binaries[parch][0][binary][VERSION]

   [...]
   binaries[parch][0][binary] = self.binaries[item.suite][parch][0][binary]
   version = binaries[parch][0][binary][VERSION]

Where "version" here will refer to two different versions.  The former
is the version from testing of a hijacked binary and the latter is the
version from the source suite (despite the lookup using the "testing"
table, due to the testing copy being updated in place).

Notable renamings:
 * binaries => packages_t (a.k.a. self.binaries['testing'])
 * binaries[parch][0] => binaries_t_a
 * binaries[parch][1] => provides_t_a
 * Similar naming used for "item.suite" instead of "testing"

The naming is based on the following logic:
 * self.binaries from "packages" files
   (by this logic, it ought to be "self.packages", but that's for
   later)
 * The "_X_a" is short for "[<suite>][<parch>]" look ups.
 * binaries_X_a and provides_X_a are the specialised parts of
   packages_X_a that deal with (real) binary packages and
   provides (i.e. virtual packages) respectively.

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 70 ++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/britney.py b/britney.py
index 182711d..17f955f 100755
--- a/britney.py
+++ b/britney.py
@@ -1948,8 +1948,8 @@ class Britney(object):
 
         # local copies for better performances
         sources = self.sources
-        binaries = self.binaries['testing']
-        get_reverse_tree = partial(compute_reverse_tree, self.binaries["testing"])
+        packages_t = self.binaries['testing']
+        get_reverse_tree = partial(compute_reverse_tree, packages_t)
         # remove all binary packages (if the source already exists)
         if item.architecture == 'source' or not item.is_removal:
             if item.package in sources['testing']:
@@ -1962,23 +1962,26 @@ class Britney(object):
                                                   removals=removals)
 
                 # remove all the binaries which aren't being smooth updated
-                for bin_data in bins:
-                    binary, version, parch = bin_data
+                for rm_tuple in bins:
+                    binary, version, parch = rm_tuple
                     p = binary + "/" + parch
+                    binaries_t_a, provides_t_a = packages_t[parch]
+
+                    pkg_data = binaries_t_a[binary]
                     # save the old binary for undo
-                    undo['binaries'][p] = binaries[parch][0][binary]
+                    undo['binaries'][p] = pkg_data
                     # all the reverse dependencies are affected by the change
                     affected.update(get_reverse_tree(binary, parch))
                     # remove the provided virtual packages
-                    for j in binaries[parch][0][binary][PROVIDES]:
+                    for j in pkg_data[PROVIDES]:
                         key = j + "/" + parch
                         if key not in undo['virtual']:
-                            undo['virtual'][key] = binaries[parch][1][j][:]
-                        binaries[parch][1][j].remove(binary)
-                        if len(binaries[parch][1][j]) == 0:
-                            del binaries[parch][1][j]
+                            undo['virtual'][key] = provides_t_a[j][:]
+                        provides_t_a[j].remove(binary)
+                        if not provides_t_a[j]:
+                            del provides_t_a[j]
                     # finally, remove the binary package
-                    del binaries[parch][0][binary]
+                    del binaries_t_a[binary]
                     self._inst_tester.remove_testing_binary(binary, version, parch)
                 # remove the source package
                 if item.architecture == 'source':
@@ -1990,37 +1993,41 @@ class Britney(object):
 
         # single binary removal; used for clearing up after smooth
         # updates but not supported as a manual hint
-        elif item.package in binaries[item.architecture][0]:
-            undo['binaries'][item.package + "/" + item.architecture] = binaries[item.architecture][0][item.package]
+        elif item.package in packages_t[item.architecture][0]:
+            binaries_t_a = packages_t[item.architecture][0]
+            undo['binaries'][item.package + "/" + item.architecture] = binaries_t_a[item.package]
             affected.update(get_reverse_tree(item.package, item.architecture))
-            version = binaries[item.architecture][0][item.package][VERSION]
-            del binaries[item.architecture][0][item.package]
+            version = binaries_t_a[item.package][VERSION]
+            del binaries_t_a[item.package]
             self._inst_tester.remove_testing_binary(item.package, version, item.architecture)
 
 
         # add the new binary packages (if we are not removing)
         if not item.is_removal:
             source = sources[item.suite][item.package]
+            packages_s = self.binaries[item.suite]
             for p in source[BINARIES]:
                 binary, parch = p.split("/")
                 if item.architecture not in ['source', parch]: continue
                 key = (binary, parch)
+                binaries_t_a, provides_t_a = packages_t[parch]
                 # obviously, added/modified packages are affected
                 if key not in affected: affected.add(key)
                 # if the binary already exists in testing, it is currently
                 # built by another source package. we therefore remove the
                 # version built by the other source package, after marking
                 # all of its reverse dependencies as affected
-                if binary in binaries[parch][0]:
+                if binary in binaries_t_a:
+                    old_pkg_data = binaries_t_a[binary]
                     # save the old binary package
-                    undo['binaries'][p] = binaries[parch][0][binary]
+                    undo['binaries'][p] = old_pkg_data
                     # all the reverse dependencies are affected by the change
                     affected.update(get_reverse_tree(binary, parch))
                     # all the reverse conflicts and their dependency tree are affected by the change
-                    for j in binaries[parch][0][binary][RCONFLICTS]:
+                    for j in old_pkg_data[RCONFLICTS]:
                         affected.update(get_reverse_tree(j, parch))
-                    version = binaries[parch][0][binary][VERSION]
-                    self._inst_tester.remove_testing_binary(binary, version, parch)
+                    old_version = old_pkg_data[VERSION]
+                    self._inst_tester.remove_testing_binary(binary, old_version, parch)
                 else:
                     # the binary isn't in testing, but it may have been at
                     # the start of the current hint and have been removed
@@ -2036,21 +2043,22 @@ class Britney(object):
                     for (tundo, tpkg) in hint_undo:
                         if p in tundo['binaries']:
                             for rdep in tundo['binaries'][p][RDEPENDS]:
-                                if rdep in binaries[parch][0] and rdep not in source[BINARIES]:
+                                if rdep in binaries_t_a and rdep not in source[BINARIES]:
                                     affected.update(get_reverse_tree(rdep, parch))
-                # add/update the binary package
-                binaries[parch][0][binary] = self.binaries[item.suite][parch][0][binary]
-                version = binaries[parch][0][binary][VERSION]
-                self._inst_tester.add_testing_binary(binary, version, parch)
+                # add/update the binary package from the source suite
+                new_pkg_data = packages_s[parch][0][binary]
+                new_version = new_pkg_data[VERSION]
+                binaries_t_a[binary] = new_pkg_data
+                self._inst_tester.add_testing_binary(binary, new_version, parch)
                 # register new provided packages
-                for j in binaries[parch][0][binary][PROVIDES]:
+                for j in new_pkg_data[PROVIDES]:
                     key = j + "/" + parch
-                    if j not in binaries[parch][1]:
+                    if j not in provides_t_a:
                         undo['nvirtual'].append(key)
-                        binaries[parch][1][j] = []
+                        provides_t_a[j] = []
                     elif key not in undo['virtual']:
-                        undo['virtual'][key] = binaries[parch][1][j][:]
-                    binaries[parch][1][j].append(binary)
+                        undo['virtual'][key] = provides_t_a[j][:]
+                    provides_t_a[j].append(binary)
                 # all the reverse dependencies are affected by the change
                 affected.update(get_reverse_tree(binary, parch))
 
@@ -2060,7 +2068,7 @@ class Britney(object):
             else:
                 ext = "/" + item.architecture
                 pkg_iter = (p.split("/")[0] for p in source[BINARIES] if p.endswith(ext))
-            register_reverses(binaries[parch][0], binaries[parch][1], iterator=pkg_iter)
+            register_reverses(binaries_t_a, provides_t_a, iterator=pkg_iter)
 
             # add/update the source package
             if item.architecture == 'source':
-- 
2.0.1

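The double meaning of "version" that patch 09's commit message warns about can be reproduced with a toy dict layout (the tuple/dict shapes and names below are illustrative, not Britney's real data structures): the same chained lookup yields different versions once the testing table has been updated in place.

```python
# Toy model of the aliasing pitfall from the commit message above.
VERSION = 0
binaries = {"amd64": ({"foo": ("1.0",)}, {})}    # "testing" tables
packages_s = {"amd64": ({"foo": ("2.0",)}, {})}  # source-suite tables

# First lookup: the version of the hijacked binary currently in testing.
version_before = binaries["amd64"][0]["foo"][VERSION]

# The testing copy is updated in place from the source suite...
binaries["amd64"][0]["foo"] = packages_s["amd64"][0]["foo"]

# ...so the *same* lookup expression now yields the source-suite version.
version_after = binaries["amd64"][0]["foo"][VERSION]
```

Binding `binaries_t_a`/`packages_s` names up front, as the patch does, makes it obvious at each site which suite's table a lookup hits.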
From 02d1d556f2128f86075c74582dc32bd0601e848c Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Fri, 25 Jul 2014 22:02:17 +0200
Subject: [PATCH 10/11] Exploit equivalency to skip unneeded computation

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py                | 70 +++++++++++++++++++++++++++++++++--------------
 installability/builder.py |  9 +-----
 installability/tester.py  | 11 ++++++++
 3 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/britney.py b/britney.py
index 17f955f..7b0bed8 100755
--- a/britney.py
+++ b/britney.py
@@ -1950,28 +1950,50 @@ class Britney(object):
         sources = self.sources
         packages_t = self.binaries['testing']
         get_reverse_tree = partial(compute_reverse_tree, packages_t)
+        inst_tester = self._inst_tester
+        eqv_set = set()
+
         # remove all binary packages (if the source already exists)
         if item.architecture == 'source' or not item.is_removal:
             if item.package in sources['testing']:
                 source = sources['testing'][item.package]
 
-                _, bins, _ = self._compute_groups(item.package,
-                                                  item.suite,
-                                                  item.architecture,
-                                                  item.is_removal,
-                                                  removals=removals)
+                updates, rms, _ = self._compute_groups(item.package,
+                                                       item.suite,
+                                                       item.architecture,
+                                                       item.is_removal,
+                                                       removals=removals)
+
+                eqv_table = {}
+
+                for binary, version, parch in rms:
+                    key = (binary, parch)
+                    eqv_table[key] = version
+
+                for p1 in updates:
+                    binary, _, parch = p1
+                    key = (binary, parch)
+                    old_version = eqv_table.get(key)
+                    if old_version is not None:
+                        p2 = (binary, old_version, parch)
+                        if inst_tester.are_equivalent(p1, p2):
+                            eqv_set.add(key)
 
                 # remove all the binaries which aren't being smooth updated
-                for rm_tuple in bins:
+                for rm_tuple in rms:
                     binary, version, parch = rm_tuple
                     p = binary + "/" + parch
                     binaries_t_a, provides_t_a = packages_t[parch]
+                    pkey = (binary, parch)
 
                     pkg_data = binaries_t_a[binary]
                     # save the old binary for undo
                     undo['binaries'][p] = pkg_data
-                    # all the reverse dependencies are affected by the change
-                    affected.update(get_reverse_tree(binary, parch))
+                    if pkey not in eqv_set:
+                        # all the reverse dependencies are affected by
+                        # the change
+                        affected.update(get_reverse_tree(binary, parch))
+
                     # remove the provided virtual packages
                     for j in pkg_data[PROVIDES]:
                         key = j + "/" + parch
@@ -1982,7 +2004,7 @@ class Britney(object):
                             del provides_t_a[j]
                     # finally, remove the binary package
                     del binaries_t_a[binary]
-                    self._inst_tester.remove_testing_binary(binary, version, parch)
+                    inst_tester.remove_testing_binary(binary, version, parch)
                 # remove the source package
                 if item.architecture == 'source':
                     undo['sources'][item.package] = source
@@ -1999,7 +2021,7 @@ class Britney(object):
             affected.update(get_reverse_tree(item.package, item.architecture))
             version = binaries_t_a[item.package][VERSION]
             del binaries_t_a[item.package]
-            self._inst_tester.remove_testing_binary(item.package, version, item.architecture)
+            inst_tester.remove_testing_binary(item.package, version, item.architecture)
 
 
         # add the new binary packages (if we are not removing)
@@ -2011,8 +2033,11 @@ class Britney(object):
                 if item.architecture not in ['source', parch]: continue
                 key = (binary, parch)
                 binaries_t_a, provides_t_a = packages_t[parch]
+                equivalent_replacement = key in eqv_set
+
                 # obviously, added/modified packages are affected
-                if key not in affected: affected.add(key)
+                if not equivalent_replacement and key not in affected:
+                    affected.add(key)
                 # if the binary already exists in testing, it is currently
                 # built by another source package. we therefore remove the
                 # version built by the other source package, after marking
@@ -2021,13 +2046,16 @@ class Britney(object):
                     old_pkg_data = binaries_t_a[binary]
                     # save the old binary package
                     undo['binaries'][p] = old_pkg_data
-                    # all the reverse dependencies are affected by the change
-                    affected.update(get_reverse_tree(binary, parch))
-                    # all the reverse conflicts and their dependency tree are affected by the change
-                    for j in old_pkg_data[RCONFLICTS]:
-                        affected.update(get_reverse_tree(j, parch))
+                    if not equivalent_replacement:
+                        # all the reverse dependencies are affected by
+                        # the change
+                        affected.update(get_reverse_tree(binary, parch))
+                        # all the reverse conflicts and their
+                        # dependency tree are affected by the change
+                        for j in old_pkg_data[RCONFLICTS]:
+                            affected.update(get_reverse_tree(j, parch))
                     old_version = old_pkg_data[VERSION]
-                    self._inst_tester.remove_testing_binary(binary, old_version, parch)
+                    inst_tester.remove_testing_binary(binary, old_version, parch)
                 else:
                     # the binary isn't in testing, but it may have been at
                     # the start of the current hint and have been removed
@@ -2045,11 +2073,12 @@ class Britney(object):
                             for rdep in tundo['binaries'][p][RDEPENDS]:
                                 if rdep in binaries_t_a and rdep not in source[BINARIES]:
                                     affected.update(get_reverse_tree(rdep, parch))
+
                 # add/update the binary package from the source suite
                 new_pkg_data = packages_s[parch][0][binary]
                 new_version = new_pkg_data[VERSION]
                 binaries_t_a[binary] = new_pkg_data
-                self._inst_tester.add_testing_binary(binary, new_version, parch)
+                inst_tester.add_testing_binary(binary, new_version, parch)
                 # register new provided packages
                 for j in new_pkg_data[PROVIDES]:
                     key = j + "/" + parch
@@ -2059,8 +2088,9 @@ class Britney(object):
                     elif key not in undo['virtual']:
                         undo['virtual'][key] = provides_t_a[j][:]
                     provides_t_a[j].append(binary)
-                # all the reverse dependencies are affected by the change
-                affected.update(get_reverse_tree(binary, parch))
+                if not equivalent_replacement:
+                    # all the reverse dependencies are affected by the change
+                    affected.update(get_reverse_tree(binary, parch))
 
             # register reverse dependencies and conflicts for the new binary packages
             if item.architecture == 'source':
diff --git a/installability/builder.py b/installability/builder.py
index 75adf63..23ff716 100644
--- a/installability/builder.py
+++ b/installability/builder.py
@@ -391,14 +391,7 @@ class InstallabilityTesterBuilder(object):
         for pkg_list in find_eqv_table.itervalues():
             if len(pkg_list) < 2:
                 continue
-            if (len(pkg_list) == 2 and pkg_list[0][0] == pkg_list[1][0]
-               and pkg_list[0][2] == pkg_list[1][2]):
-                # This is a (most likely) common and boring case.  It
-                # is when pkgA depends on pkgB and is satisfied with
-                # any version available.  However, at most one version
-                # of pkgB will be available in testing, so other
-                # filters will make this case redundant.
-                continue
+
             eqv_set = frozenset(pkg_list)
             for pkg in pkg_list:
                 eqv_table[pkg] = eqv_set
diff --git a/installability/tester.py b/installability/tester.py
index ba58267..4be7dba 100644
--- a/installability/tester.py
+++ b/installability/tester.py
@@ -98,6 +98,17 @@ class InstallabilityTester(object):
                     cbroken |= eqv_set
 
 
+    def are_equivalent(self, p1, p2):
+        """Test if p1 and p2 are equivalent
+
+        Returns True if p1 and p2 have the same "signature" in
+        the package dependency graph (i.e. relations cannot tell
+        them apart semantically except for their name)
+        """
+        eqv_table = self._eqv_table
+        return p1 in eqv_table and p2 in eqv_table[p1]
+
+
     def add_testing_binary(self, pkg_name, pkg_version, pkg_arch):
         """Add a binary package to "testing"
 
-- 
2.0.1
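
[Editorial aside on patch 10: the `are_equivalent` helper added above is just
two dictionary lookups, because the builder maps every member of an
equivalence class to the frozenset holding the whole class. A minimal
standalone sketch with made-up package tuples (names are hypothetical, not
from the live data set):

```python
def are_equivalent(eqv_table, p1, p2):
    # True if p1 and p2 share the same "signature" in the dependency
    # graph, i.e. they belong to the same equivalence class.
    return p1 in eqv_table and p2 in eqv_table[p1]

# Two packages with identical relations end up in one frozenset; the
# builder then points each member at that shared set.
pkg_a = ("libfoo1", "1.0", "amd64")
pkg_b = ("libfoo1-compat", "1.0", "amd64")
eqv_set = frozenset([pkg_a, pkg_b])
eqv_table = {pkg: eqv_set for pkg in eqv_set}
```

Note that a package is trivially equivalent to itself as long as it appears
in the table, and any package absent from the table is equivalent to
nothing.]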

From db4624a6dd3254bce733d82feb470a2a8e1beeff Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Fri, 25 Jul 2014 08:44:13 +0200
Subject: [PATCH 11/11] britney.py: Use defaultdict instead of "{}.setdefault"

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/britney.py b/britney.py
index 7b0bed8..d699c7a 100755
--- a/britney.py
+++ b/britney.py
@@ -189,6 +189,7 @@ import urllib
 
 import apt_pkg
 
+from collections import defaultdict
 from functools import reduce, partial
 from itertools import chain, ifilter, product
 from operator import attrgetter
@@ -678,7 +679,7 @@ class Britney(object):
         The method returns a dictionary where the key is the binary package
         name and the value is the list of open RC bugs for it.
         """
-        bugs = {}
+        bugs = defaultdict(list)
         filename = os.path.join(basedir, "BugsV")
         self.__log("Loading RC bugs data from %s" % filename)
         for line in open(filename):
@@ -687,7 +688,6 @@ class Britney(object):
                 self.__log("Malformed line found in line %s" % (line), type='W')
                 continue
             pkg = l[0]
-            bugs.setdefault(pkg, [])
             bugs[pkg] += l[1].split(",")
         return bugs
 
@@ -2783,10 +2783,10 @@ class Britney(object):
 
     def nuninst_arch_report(self, nuninst, arch):
         """Print a report of uninstallable packages for one architecture."""
-        all = {}
+        all = defaultdict(set)
         for p in nuninst[arch]:
             pkg = self.binaries['testing'][arch][0][p]
-            all.setdefault((pkg[SOURCE], pkg[SOURCEVER]), set()).add(p)
+                all[(pkg[SOURCE], pkg[SOURCEVER])].add(p)
 
         print '* %s' % (arch,)
 
-- 
2.0.1
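
[Editorial aside on patch 11: the setdefault-to-defaultdict change is
behaviour-preserving; `defaultdict(list)` simply creates the empty list on
first access instead of requiring an explicit `setdefault` call before every
append. A minimal sketch of the two styles side by side, with made-up bug
data:

```python
from collections import defaultdict

lines = [("gcc", "123,456"), ("gcc", "789")]

# Old style: explicit setdefault before every append.
bugs_old = {}
for pkg, blist in lines:
    bugs_old.setdefault(pkg, [])
    bugs_old[pkg] += blist.split(",")

# New style: defaultdict(list) supplies the empty list automatically.
bugs_new = defaultdict(list)
for pkg, blist in lines:
    bugs_new[pkg] += blist.split(",")
```

Both produce the same mapping, so the patch is a pure readability/speed
clean-up.]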


Reply to: