
ITM: Britney2 - refactoring and optimisation of the original auto-hinter



Hi,

I have devised a number of patches that I intend to merge into master on
Wednesday the 10th of June (2015).  They include:

 * Refactoring and minor code clean-up.
   - Side-effects include minor output changes!
 * Optimisation of the original auto-hinter and sort_actions.

The executive technical summary/diffstat:

Niels Thykier (12):
  britney_util: Typo fix
  Remove unused local variable
  Remove "leading" from hints - it provides no new information
  Be verbose when rejecting a hint due to invalid candidates
  britney.py: Remove redundant arg to format
  britney.py: Minor optimisation to sort_actions
  britney.py: Pre-split self.options.*_arches
  britney: Optimise original auto-hinter duplication handling
  Avoid O(n^2) duplication handling when building hints
  britney.py: Avoid some O(n) look-ups in the auto-hinter
  iter_pkg: Refactor nuninst cloning
  Create a clone_nuninst function for iter_packages{,_hint}

 britney.py      | 205 ++++++++++++++++++++++++++----------------------
 britney_util.py |  17 ++++-
 2 files changed, 125 insertions(+), 97 deletions(-)


Output changes
==============
There are a few output changes to the britney log:

 * An invalid hint is now rejected with:

   "failed: %s is not a valid candidate (or it already migrated)"

   (was the slightly more terse: "failed: %s")

 * Hints no longer cause Britney to emit a "leading: <list>" line.
   It seemed redundant to emit an entire copy of the hint (admittedly
   without versions on the items).

They are in separate patches and should be trivial to filter out if desired.

The optimisations
=================

I noticed that the original auto-hinter had a heavy "preparation" time
on the Kali data set [KALI-DATASET].  In fact, it spent up to an hour on
my laptop figuring out what to hint.  With this patch series, that is
halved to about 30 minutes.  AFAICT, the major win is from the 0013 patch.

The optimisation of sort_actions() was made primarily on a theoretical
basis and has not been measured carefully.  That said, since it mostly
matters for "extreme" data sets with many removals, the Kali data set is
likely to be the best contender for showing a difference.
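
To make the auto-hinter change concrete (patches 08 and 09 below), here
is a minimal stand-alone sketch of the de-duplication idea.  The function
names and the shape of the input are mine and purely illustrative; they
are not Britney's API:

  # Old style: a linear scan per candidate hint, O(n^2) overall.
  def dedup_hints_list(hints):
      candidates = []
      for hint in hints:               # hint: iterable of (pkg, ver) pairs
          items = sorted(hint)
          if items not in candidates:  # O(n) membership test
              candidates.append(items)
      return candidates

  # New style: each hint becomes a frozenset (hashable, order-insensitive)
  # tracked in a set, so every duplicate check is O(1) on average.
  def dedup_hints_set(hints):
      candidates = []
      seen_hints = set()
      for hint in hints:
          h = frozenset(hint)
          if h not in seen_hints:
              candidates.append(h)
              seen_hints.add(h)
      return candidates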



[KALI-DATASET]:
https://lists.debian.org/debian-release/2014/07/msg00409.html


From 875f3eebeafed2297a9bfa431494a01c7f23a0af Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Wed, 27 May 2015 22:37:06 +0200
Subject: [PATCH 01/12] britney_util: Typo fix

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney_util.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/britney_util.py b/britney_util.py
index fbe39f5..1043ad3 100644
--- a/britney_util.py
+++ b/britney_util.py
@@ -482,7 +482,7 @@ def write_sources(sources_s, filename):
     """Write a sources file from Britney's state for a given suite
 
     Britney discards fields she does not care about, so the resulting
-    file omitts a lot of regular fields.
+    file omits a lot of regular fields.
     """
 
     key_pairs = ((VERSION, 'Version'), (SECTION, 'Section'),
-- 
2.1.4

From 2047eb1e50e72c4a071aa3b129a695aa5bdaf9fe Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Mon, 11 May 2015 22:12:04 +0200
Subject: [PATCH 02/12] Remove unused local variable

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/britney.py b/britney.py
index 5c2171e..58e8eef 100755
--- a/britney.py
+++ b/britney.py
@@ -988,7 +988,6 @@ class Britney(object):
             # look for the package in the virtual packages list and loop on them
             for prov in binaries[1].get(name, []):
                 if prov not in binaries[0]: continue
-                package = binaries[0][prov]
                 # A provides only satisfies:
                 # - an unversioned dependency (per Policy Manual §7.5)
                 # - a dependency without an architecture qualifier
-- 
2.1.4

From d017fe1a8c377ad2e6ab371f9942b429f191aa3f Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Mon, 11 May 2015 21:58:54 +0200
Subject: [PATCH 03/12] Remove "leading" from hints - it provides no new
 information

---
 britney.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/britney.py b/britney.py
index 58e8eef..9598893 100755
--- a/britney.py
+++ b/britney.py
@@ -2417,7 +2417,6 @@ class Britney(object):
         if init:
             if not force:
                 lundo = []
-            self.output_write("leading: %s\n" % (",".join( x.uvname for x in init )))
             for x in init:
                 if x not in upgrade_me:
                     self.output_write("failed: %s\n" % (x.uvname))
-- 
2.1.4

From e7256471754f4d88b8848ac5c1275849729d817b Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Mon, 11 May 2015 21:57:23 +0200
Subject: [PATCH 04/12] Be verbose when rejecting a hint due to invalid
 candidates

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/britney.py b/britney.py
index 9598893..a9027eb 100755
--- a/britney.py
+++ b/britney.py
@@ -2419,7 +2419,7 @@ class Britney(object):
                 lundo = []
             for x in init:
                 if x not in upgrade_me:
-                    self.output_write("failed: %s\n" % (x.uvname))
+                    self.output_write("failed: %s is not a valid candidate (or it already migrated)\n" % (x.uvname))
                     return None
                 selected.append(x)
                 upgrade_me.remove(x)
-- 
2.1.4

From d5cfd5aebd22c955cff4605437ee245af37f5cac Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Wed, 27 May 2015 07:53:56 +0200
Subject: [PATCH 05/12] britney.py: Remove redundant arg to format

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/britney.py b/britney.py
index a9027eb..61178b9 100755
--- a/britney.py
+++ b/britney.py
@@ -2299,7 +2299,7 @@ class Britney(object):
         dependencies = self.dependencies
         check_packages = partial(self._check_packages, binaries)
 
-        self.output_write("recur: [%s] %s %d/%d\n" % ("", ",".join(x.uvname for x in selected), len(packages), len(extra)))
+        self.output_write("recur: [] %s %d/%d\n" % (",".join(x.uvname for x in selected), len(packages), len(extra)))
 
         # loop on the packages (or better, actions)
         while packages:
-- 
2.1.4

From bf3aa08023ec7966aea5d90a2032a6a794a7aea3 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Sat, 9 Aug 2014 21:35:09 +0200
Subject: [PATCH 06/12] britney.py: Minor optimisation to sort_actions

Avoid some cases of O(n^2) behaviour in sort_actions and reduce the
size of n for the remaining O(n^2)-ish behaviour by filtering out
removals early on.

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 48 +++++++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/britney.py b/britney.py
index 61178b9..eb47d46 100755
--- a/britney.py
+++ b/britney.py
@@ -2745,29 +2745,39 @@ class Britney(object):
         so the ones with most reverse dependencies are at the end of the loop.
         If an action depends on another one, it is put after it.
         """
-        upgrade_me = [x.name for x in self.excuses if x.name in [y.uvname for y in self.upgrade_me]]
-        for e in self.excuses:
-            if e.name not in upgrade_me: continue
-            # try removes at the end of the loop
-            elif e.name[0] == '-':
-                upgrade_me.remove(e.name)
-                upgrade_me.append(e.name)
-            # otherwise, put it in a good position checking its dependencies
+        uvnames = frozenset(y.uvname for y in self.upgrade_me)
+        excuses = [e for e in self.excuses if e.name in uvnames]
+        removals = []
+        upgrade_me = []
+
+        for e in excuses:
+            # We order removals and regular migrations differently, so
+            # split them out early.
+            if e.name[0] == '-':
+                removals.append(e.name)
             else:
-                pos = []
-                udeps = [upgrade_me.index(x) for x in e.deps if x in upgrade_me and x != e.name]
-                if len(udeps) > 0:
-                    pos.append(max(udeps))
-                sdeps = [upgrade_me.index(x) for x in e.sane_deps if x in upgrade_me and x != e.name]
-                if len(sdeps) > 0:
-                    pos.append(min(sdeps))
-                if len(pos) == 0: continue
-                upgrade_me.remove(e.name)
-                upgrade_me.insert(max(pos)+1, e.name)
-                self.dependencies[e.name] = e.deps
+                upgrade_me.append(e.name)
+
+        for e in excuses:
+            # put the item (regular migration) in a good position
+            # checking its dependencies
+            pos = []
+            udeps = [upgrade_me.index(x) for x in e.deps if x in upgrade_me and x != e.name]
+            if udeps:
+                pos.append(max(udeps))
+            sdeps = [upgrade_me.index(x) for x in e.sane_deps if x in upgrade_me and x != e.name]
+            if sdeps:
+                pos.append(min(sdeps))
+            if not pos:
+                continue
+            upgrade_me.remove(e.name)
+            upgrade_me.insert(max(pos)+1, e.name)
+            self.dependencies[e.name] = e.deps
 
         # replace the list of actions with the new one
         self.upgrade_me = [ make_migrationitem(x, self.sources) for x in upgrade_me ]
+        self.upgrade_me.extend(make_migrationitem(x, self.sources) for x in removals)
+
 
     def auto_hinter(self):
         """Auto-generate "easy" hints.
-- 
2.1.4

From 0b9006c36b9539a4f038fb7c12891e725ee9a8f8 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Wed, 27 May 2015 22:34:42 +0200
Subject: [PATCH 07/12] britney.py: Pre-split self.options.*_arches

We always use them as lists, so we might as well just split them once
and be done with it.
---
 britney.py | 42 ++++++++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/britney.py b/britney.py
index eb47d46..1a07f50 100755
--- a/britney.py
+++ b/britney.py
@@ -430,16 +430,22 @@ class Britney(object):
         if not hasattr(self.options, "heidi_delta_output"):
             self.options.heidi_delta_output = self.options.heidi_output + "Delta"
 
+        self.options.nobreakall_arches = self.options.nobreakall_arches.split()
+        self.options.fucked_arches = self.options.fucked_arches.split()
+        self.options.break_arches = self.options.break_arches.split()
+        self.options.new_arches = self.options.new_arches.split()
+
         # Sort the architecture list
         allarches = sorted(self.options.architectures.split())
-        arches = [x for x in allarches if x in self.options.nobreakall_arches.split()]
-        arches += [x for x in allarches if x not in arches and x not in self.options.fucked_arches.split()]
-        arches += [x for x in allarches if x not in arches and x not in self.options.break_arches.split()]
-        arches += [x for x in allarches if x not in arches and x not in self.options.new_arches.split()]
+        arches = [x for x in allarches if x in self.options.nobreakall_arches]
+        arches += [x for x in allarches if x not in arches and x not in self.options.fucked_arches]
+        arches += [x for x in allarches if x not in arches and x not in self.options.break_arches]
+        arches += [x for x in allarches if x not in arches and x not in self.options.new_arches]
         arches += [x for x in allarches if x not in arches]
         self.options.architectures = [sys.intern(arch) for arch in arches]
         self.options.smooth_updates = self.options.smooth_updates.split()
 
+
     def __log(self, msg, type="I"):
         """Print info messages according to verbosity level
         
@@ -1045,7 +1051,7 @@ class Britney(object):
 
             # for the solving packages, update the excuse to add the dependencies
             for p in packages:
-                if arch not in self.options.break_arches.split():
+                if arch not in self.options.break_arches:
                     if p in self.sources['testing'] and self.sources['testing'][p][VERSION] == self.sources[suite][p][VERSION]:
                         excuse.add_dep("%s/%s" % (p, arch), arch)
                     else:
@@ -1397,7 +1403,7 @@ class Britney(object):
                     base = 'stable'
                 text = "Not yet built on <a href=\"http://buildd.debian.org/status/logs.php?arch=%s&pkg=%s&ver=%s&suite=%s\"; target=\"_blank\">%s</a> (relative to testing)" % (quote(arch), quote(src), quote(source_u[VERSION]), base, arch)
 
-                if arch in self.options.fucked_arches.split():
+                if arch in self.options.fucked_arches:
                     text = text + " (but %s isn't keeping up, so never mind)" % (arch)
                 else:
                     update_candidate = False
@@ -1436,7 +1442,7 @@ class Britney(object):
 
                 # if the package is architecture-dependent or the current arch is `nobreakall'
                 # find unsatisfied dependencies for the binary package
-                if binary_u[ARCHITECTURE] != 'all' or arch in self.options.nobreakall_arches.split():
+                if binary_u[ARCHITECTURE] != 'all' or arch in self.options.nobreakall_arches:
                     self.excuse_unsat_deps(pkg, src, arch, suite, excuse)
 
             # if there are out-of-date packages, warn about them in the excuse and set update_candidate
@@ -1458,7 +1464,7 @@ class Britney(object):
                         "arch=%s&pkg=%s&ver=%s\" target=\"_blank\">%s</a>: %s" % \
                         (quote(arch), quote(src), quote(source_u[VERSION]), arch, oodtxt)
 
-                if arch in self.options.fucked_arches.split():
+                if arch in self.options.fucked_arches:
                     text = text + " (but %s isn't keeping up, so nevermind)" % (arch)
                 else:
                     update_candidate = False
@@ -1763,7 +1769,7 @@ class Britney(object):
         for arch in self.options.architectures:
             if requested_arch and arch != requested_arch: continue
             # if it is in the nobreakall ones, check arch-independent packages too
-            check_archall = arch in self.options.nobreakall_arches.split()
+            check_archall = arch in self.options.nobreakall_arches
 
             # check all the packages for this architecture
             nuninst[arch] = set()
@@ -1807,7 +1813,7 @@ class Britney(object):
             elif original and arch in original:
                 n = len(original[arch])
             else: continue
-            if arch in self.options.break_arches.split():
+            if arch in self.options.break_arches:
                 totalbreak = totalbreak + n
             else:
                 total = total + n
@@ -2231,7 +2237,7 @@ class Britney(object):
 
         removals = set()
         all_affected = set()
-        nobreakall_arches = self.options.nobreakall_arches.split()
+        nobreakall_arches = self.options.nobreakall_arches
         packages_t = self.binaries['testing']
         check_packages = partial(self._check_packages, packages_t)
         nuninst = {}
@@ -2293,9 +2299,9 @@ class Britney(object):
         binaries = self.binaries['testing']
         sources = self.sources
         architectures = self.options.architectures
-        nobreakall_arches = self.options.nobreakall_arches.split()
-        new_arches = self.options.new_arches.split()
-        break_arches = self.options.break_arches.split()
+        nobreakall_arches = self.options.nobreakall_arches
+        new_arches = self.options.new_arches
+        break_arches = self.options.break_arches
         dependencies = self.dependencies
         check_packages = partial(self._check_packages, binaries)
 
@@ -2454,7 +2460,7 @@ class Britney(object):
                                               newly_uninst(nuninst_start, nuninst_end)) + "\n")
 
         if not force:
-            break_arches = set(self.options.break_arches.split())
+            break_arches = set(self.options.break_arches)
             if all(x.architecture in break_arches for x in selected):
                 # If we only migrated items from break-arches, then we
                 # do not allow any regressions on these architectures.
@@ -2528,16 +2534,16 @@ class Britney(object):
         allpackages = []
         normpackages = self.upgrade_me[:]
         archpackages = {}
-        for a in self.options.break_arches.split():
+        for a in self.options.break_arches:
             archpackages[a] = [p for p in normpackages if p.architecture == a]
             normpackages = [p for p in normpackages if p not in archpackages[a]]
         self.upgrade_me = normpackages
         self.output_write("info: main run\n")
         self.do_all()
         allpackages += self.upgrade_me
-        for a in self.options.break_arches.split():
+        for a in self.options.break_arches:
             backup = self.options.break_arches
-            self.options.break_arches = " ".join(x for x in self.options.break_arches.split() if x != a)
+            self.options.break_arches = " ".join(x for x in self.options.break_arches if x != a)
             self.upgrade_me = archpackages[a]
             self.output_write("info: broken arch run for %s\n" % (a))
             self.do_all()
-- 
2.1.4

From 72eb6af7115808bf72c6c8393cbfc7fd07e42f64 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Sun, 31 May 2015 22:13:03 +0200
Subject: [PATCH 08/12] britney: Optimise original auto-hinter duplication
 handling

Britney is now smart enough to produce the same result from hints
regardless of the order of the items in the hint.  With this in mind,
we can have the original auto-hinter produce hints as sets and filter
out duplicates as we produce them.

Note that the hints are sorted to produce deterministic output (to
make it easier to compare the hints between runs and changes).

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 51 +++++++++++++++++++--------------------------------
 1 file changed, 19 insertions(+), 32 deletions(-)

diff --git a/britney.py b/britney.py
index 1a07f50..9ad4333 100755
--- a/britney.py
+++ b/britney.py
@@ -2849,16 +2849,21 @@ class Britney(object):
         # loop on them
         candidates = []
         mincands = []
+        seen_hints = set()
         for e in excuses:
             excuse = excuses[e]
             if e in sources_t and sources_t[e][VERSION] == excuse.ver[1]:
                 continue
             if excuse.deps:
                 hint = find_related(e, {}, True)
-                if isinstance(hint, dict) and e in hint and hint not in candidates:
-                    candidates.append(hint.items())
+                if isinstance(hint, dict) and e in hint:
+                    h = frozenset(hint.items())
+                    if h not in seen_hints:
+                        candidates.append(h)
+                        seen_hints.add(h)
             else:
                 items = [ (e, excuse.ver[1]) ]
+                orig_size = 1
                 looped = False
                 for item, ver in items:
                     # excuses which depend on "item" or are depended on by it
@@ -2866,39 +2871,21 @@ class Britney(object):
                        (item in excuses[x].deps or x in excuses[item].deps) \
                        and (x, excuses[x].ver[1]) not in items )
                     if not looped and len(items) > 1:
-                        mincands.append(items[:])
+                        orig_size = len(items)
+                        h = frozenset(items)
+                        if h not in seen_hints:
+                            mincands.append(h)
+                            seen_hints.add(h)
                     looped = True
-                if (len(items) > 1 and len(items) != len(mincands[-1]) and
-                    frozenset(items) != frozenset(mincands[-1])):
-                    candidates.append(items)
+                if len(items) != orig_size:
+                    h = frozenset(items)
+                    if h != mincands[-1] and h not in seen_hints:
+                        candidates.append(h)
+                        seen_hints.add(h)
 
         for l in [ candidates, mincands ]:
-            to_skip = set()
-            for i in range(len(l)):
-                if i in to_skip:
-                    continue
-                l_i = None
-                for j in range(i+1, len(l)):
-                    if j in to_skip:
-                        # we already know this list isn't interesting
-                        continue
-                    if l_i is None:
-                        l_i = frozenset(l[i])
-                    l_j = frozenset(l[j])
-                    if l_i >= l_j:
-                        # j is a subset of i; ignore it
-                        to_skip.add(j)
-                    elif l_i < l_j:
-                        # i is a subset of j; ignore it and the rest of the
-                        # "i" series.
-                        # NB: We use < and not <= because the "==" case is
-                        # already covered above
-                        to_skip.add(i)
-                        break
-            for i in range(len(l)):
-                if i not in to_skip:
-                    self.do_hint("easy", "autohinter", [ MigrationItem("%s/%s" % (x[0], x[1])) for x in l[i] ])
-
+            for hint in l:
+                self.do_hint("easy", "autohinter", [ MigrationItem("%s/%s" % (x[0], x[1])) for x in sorted(hint) ])
 
     def nuninst_arch_report(self, nuninst, arch):
         """Print a report of uninstallable packages for one architecture."""
-- 
2.1.4

From 843952d627b9a1a12ef08c923bc6b4612f8134fe Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Sun, 31 May 2015 22:26:16 +0200
Subject: [PATCH 09/12] Avoid O(n^2) duplication handling when building hints

Use a set to filter out seen items to avoid doing O(n^2)
de-duplication.  For very large hints, this can take considerable
time.

Using "seen_items" to build the actual hints on the (unverified)
assumption that Python can do something "smart" to turn a set into a
frozenset faster than it can with a list.

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/britney.py b/britney.py
index 9ad4333..d36a19b 100755
--- a/britney.py
+++ b/britney.py
@@ -2862,23 +2862,29 @@ class Britney(object):
                         candidates.append(h)
                         seen_hints.add(h)
             else:
-                items = [ (e, excuse.ver[1]) ]
+                items = [(e, excuse.ver[1])]
                 orig_size = 1
                 looped = False
+                seen_items = set()
+                seen_items.update(items)
+
                 for item, ver in items:
                     # excuses which depend on "item" or are depended on by it
-                    items.extend( (x, excuses[x].ver[1]) for x in excuses if \
+                    new_items = [(x, excuses[x].ver[1]) for x in excuses if \
                        (item in excuses[x].deps or x in excuses[item].deps) \
-                       and (x, excuses[x].ver[1]) not in items )
+                       and (x, excuses[x].ver[1]) not in seen_items]
+                    items.extend(new_items)
+                    seen_items.update(new_items)
+
                     if not looped and len(items) > 1:
                         orig_size = len(items)
-                        h = frozenset(items)
+                        h = frozenset(seen_items)
                         if h not in seen_hints:
                             mincands.append(h)
                             seen_hints.add(h)
                     looped = True
                 if len(items) != orig_size:
-                    h = frozenset(items)
+                    h = frozenset(seen_items)
                     if h != mincands[-1] and h not in seen_hints:
                         candidates.append(h)
                         seen_hints.add(h)
-- 
2.1.4
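
As a footnote to patch 09: the "(unverified) assumption" about building a
frozenset from a set vs. from a list is easy to check with timeit.  The
snippet below is purely illustrative (the data shape is made up) and is
not part of the series:

  import timeit

  setup = ("data = [('pkg%d' % i, '1.0-%d' % i) for i in range(1000)]; "
           "s = set(data)")

  from_list = timeit.timeit("frozenset(data)", setup=setup, number=10000)
  from_set = timeit.timeit("frozenset(s)", setup=setup, number=10000)

  print("frozenset(list): %.3fs  frozenset(set): %.3fs" % (from_list, from_set))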

From 0043b09c45240d16514a6176d8a77c38088787fb Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Tue, 2 Jun 2015 19:22:37 +0200
Subject: [PATCH 10/12] britney.py: Avoid some O(n) look-ups in the auto-hinter

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/britney.py b/britney.py
index d36a19b..d4625b8 100755
--- a/britney.py
+++ b/britney.py
@@ -2805,7 +2805,9 @@ class Britney(object):
                    type="I")
 
         # consider only excuses which are valid candidates
-        excuses = dict((x.name, x) for x in self.excuses if x.name in [y.uvname for y in self.upgrade_me])
+        valid_excuses = frozenset(y.uvname for y in self.upgrade_me)
+        excuses = dict((x.name, x) for x in self.excuses if x.name in valid_excuses)
+        excuses_deps = dict((name, set(excuse.deps)) for name, excuse in excuses.items())
         sources_t = self.sources['testing']
 
         groups = set()
@@ -2871,7 +2873,7 @@ class Britney(object):
                 for item, ver in items:
                     # excuses which depend on "item" or are depended on by it
                     new_items = [(x, excuses[x].ver[1]) for x in excuses if \
-                       (item in excuses[x].deps or x in excuses[item].deps) \
+                       (item in excuses_deps[x] or x in excuses_deps[item]) \
                        and (x, excuses[x].ver[1]) not in seen_items]
                     items.extend(new_items)
                     seen_items.update(new_items)
-- 
2.1.4

From da83e4cc23682b7d64fdc11b258a4f7b16a920f7 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Wed, 6 Aug 2014 23:27:11 +0200
Subject: [PATCH 11/12] iter_pkg: Refactor nuninst cloning

Move nuninst cloning out of the check loop and always populate the new
nuninst entirely.

This will allow some simplifications in other places.

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/britney.py b/britney.py
index d4625b8..5405f46 100755
--- a/britney.py
+++ b/britney.py
@@ -2291,9 +2291,9 @@ class Britney(object):
         position = len(packages)
 
         if nuninst:
-            nuninst_comp = nuninst.copy()
+            nuninst_comp = nuninst
         else:
-            nuninst_comp = self.nuninst_orig.copy()
+            nuninst_comp = self.nuninst_orig
 
         # local copies for better performance
         binaries = self.binaries['testing']
@@ -2332,18 +2332,36 @@ class Britney(object):
             self.output_write("trying: %s\n" % (item.uvname))
 
             better = True
-            nuninst = {}
 
             # apply the changes
             affected, undo = self.doop_source(item, lundo)
 
-            # check the affected packages on all the architectures
-            for arch in (item.architecture == 'source' and architectures or (item.architecture,)):
-                check_archall = arch in nobreakall_arches
+            # Copy nuninst_comp - we have to deep clone affected
+            # architectures.
 
+            # NB: We do this *after* updating testing and we have to filter out
+            # removed binaries.  Otherwise, uninstallable binaries that were
+            # removed by the item would still be counted.
+            if item.architecture == 'source':
+                # Assume that all architectures are affected and deep
+                # copy nuninst_comp
+                nuninst = {}
+                for arch in architectures:
+                    nuninst[arch] = set(x for x in nuninst_comp[arch] if x in binaries[arch][0])
+                    nuninst[arch + "+all"] = set(x for x in nuninst_comp[arch + "+all"] if x in binaries[arch][0])
+            else:
+                # Shallow clone nuninst_comp except for the affected
+                # architecture, which is deep cloned.
+                arch = item.architecture
+                nuninst = nuninst_comp.copy()
                 nuninst[arch] = set(x for x in nuninst_comp[arch] if x in binaries[arch][0])
                 nuninst[arch + "+all"] = set(x for x in nuninst_comp[arch + "+all"] if x in binaries[arch][0])
 
+
+            # check the affected packages on all the architectures
+            for arch in (item.architecture == 'source' and architectures or (item.architecture,)):
+                check_archall = arch in nobreakall_arches
+
                 check_packages(arch, affected, check_archall, nuninst)
 
                 # if the uninstallability counter is worse than before, break the loop
@@ -2368,8 +2386,7 @@ class Britney(object):
                     self.output_write("   all: %s\n" % (" ".join( x.uvname for x in selected )))
                 else:
                     self.output_write("  most: (%d) .. %s\n" % (len(selected), " ".join(x.uvname for x in selected[-20:])))
-                for k in nuninst:
-                    nuninst_comp[k] = nuninst[k]
+                nuninst_comp = nuninst
             else:
                 self.output_write("skipped: %s (%d <- %d)\n" % (item.uvname, len(extra), len(packages)))
                 self.output_write("    got: %s\n" % (self.eval_nuninst(nuninst, item.architecture != 'source' and nuninst_comp or None)))
-- 
2.1.4

From 5f98e6b37b56843aa745265aaaf67698c493bba7 Mon Sep 17 00:00:00 2001
From: Niels Thykier <niels@thykier.net>
Date: Wed, 27 May 2015 17:08:23 +0200
Subject: [PATCH 12/12] Create a clone_nuninst function for
 iter_packages{,_hint}

Signed-off-by: Niels Thykier <niels@thykier.net>
---
 britney.py      | 33 ++++++++++-----------------------
 britney_util.py | 15 +++++++++++++++
 2 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/britney.py b/britney.py
index 5405f46..3b2651d 100755
--- a/britney.py
+++ b/britney.py
@@ -204,7 +204,8 @@ from britney_util import (old_libraries_format, same_source, undo_changes,
                           read_nuninst, write_nuninst, write_heidi,
                           eval_uninst, newly_uninst, make_migrationitem,
                           write_excuses, write_heidi_delta, write_controlfiles,
-                          old_libraries, is_nuninst_asgood_generous)
+                          old_libraries, is_nuninst_asgood_generous,
+                          clone_nuninst)
 from consts import (VERSION, SECTION, BINARIES, MAINTAINER, FAKESRC,
                    SOURCE, SOURCEVER, ARCHITECTURE, DEPENDS, CONFLICTS,
                    PROVIDES, RDEPENDS, RCONFLICTS, MULTIARCH, ESSENTIAL)
@@ -2240,7 +2241,6 @@ class Britney(object):
         nobreakall_arches = self.options.nobreakall_arches
         packages_t = self.binaries['testing']
         check_packages = partial(self._check_packages, packages_t)
-        nuninst = {}
 
 
         for item in hinted_packages:
@@ -2258,15 +2258,11 @@ class Britney(object):
                 lundo.append((undo,item))
 
         # deep copy nuninst (in case the hint is undone)
-        # NB: We do this *after* updating testing and we have to filter out
+        # NB: We do this *after* updating testing as we have to filter out
         # removed binaries.  Otherwise, uninstallable binaries that were
         # removed by the hint would still be counted.
-        for arch in self.options.architectures:
-            nuninst_arch = self.nuninst_orig[arch]
-            nuninst_arch_all = self.nuninst_orig[arch + '+all']
-            binaries_t_a = packages_t[arch][0]
-            nuninst[arch] = set(x for x in nuninst_arch if x in binaries_t_a)
-            nuninst[arch + '+all'] = set(x for x in nuninst_arch_all if x in binaries_t_a)
+        nuninst = clone_nuninst(self.nuninst_orig, packages_t,
+                                self.options.architectures)
 
         for arch in self.options.architectures:
             check_archall = arch in nobreakall_arches
@@ -2339,27 +2335,18 @@ class Britney(object):
             # Copy nuninst_comp - we have to deep clone affected
             # architectures.
 
-            # NB: We do this *after* updating testing and we have to filter out
+            # NB: We do this *after* updating testing as we have to filter out
             # removed binaries.  Otherwise, uninstallable binaries that were
             # removed by the item would still be counted.
             if item.architecture == 'source':
-                # Assume that all architectures are affected and deep
-                # copy nuninst_comp
-                nuninst = {}
-                for arch in architectures:
-                    nuninst[arch] = set(x for x in nuninst_comp[arch] if x in binaries[arch][0])
-                    nuninst[arch + "+all"] = set(x for x in nuninst_comp[arch + "+all"] if x in binaries[arch][0])
+                affected_architectures = architectures
             else:
-                # Shallow clone nuninst_comp except for the affected
-                # architecture, which is deep cloned.
-                arch = item.architecture
-                nuninst = nuninst_comp.copy()
-                nuninst[arch] = set(x for x in nuninst_comp[arch] if x in binaries[arch][0])
-                nuninst[arch + "+all"] = set(x for x in nuninst_comp[arch + "+all"] if x in binaries[arch][0])
+                affected_architectures = [item.architecture]
+            nuninst = clone_nuninst(nuninst_comp, binaries, affected_architectures)
 
 
             # check the affected packages on all the architectures
-            for arch in (item.architecture == 'source' and architectures or (item.architecture,)):
+            for arch in affected_architectures:
                 check_archall = arch in nobreakall_arches
 
                 check_packages(arch, affected, check_archall, nuninst)
diff --git a/britney_util.py b/britney_util.py
index 1043ad3..346bac3 100644
--- a/britney_util.py
+++ b/britney_util.py
@@ -595,3 +595,18 @@ def is_nuninst_asgood_generous(architectures, old, new, break_arches=frozenset()
             continue
         diff = diff + (len(new[arch]) - len(old[arch]))
     return diff <= 0
+
+
+def clone_nuninst(nuninst, packages_s, architectures):
+    """Selectively deep clone nuninst
+
+    Given nuninst table, the package table for a given suite and
+    a list of architectures, this function will clone the nuninst
+    table.  Only the listed architectures will be deep cloned -
+    the rest will only be shallow cloned.
+    """
+    clone = nuninst.copy()
+    for arch in architectures:
+        clone[arch] = set(x for x in nuninst[arch] if x in packages_s[arch][0])
+        clone[arch + "+all"] = set(x for x in nuninst[arch + "+all"] if x in packages_s[arch][0])
+    return clone
-- 
2.1.4


