
Bug#917478: popularity-contest: Improve performance (4x faster)



On Sun, Jan 27, 2019 at 01:19:26PM +0100, Bill Allombert wrote:
> On Thu, Dec 27, 2018 at 11:19:45PM +0100, Benoît wrote:
> > Package: popularity-contest
> > Version: 1.67
> > Severity: minor
> > Tags: patch
> > 
> > Dear Maintainer,
> > 
> > For each installed package, popcon globs the complete list of files
> > in /var/lib/dpkg/info.
> > This is very slow: popcon takes more than a minute of CPU time on my
> > modest laptop, which is enough to start the fan.
> > 
> > I'm attaching a patch that lists /var/lib/dpkg/info only once and
> > associates each .list file with its package.
> > 
> > I don't see any difference in /usr/sbin/popularity-contest output.
> > And the CPU time goes from 1min08s to 0min14s.
> 
> Hello Benoît,
> 
> Thanks for your patch!
> 
> I tried it and it outputs warnings for multiarch packages which have both
> an amd64.list and an i386.list, like
> 
> /var/lib/dpkg/info/gcc-4.8-base:amd64.list
> /var/lib/dpkg/info/gcc-4.8-base:i386.list 
> 
> I get:
> Use of uninitialized value $_ in open at ./popularity-contest line 146,
> <PACKAGES> line 285.
> 
> Do you understand what happens?
> 

Not really.
But I can see that multiarch packages are processed several times.
The part I don't understand is that processing a package once deletes its
file list, hence the undef on the next pass.

Adding a simple check solves this.
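
One plausible explanation for the undef, offered as a sketch rather than a
confirmed diagnosis: foreach aliases $_ to each element of the array it
iterates over, and the inner while (<FILES>) assigns every line to that same
$_, leaving it undef at end of file. With the old glob() call the list was a
temporary, so nothing lasting was overwritten; with a stored array the paths
themselves would be clobbered after the first pass. The file, the package
name "example" and the helper count_lines below are hypothetical and only
reproduce the loop shape.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for one dpkg .list file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "/usr/bin/example\n";
close $tmp;

# Hypothetical stand-in for %pkgs_files from the patch.
my %pkgs_files = (example => [$path]);

# Same loop shape as in popularity-contest: implicit $_ in both loops.
sub count_lines {
    my ($pkg) = @_;
    my $lines = 0;
    foreach (@{ $pkgs_files{$pkg} }) {   # $_ is an alias to the array element
        open FILES, $_ or next;
        while (<FILES>) { $lines++ }     # assigns to the same $_, undef at EOF
        close FILES;
    }
    return $lines;
}

my $first = count_lines('example');
my $still_defined = defined $pkgs_files{example}[0] ? 'yes' : 'no';
print "lines read: $first, path still defined: $still_defined\n";
```

If this is indeed the cause, naming the loop variable (foreach my $file ...)
would be another way to avoid it; skipping packages that were already
processed sidesteps the problem as well.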

> Cheers,
> -- 
> Bill. <ballombe@debian.org>
> 
> Imagine a large red swirl here. 

-- 
Benoît Dejean
--- /usr/sbin/popularity-contest	2018-08-09 20:41:19.000000000 +0200
+++ ./popularity-contest	2019-02-10 10:25:14.353413546 +0100
@@ -119,6 +119,19 @@
   close DIVERSIONS;
 }
 
+my %pkgs_files = ();
+
+if (opendir(my $DPKG_DB, $dpkg_db))
+{
+    for my $e (readdir($DPKG_DB)) {
+	if ($e =~ m/^([^:]+) .*? \. list$/x) {
+	    $pkgs_files{$1} ||= [];
+	    push @{$pkgs_files{$1}}, "$dpkg_db/$e";
+	}
+    }
+    closedir($DPKG_DB);
+}
+
 # Read dpkg database of installed packages
 open PACKAGES, "dpkg-query --show --showformat='\${status} \${package}\\n'|";
 while (<PACKAGES>)
@@ -127,8 +140,10 @@
   my $pkg=$1;
   my $bestatime = undef;
   my $list;
+  # dpkg-query lists a multiarch package once per architecture; process it only once
+  next if $popcon{$pkg};
   $popcon{$pkg}=[0,0,$pkg,"<NOFILES>"];
-  foreach ("$dpkg_db/$pkg.list", glob("$dpkg_db/$pkg:*.list"))
+  foreach (@{$pkgs_files{$pkg}})
   {
     open FILES, $_ or next;
     while (<FILES>)
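
To illustrate how the pattern in the first hunk groups .list files by
package, here is a small standalone sketch. The entry names are a
hypothetical sample, not a real directory listing; only the regular
expression is taken from the patch.

```perl
use strict;
use warnings;

# Hypothetical sample of names found in /var/lib/dpkg/info.
my @entries = (
    'bash.list',
    'gcc-4.8-base:amd64.list',
    'gcc-4.8-base:i386.list',
    'bash.md5sums',
);

my %pkgs_files;
for my $e (@entries) {
    # Same pattern as in the patch: capture everything before the first ':'
    # (or before the final ".list" when there is no architecture suffix).
    if ($e =~ m/^([^:]+) .*? \. list$/x) {
        push @{ $pkgs_files{$1} }, $e;
    }
}

for my $pkg (sort keys %pkgs_files) {
    print "${pkg}: @{ $pkgs_files{$pkg} }\n";
}
```

With this sample, both gcc-4.8-base files end up under the single key
gcc-4.8-base, bash.list goes under bash, and bash.md5sums is ignored.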
