Bug#699083: lintian: optimize t/tests suite
On 2013-01-27 11:17, Niels Thykier wrote:
> Package: lintian
> Version: 2.5.11
> Severity: normal
>
> Hi,
>
> [...]
>
I have come up with a prototype patch that reduces some overhead in the
unpacker. It changes the call-interface between Lintian and the
collections, so I figured I would send it to a review before pushing it.
AFAICT, collections were once sh scripts (as were checks)[1]. Over time
all of our checks and collections have been rewritten into pure Perl and
now takes advantage of our Perl modules. So currently, we do not need
the "external process" interface like we used to. By abusing the fact
that collections are written in Perl, we can greatly reduce the
overhead of spawning the collections.
I have opt'ed for keeping the old "fork+exec" interface; I'd like to
keep this option around for now as it is currently the only place we can
use "foreign" languages (which may come in handy in the future). Also,
it allowed me to be lazy and only convert a subset of the collections... :)
To get a feeling of the improvement works on real data we get...
With the patch:
fields-wrong-section 13.259s 20.733s 3.312s
Without the patch:
> fields-wrong-section 14.815s 32.818s 4.628s
Running Lintian on all our t/tests artifacts[3], this patch reduces the
user time with about 6 minutes. That is a drop from ~21 to ~15 minutes
in time's output. Real time seems to have dropped to 5m 30s from 6m
{0..30}s.
Comments appreciated,
~Niels
[1] "Root commit" (020888b), examples include checks/po-debconf and
coll/objdump-info. There is also a mention in the changelog about
chdir'ing before invoking the collector (or check?) to better support
AWK scripts.
[2] is stil:
> [2] Computed by:
>
> $ time debian/rules runtests onlyrun=<test>
[3]
Setup is:
$ debian/rules onlyrun=suite:tests
Test is:
for I in $(seq 1 10) ; do \
time LINTIAN_ROOT=. frontend/lintian --no-user-dirs --no-cfg -IE \
debian/test-out/*/*.changes > /dev/null; \
done 2>&1 | tee <name-of-choice>.log
Default parallelization for the test was 9 (i.e. -j9). For some reason,
the very first iteration tends to get vastly better runtimes than the rest.
Title: Check description file
>From 8891f7f1d4680be3900a50d80c1ef4365a8fb9ac Mon Sep 17 00:00:00 2001
From: Niels Thykier
Date: Wed, 30 Jan 2013 13:14:09 +0100
Subject: [PATCH] Add new collection interface to reduce overhead
The major overhead in running collections are loading the required
Perl Modules (again). As an example, loading Lintian::Util has a cost
of 0.040 seconds (the first time). With a couple of extra modules,
coll/hardening-info has a average "no-op" runtime of at least 0.150s.
Eliminating the module load overhead, it drops to 0.065s.
This patch set creates a new (backwards compatible) interface for
invoking collections. Instead of doing a complete fork+exec, the new
interface loads the collection script like a perl module and load a
"collect" sub (similar to how scripts work). This interface can still
be combined with fork for parallelization.
Because it avoids the exec, the child process retrains the cache of
(most of) the Perl Modules. This vastly reduces the "start up" time
for the collections. With the selected collections, about 0.5s (user)
is regained when processing the Lintian source + binary together.
For the Lintian part of the t/tests the effect accumulates to about 6
minutes of user time (21m -> 15m).
Signed-off-by: Niels Thykier
---
collection/ar-info | 10 +++++--
collection/ar-info.desc | 1 +
collection/file-info | 14 +++++++---
collection/file-info.desc | 1 +
collection/hardening-info | 11 +++++---
collection/hardening-info.desc | 1 +
collection/java-info | 26 +++++++++++-------
collection/java-info.desc | 1 +
collection/md5sums | 11 +++++---
collection/md5sums.desc | 1 +
collection/objdump-info | 12 ++++++---
collection/objdump-info.desc | 1 +
collection/strings | 15 ++++++++---
collection/strings.desc | 1 +
doc/lintian.xml | 6 ++---
lib/Lintian/CollScript.pm | 57 ++++++++++++++++++++++++++++++++++++++--
lib/Lintian/Unpacker.pm | 23 ++++++++++++++--
17 files changed, 157 insertions(+), 35 deletions(-)
diff --git a/collection/ar-info b/collection/ar-info
index 4e2c103..e891e13 100755
--- a/collection/ar-info
+++ b/collection/ar-info
@@ -19,6 +19,8 @@
# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
# MA 02110-1301, USA.
+package Lintian::coll::ar_info;
+
use strict;
use warnings;
@@ -28,8 +30,8 @@ use Lintian::Collect;
use Lintian::Command qw(spawn);
use Lintian::Util qw(fail);
-($#ARGV == 2) or fail 'syntax: ar-info ';
-my ($pkg, $type, $dir) = @ARGV;
+sub collect {
+ my ($pkg, $type, $dir) = @_;
my $info = Lintian::Collect->new ($pkg, $type, $dir);
if ( -e "$dir/ar-info" ) {
@@ -59,6 +61,10 @@ foreach my $file ($info->sorted_index) {
}
close(OUT) or fail("cannot write ar-info: $!");
+}
+
+collect (@ARGV) if $0 =~ m,(?:^|/)ar-info$,;
+1;
# Local Variables:
# indent-tabs-mode: nil
diff --git a/collection/ar-info.desc b/collection/ar-info.desc
index 5772f6a..cd9e2e6 100644
--- a/collection/ar-info.desc
+++ b/collection/ar-info.desc
@@ -4,3 +4,4 @@ Info: This script runs the "ar t" command over all .a files of package.
Type: binary
Needs-Info: unpacked, index
Version: 1
+Interface: perl-coll
diff --git a/collection/file-info b/collection/file-info
index 075c470..1135fd3 100755
--- a/collection/file-info
+++ b/collection/file-info
@@ -19,6 +19,8 @@
# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
# MA 02110-1301, USA.
+package Lintian::coll::file_info;
+
use strict;
use warnings;
@@ -29,12 +31,12 @@ use Lintian::Collect;
use Lintian::Command qw(spawn reap);
use Lintian::Util qw(fail);
-($#ARGV == 2) or fail 'syntax: file-info ';
-my ($pkg, $type, $dir) = @ARGV;
+sub collect {
+my ($pkg, $type, $dir) = @_;
my $info = Lintian::Collect->new ($pkg, $type, $dir);
my $last = '';
-my $helper = realpath("$0-helper");
+my $helper = realpath (__FILE__ . '-helper');
my $outfile = "$dir/file-info.gz";
if ( -e "$dir/file-info" ) {
@@ -66,6 +68,12 @@ foreach my $file ($info->sorted_index) {
close $opts{pipe_in};
reap(\%opts);
+}
+
+collect (@ARGV) if $0 =~ m,(?:^|/)file-info$,;
+
+1;
+
# Local Variables:
# indent-tabs-mode: nil
# cperl-indent-level: 4
diff --git a/collection/file-info.desc b/collection/file-info.desc
index 10a19c7..712ac47 100644
--- a/collection/file-info.desc
+++ b/collection/file-info.desc
@@ -4,3 +4,4 @@ Info: This script runs the file(1) command over all files of any kind of package
Type: binary, udeb, source
Version: 2
Needs-Info: unpacked, index
+Interface: perl-coll
diff --git a/collection/hardening-info b/collection/hardening-info
index a3343bd..20817e2 100755
--- a/collection/hardening-info
+++ b/collection/hardening-info
@@ -26,6 +26,8 @@
# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
# MA 02110-1301, USA.
+package Lintian::coll::hardening_info;
+
use strict;
use warnings;
@@ -37,10 +39,11 @@ use Lintian::Collect;
use Lintian::Command qw(spawn reap);
use Lintian::Util qw(fail);
-my ($pkg, $type, $dir) = @ARGV;
+sub collect {
+my ($pkg, $type, $dir) = @_;
my $info = Lintian::Collect->new ($pkg, $type, $dir);
-my $helper = realpath("$0-helper");
+my $helper = realpath (__FILE__ . '-helper');
if ( -e "$dir/hardening-info" ) {
unlink "$dir/hardening-info" or fail "unlink hardening-info: $!";
@@ -73,8 +76,10 @@ foreach my $bin ($info->sorted_index) {
close $opts{pipe_in};
reap (\%opts);
+}
-exit 0;
+collect (@ARGV) if $0 =~ m,(?:^|/)hardening-info$,;
+1;
# Local Variables:
# indent-tabs-mode: nil
diff --git a/collection/hardening-info.desc b/collection/hardening-info.desc
index a4a7a7a..fbb4e23 100644
--- a/collection/hardening-info.desc
+++ b/collection/hardening-info.desc
@@ -5,3 +5,4 @@ Info: This script runs hardening-check(1) over all ELF binaries of a binary
Type: binary, udeb
Version: 4
Needs-Info: bin-pkg-control, file-info, index, unpacked
+Interface: perl-coll
diff --git a/collection/java-info b/collection/java-info
index bb342ef..f25eac4 100755
--- a/collection/java-info
+++ b/collection/java-info
@@ -19,6 +19,8 @@
# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
# MA 02110-1301, USA.
+package Lintian::coll::java_info;
+
use strict;
use warnings;
@@ -30,7 +32,8 @@ use Lintian::Collect;
use Lintian::Command qw(spawn reap);
use Lintian::Util qw(fail);
-my ($pkg, $type, $dir) = @ARGV;
+sub collect {
+my ($pkg, $type, $dir) = @_;
my $info = Lintian::Collect->new ($pkg, $type, $dir);
# Sanity check
@@ -48,6 +51,14 @@ if ( -f "$dir/java-info.gz" ) {
# We lazily start the gzip process to avoid creating the java-info.gz
# file when there are no jar files in the package.
my %opts;
+my $open_java_info = sub {
+ %opts = ( pipe_in => FileHandle->new,
+ out => "$dir/java-info.gz",
+ fail => 'error' );
+ spawn(\%opts, ['gzip', '-9c'] );
+ $opts{pipe_in}->blocking(1);
+};
+
chdir ("$dir/unpacked")
or fail "unable to chdir to unpacked: $!";
@@ -61,7 +72,7 @@ foreach my $file ($info->sorted_index) {
my $manifest;
my $azip = Archive::Zip->new;
- open_java_info() unless %opts;
+ $open_java_info->() unless %opts;
# This script needs unzip, there's no way around.
print {$opts{pipe_in}} "-- $file\n";
@@ -117,15 +128,10 @@ if (%opts) {
reap (\%opts);
}
-exit 0;
+};
-sub open_java_info {
- %opts = ( pipe_in => FileHandle->new,
- out => "$dir/java-info.gz",
- fail => 'error' );
- spawn(\%opts, ['gzip', '-9c'] );
- $opts{pipe_in}->blocking(1);
-}
+collect (@ARGV) if $0 =~ m,(?:^|/)java-info$,;
+1;
# Local Variables:
# indent-tabs-mode: nil
diff --git a/collection/java-info.desc b/collection/java-info.desc
index 470b827..c319604 100644
--- a/collection/java-info.desc
+++ b/collection/java-info.desc
@@ -4,3 +4,4 @@ Info: This script extracts information from manifests of JAR files
Type: binary
Version: 3
Needs-Info: index, file-info, unpacked
+Interface: perl-coll
diff --git a/collection/md5sums b/collection/md5sums
index 97b1472..8b65a65 100755
--- a/collection/md5sums
+++ b/collection/md5sums
@@ -19,6 +19,8 @@
# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
# MA 02110-1301, USA.
+package Lintian::coll::md5sums;
+
use strict;
use warnings;
@@ -28,9 +30,8 @@ use Lintian::Collect;
use Lintian::Command qw(spawn reap);
use Lintian::Util qw(fail);
-($#ARGV == 2) or fail 'syntax: md5sums ';
-
-my ($pkg, $type, $dir) = @ARGV;
+sub collect {
+my ($pkg, $type, $dir) = @_;
my $info = Lintian::Collect->new ($pkg, $type, $dir);
if ( -e "$dir/md5sums" ) {
@@ -54,6 +55,10 @@ foreach my $file ($info->sorted_index) {
close $opts{pipe_in};
reap(\%opts);
+}
+
+collect (@ARGV) if $0 =~ m,(?:^|/)md5sums$,;
+1;
# Local Variables:
# indent-tabs-mode: nil
diff --git a/collection/md5sums.desc b/collection/md5sums.desc
index 07349cb..b505526 100644
--- a/collection/md5sums.desc
+++ b/collection/md5sums.desc
@@ -4,3 +4,4 @@ Info: This script runs the md5sums(1) over all files in a binary package.
Type: binary, udeb
Version: 1
Needs-Info: unpacked, index
+Interface: perl-coll
diff --git a/collection/objdump-info b/collection/objdump-info
index 1b091f6..17cd44e 100755
--- a/collection/objdump-info
+++ b/collection/objdump-info
@@ -23,6 +23,8 @@
# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
# MA 02110-1301, USA.
+package Lintian::coll::objdump_info;
+
use strict;
use warnings;
@@ -34,9 +36,10 @@ use Lintian::Collect;
use Lintian::Command qw(spawn reap);
use Lintian::Util qw(fail);
-my ($pkg, $type, $dir) = @ARGV;
+sub collect {
+my ($pkg, $type, $dir) = @_;
my $info = Lintian::Collect->new ($pkg, $type, $dir);
-my $helper = realpath("$0-helper");
+my $helper = realpath (__FILE__ . '-helper');
if ( -e "$dir/objdump-info" ) {
unlink "$dir/objdump-info" or fail "unlink objdump-info: $!"
@@ -66,7 +69,10 @@ foreach my $bin ($info->sorted_index) {
close $opts{pipe_in} or fail "cannot write objdump-info.gz: $!";
reap(\%opts);
-exit 0;
+};
+
+collect (@ARGV) if $0 =~ m,(?:^|/)objdump-info$,;
+1;
# Local Variables:
# indent-tabs-mode: nil
diff --git a/collection/objdump-info.desc b/collection/objdump-info.desc
index 5bc5af0..d072481 100644
--- a/collection/objdump-info.desc
+++ b/collection/objdump-info.desc
@@ -5,3 +5,4 @@ Info: This script runs objdump(1) over all binaries and object files of a
Type: binary, udeb
Version: 4
Needs-Info: file-info, unpacked, index
+Interface: perl-coll
diff --git a/collection/strings b/collection/strings
index 1d5da2e..8682bbe 100755
--- a/collection/strings
+++ b/collection/strings
@@ -19,6 +19,8 @@
# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
# MA 02110-1301, USA.
+package Lintian::coll::strings;
+
use strict;
use warnings;
@@ -30,12 +32,13 @@ use Lintian::Collect;
use Lintian::Command qw(spawn reap);
use Lintian::Util qw(delete_dir fail);
-($#ARGV == 2) or fail('syntax: strings ');
-my ($pkg, $type, $dir) = @ARGV;
+sub collect {
+
+my ($pkg, $type, $dir) = @_;
my $info = Lintian::Collect->new ($pkg, $type, $dir);
-my $helper = realpath("$0-helper");
+my $helper = realpath(__FILE__ . '-helper');
my @manual = ();
if ( -e "$dir/elf-index" ) {
@@ -95,7 +98,11 @@ if (@manual) {
}
close(ELF_INDEX) or fail("cannot write elf-index file: $!");
-exit 0;
+}
+
+collect (@ARGV) if $0 =~ m,(?:^|/)strings$,;
+
+1;
# Local Variables:
# indent-tabs-mode: nil
diff --git a/collection/strings.desc b/collection/strings.desc
index 609dd17..4836db1 100644
--- a/collection/strings.desc
+++ b/collection/strings.desc
@@ -4,4 +4,5 @@ Info: This script runs the strings(1) command over all files of a binary
package.
Type: binary, udeb
Version: 2
+Interface: perl-coll
Needs-Info: index, unpacked, file-info
diff --git a/doc/lintian.xml b/doc/lintian.xml
index 25ae68f..a3e23d4 100644
--- a/doc/lintian.xml
+++ b/doc/lintian.xml
@@ -1229,9 +1229,9 @@ foo [!any-i386] binary: some-tag-not-for-i386 optional-extra
Check names without a forward slash (e.g. "fields") and names
- starting with "lintian/" are reserved for built-in checks.
- Vendors are recommended to use their vendor name before the
- first slash (e.g. "ubuntu/fields").
+ starting with either "lintian/" or "coll/" are reserved for
+ the Lintian core. Vendors are recommended to use their vendor
+ name before the first slash (e.g. "ubuntu/fields").
diff --git a/lib/Lintian/CollScript.pm b/lib/Lintian/CollScript.pm
index 8f90f39..bd45e3a 100644
--- a/lib/Lintian/CollScript.pm
+++ b/lib/Lintian/CollScript.pm
@@ -26,7 +26,7 @@ use base 'Class::Accessor';
use Carp qw(croak);
use File::Basename qw(dirname);
-use Lintian::Util qw(get_dsc_info);
+use Lintian::Util qw(fail get_dsc_info);
=head1 NAME
@@ -77,6 +77,8 @@ sub new {
'version' => $header->{'version'},
'type-table' => {},
'auto_remove' => 0,
+ 'interface' => $header->{'interface'}//'exec',
+ '_collect_sub' => undef,
};
$self->{'script_path'} = dirname ($file) . '/' . $self->{'name'};
$self->{'auto_remove'} = 1
@@ -152,10 +154,33 @@ Returns a truth value if the collection has the "auto-remove" flag.
Returns the absolute path to the collection script.
+=item interface
+
+The call interface for this collection script.
+
+=over 4
+
+=item exec
+
+The collection is run by invoking the script denoted by script_path
+with the proper arguments.
+
+This is the default value.
+
+=item perl-coll
+
+The collection is implemented in Perl in such a way that it can be
+loaded into perl and run via the L method.
+
+Collections that have the "perl-coll" can also be run as if they had
+the "exec" interface (see above).
+
+=back
+
=cut
Lintian::CollScript->mk_ro_accessors (qw(name type version auto_remove
- script_path
+ script_path interface
));
=item needs_info ([COND])
@@ -204,6 +229,34 @@ sub is_type {
return $self->{'type-table'}->{$type};
}
+=item collect (PKG, TASK, DIR)
+
+=cut
+
+sub collect {
+ my ($self, $pkg_name, $task, $dir) = @_;
+ my $collector = $self->{'_collect_sub'};
+ unless (defined $collector) {
+ my $cs_path = $self->script_path;
+ my $ppkg = $self->name;
+
+ $ppkg =~ s,[-.],_,go;
+ $ppkg =~ s,/,::,go;
+
+ require $cs_path;
+
+ {
+ no strict 'refs';
+ $collector = \&{'Lintian::coll::' . $ppkg . '::collect'}
+ if defined &{'Lintian::coll::' . $ppkg . '::collect'};
+ }
+ fail $self->name . ' does not have a collect function'
+ unless defined $collector;
+ $self->{'_collect_sub'} = $collector;
+ }
+ $collector->($pkg_name, $task, $dir);
+}
+
=back
=head1 AUTHOR
diff --git a/lib/Lintian/Unpacker.pm b/lib/Lintian/Unpacker.pm
index 39507ad..da22397 100644
--- a/lib/Lintian/Unpacker.pm
+++ b/lib/Lintian/Unpacker.pm
@@ -23,7 +23,9 @@ use warnings;
use base 'Class::Accessor';
-use Lintian::Command::Simple qw(background wait_any kill_all);
+use POSIX;
+
+use Lintian::Command::Simple qw(wait_any kill_all);
use Lintian::Util qw(fail);
=head1 NAME
@@ -432,7 +434,24 @@ sub process_tasks {
# collect info
$cmap->select ($coll);
$wlist->{'changed'} = 1;
- my $pid = background ($cs->script_path, $pkg_name, $pkg_type, $base);
+ my $pid = fork//-1;
+ if (not $pid) {
+ # child
+ my $ret = 0;
+ if ($cs->interface eq 'perl-coll') {
+ eval {
+ $cs->collect ($pkg_name, $pkg_type, $base);
+ };
+ if ($@) {
+ print STDERR $@;
+ $ret = 2;
+ }
+ } else {
+ exec $cs->script_path, $pkg_name, $pkg_type, $base
+ or die "exec $cs->script_path: $!";
+ }
+ POSIX::_exit ($ret);
+ }
$coll_hook->($lpkg, 'start', $cs, $pid) if $coll_hook;
if ($pid < 0) {
# failed - Lets not start any more jobs for this processable
--
1.7.10.4
Reply to: