[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#699083: lintian: optimize t/tests suite



On 2013-01-27 11:17, Niels Thykier wrote:
> Package: lintian
> Version: 2.5.11
> Severity: normal
> 
> Hi,
> 
> [...]
>

I have come up with a prototype patch that reduces some overhead in the
unpacker.  It changes the call-interface between Lintian and the
collections, so I figured I would send it to a review before pushing it.

AFAICT, collections were once sh scripts (as were checks)[1].  Over time
all of our checks and collections have been rewritten into pure Perl and
now takes advantage of our Perl modules.  So currently, we do not need
the "external process" interface like we used to.  By abusing the fact
that collections are written in Perl, we can greatly reduce the
overhead of spawning the collections.

I have opt'ed for keeping the old "fork+exec" interface; I'd like to
keep this option around for now as it is currently the only place we can
use "foreign" languages (which may come in handy in the future).  Also,
it allowed me to be lazy and only convert a subset of the collections... :)

To get a feeling of the improvement works on real data we get...

With the patch:
   fields-wrong-section 13.259s 20.733s 3.312s

Without the patch:
>  fields-wrong-section 14.815s 32.818s 4.628s

Running Lintian on all our t/tests artifacts[3], this patch reduces the
user time with about 6 minutes.  That is a drop from ~21 to ~15 minutes
in time's output.  Real time seems to have dropped to 5m 30s from 6m
{0..30}s.

Comments appreciated,
~Niels

[1] "Root commit" (020888b), examples include checks/po-debconf and
coll/objdump-info.  There is also a mention in the changelog about
chdir'ing before invoking the collector (or check?) to better support
AWK scripts.

[2] is stil:
> [2] Computed by:
>
>  $ time debian/rules runtests onlyrun=<test>

[3]
Setup is:
 $ debian/rules onlyrun=suite:tests

Test is:
for I in $(seq 1 10) ; do \
    time LINTIAN_ROOT=. frontend/lintian --no-user-dirs --no-cfg -IE \
        debian/test-out/*/*.changes > /dev/null; \
done 2>&1 | tee <name-of-choice>.log

Default parallelization for the test was 9 (i.e. -j9).  For some reason,
the very first iteration tends to get vastly better runtimes than the rest.
Title: Check description file
>From 8891f7f1d4680be3900a50d80c1ef4365a8fb9ac Mon Sep 17 00:00:00 2001 From: Niels Thykier Date: Wed, 30 Jan 2013 13:14:09 +0100 Subject: [PATCH] Add new collection interface to reduce overhead The major overhead in running collections are loading the required Perl Modules (again). As an example, loading Lintian::Util has a cost of 0.040 seconds (the first time). With a couple of extra modules, coll/hardening-info has a average "no-op" runtime of at least 0.150s. Eliminating the module load overhead, it drops to 0.065s. This patch set creates a new (backwards compatible) interface for invoking collections. Instead of doing a complete fork+exec, the new interface loads the collection script like a perl module and load a "collect" sub (similar to how scripts work). This interface can still be combined with fork for parallelization. Because it avoids the exec, the child process retrains the cache of (most of) the Perl Modules. This vastly reduces the "start up" time for the collections. With the selected collections, about 0.5s (user) is regained when processing the Lintian source + binary together. For the Lintian part of the t/tests the effect accumulates to about 6 minutes of user time (21m -> 15m). Signed-off-by: Niels Thykier --- collection/ar-info | 10 +++++-- collection/ar-info.desc | 1 + collection/file-info | 14 +++++++--- collection/file-info.desc | 1 + collection/hardening-info | 11 +++++--- collection/hardening-info.desc | 1 + collection/java-info | 26 +++++++++++------- collection/java-info.desc | 1 + collection/md5sums | 11 +++++--- collection/md5sums.desc | 1 + collection/objdump-info | 12 ++++++--- collection/objdump-info.desc | 1 + collection/strings | 15 ++++++++--- collection/strings.desc | 1 + doc/lintian.xml | 6 ++--- lib/Lintian/CollScript.pm | 57 ++++++++++++++++++++++++++++++++++++++-- lib/Lintian/Unpacker.pm | 23 ++++++++++++++-- 17 files changed, 157 insertions(+), 35 deletions(-) diff --git a/collection/ar-info b/collection/ar-info index 4e2c103..e891e13 100755 --- a/collection/ar-info +++ b/collection/ar-info @@ -19,6 +19,8 @@ # Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, # MA 02110-1301, USA. +package Lintian::coll::ar_info; + use strict; use warnings; @@ -28,8 +30,8 @@ use Lintian::Collect; use Lintian::Command qw(spawn); use Lintian::Util qw(fail); -($#ARGV == 2) or fail 'syntax: ar-info '; -my ($pkg, $type, $dir) = @ARGV; +sub collect { + my ($pkg, $type, $dir) = @_; my $info = Lintian::Collect->new ($pkg, $type, $dir); if ( -e "$dir/ar-info" ) { @@ -59,6 +61,10 @@ foreach my $file ($info->sorted_index) { } close(OUT) or fail("cannot write ar-info: $!"); +} + +collect (@ARGV) if $0 =~ m,(?:^|/)ar-info$,; +1; # Local Variables: # indent-tabs-mode: nil diff --git a/collection/ar-info.desc b/collection/ar-info.desc index 5772f6a..cd9e2e6 100644 --- a/collection/ar-info.desc +++ b/collection/ar-info.desc @@ -4,3 +4,4 @@ Info: This script runs the "ar t" command over all .a files of package. Type: binary Needs-Info: unpacked, index Version: 1 +Interface: perl-coll diff --git a/collection/file-info b/collection/file-info index 075c470..1135fd3 100755 --- a/collection/file-info +++ b/collection/file-info @@ -19,6 +19,8 @@ # Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, # MA 02110-1301, USA. +package Lintian::coll::file_info; + use strict; use warnings; @@ -29,12 +31,12 @@ use Lintian::Collect; use Lintian::Command qw(spawn reap); use Lintian::Util qw(fail); -($#ARGV == 2) or fail 'syntax: file-info '; -my ($pkg, $type, $dir) = @ARGV; +sub collect { +my ($pkg, $type, $dir) = @_; my $info = Lintian::Collect->new ($pkg, $type, $dir); my $last = ''; -my $helper = realpath("$0-helper"); +my $helper = realpath (__FILE__ . '-helper'); my $outfile = "$dir/file-info.gz"; if ( -e "$dir/file-info" ) { @@ -66,6 +68,12 @@ foreach my $file ($info->sorted_index) { close $opts{pipe_in}; reap(\%opts); +} + +collect (@ARGV) if $0 =~ m,(?:^|/)file-info$,; + +1; + # Local Variables: # indent-tabs-mode: nil # cperl-indent-level: 4 diff --git a/collection/file-info.desc b/collection/file-info.desc index 10a19c7..712ac47 100644 --- a/collection/file-info.desc +++ b/collection/file-info.desc @@ -4,3 +4,4 @@ Info: This script runs the file(1) command over all files of any kind of package Type: binary, udeb, source Version: 2 Needs-Info: unpacked, index +Interface: perl-coll diff --git a/collection/hardening-info b/collection/hardening-info index a3343bd..20817e2 100755 --- a/collection/hardening-info +++ b/collection/hardening-info @@ -26,6 +26,8 @@ # Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, # MA 02110-1301, USA. +package Lintian::coll::hardening_info; + use strict; use warnings; @@ -37,10 +39,11 @@ use Lintian::Collect; use Lintian::Command qw(spawn reap); use Lintian::Util qw(fail); -my ($pkg, $type, $dir) = @ARGV; +sub collect { +my ($pkg, $type, $dir) = @_; my $info = Lintian::Collect->new ($pkg, $type, $dir); -my $helper = realpath("$0-helper"); +my $helper = realpath (__FILE__ . '-helper'); if ( -e "$dir/hardening-info" ) { unlink "$dir/hardening-info" or fail "unlink hardening-info: $!"; @@ -73,8 +76,10 @@ foreach my $bin ($info->sorted_index) { close $opts{pipe_in}; reap (\%opts); +} -exit 0; +collect (@ARGV) if $0 =~ m,(?:^|/)hardening-info$,; +1; # Local Variables: # indent-tabs-mode: nil diff --git a/collection/hardening-info.desc b/collection/hardening-info.desc index a4a7a7a..fbb4e23 100644 --- a/collection/hardening-info.desc +++ b/collection/hardening-info.desc @@ -5,3 +5,4 @@ Info: This script runs hardening-check(1) over all ELF binaries of a binary Type: binary, udeb Version: 4 Needs-Info: bin-pkg-control, file-info, index, unpacked +Interface: perl-coll diff --git a/collection/java-info b/collection/java-info index bb342ef..f25eac4 100755 --- a/collection/java-info +++ b/collection/java-info @@ -19,6 +19,8 @@ # Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, # MA 02110-1301, USA. +package Lintian::coll::java_info; + use strict; use warnings; @@ -30,7 +32,8 @@ use Lintian::Collect; use Lintian::Command qw(spawn reap); use Lintian::Util qw(fail); -my ($pkg, $type, $dir) = @ARGV; +sub collect { +my ($pkg, $type, $dir) = @_; my $info = Lintian::Collect->new ($pkg, $type, $dir); # Sanity check @@ -48,6 +51,14 @@ if ( -f "$dir/java-info.gz" ) { # We lazily start the gzip process to avoid creating the java-info.gz # file when there are no jar files in the package. my %opts; +my $open_java_info = sub { + %opts = ( pipe_in => FileHandle->new, + out => "$dir/java-info.gz", + fail => 'error' ); + spawn(\%opts, ['gzip', '-9c'] ); + $opts{pipe_in}->blocking(1); +}; + chdir ("$dir/unpacked") or fail "unable to chdir to unpacked: $!"; @@ -61,7 +72,7 @@ foreach my $file ($info->sorted_index) { my $manifest; my $azip = Archive::Zip->new; - open_java_info() unless %opts; + $open_java_info->() unless %opts; # This script needs unzip, there's no way around. print {$opts{pipe_in}} "-- $file\n"; @@ -117,15 +128,10 @@ if (%opts) { reap (\%opts); } -exit 0; +}; -sub open_java_info { - %opts = ( pipe_in => FileHandle->new, - out => "$dir/java-info.gz", - fail => 'error' ); - spawn(\%opts, ['gzip', '-9c'] ); - $opts{pipe_in}->blocking(1); -} +collect (@ARGV) if $0 =~ m,(?:^|/)java-info$,; +1; # Local Variables: # indent-tabs-mode: nil diff --git a/collection/java-info.desc b/collection/java-info.desc index 470b827..c319604 100644 --- a/collection/java-info.desc +++ b/collection/java-info.desc @@ -4,3 +4,4 @@ Info: This script extracts information from manifests of JAR files Type: binary Version: 3 Needs-Info: index, file-info, unpacked +Interface: perl-coll diff --git a/collection/md5sums b/collection/md5sums index 97b1472..8b65a65 100755 --- a/collection/md5sums +++ b/collection/md5sums @@ -19,6 +19,8 @@ # Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, # MA 02110-1301, USA. +package Lintian::coll::md5sums; + use strict; use warnings; @@ -28,9 +30,8 @@ use Lintian::Collect; use Lintian::Command qw(spawn reap); use Lintian::Util qw(fail); -($#ARGV == 2) or fail 'syntax: md5sums '; - -my ($pkg, $type, $dir) = @ARGV; +sub collect { +my ($pkg, $type, $dir) = @_; my $info = Lintian::Collect->new ($pkg, $type, $dir); if ( -e "$dir/md5sums" ) { @@ -54,6 +55,10 @@ foreach my $file ($info->sorted_index) { close $opts{pipe_in}; reap(\%opts); +} + +collect (@ARGV) if $0 =~ m,(?:^|/)md5sums$,; +1; # Local Variables: # indent-tabs-mode: nil diff --git a/collection/md5sums.desc b/collection/md5sums.desc index 07349cb..b505526 100644 --- a/collection/md5sums.desc +++ b/collection/md5sums.desc @@ -4,3 +4,4 @@ Info: This script runs the md5sums(1) over all files in a binary package. Type: binary, udeb Version: 1 Needs-Info: unpacked, index +Interface: perl-coll diff --git a/collection/objdump-info b/collection/objdump-info index 1b091f6..17cd44e 100755 --- a/collection/objdump-info +++ b/collection/objdump-info @@ -23,6 +23,8 @@ # Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, # MA 02110-1301, USA. +package Lintian::coll::objdump_info; + use strict; use warnings; @@ -34,9 +36,10 @@ use Lintian::Collect; use Lintian::Command qw(spawn reap); use Lintian::Util qw(fail); -my ($pkg, $type, $dir) = @ARGV; +sub collect { +my ($pkg, $type, $dir) = @_; my $info = Lintian::Collect->new ($pkg, $type, $dir); -my $helper = realpath("$0-helper"); +my $helper = realpath (__FILE__ . '-helper'); if ( -e "$dir/objdump-info" ) { unlink "$dir/objdump-info" or fail "unlink objdump-info: $!" @@ -66,7 +69,10 @@ foreach my $bin ($info->sorted_index) { close $opts{pipe_in} or fail "cannot write objdump-info.gz: $!"; reap(\%opts); -exit 0; +}; + +collect (@ARGV) if $0 =~ m,(?:^|/)objdump-info$,; +1; # Local Variables: # indent-tabs-mode: nil diff --git a/collection/objdump-info.desc b/collection/objdump-info.desc index 5bc5af0..d072481 100644 --- a/collection/objdump-info.desc +++ b/collection/objdump-info.desc @@ -5,3 +5,4 @@ Info: This script runs objdump(1) over all binaries and object files of a Type: binary, udeb Version: 4 Needs-Info: file-info, unpacked, index +Interface: perl-coll diff --git a/collection/strings b/collection/strings index 1d5da2e..8682bbe 100755 --- a/collection/strings +++ b/collection/strings @@ -19,6 +19,8 @@ # Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, # MA 02110-1301, USA. +package Lintian::coll::strings; + use strict; use warnings; @@ -30,12 +32,13 @@ use Lintian::Collect; use Lintian::Command qw(spawn reap); use Lintian::Util qw(delete_dir fail); -($#ARGV == 2) or fail('syntax: strings '); -my ($pkg, $type, $dir) = @ARGV; +sub collect { + +my ($pkg, $type, $dir) = @_; my $info = Lintian::Collect->new ($pkg, $type, $dir); -my $helper = realpath("$0-helper"); +my $helper = realpath(__FILE__ . '-helper'); my @manual = (); if ( -e "$dir/elf-index" ) { @@ -95,7 +98,11 @@ if (@manual) { } close(ELF_INDEX) or fail("cannot write elf-index file: $!"); -exit 0; +} + +collect (@ARGV) if $0 =~ m,(?:^|/)strings$,; + +1; # Local Variables: # indent-tabs-mode: nil diff --git a/collection/strings.desc b/collection/strings.desc index 609dd17..4836db1 100644 --- a/collection/strings.desc +++ b/collection/strings.desc @@ -4,4 +4,5 @@ Info: This script runs the strings(1) command over all files of a binary package. Type: binary, udeb Version: 2 +Interface: perl-coll Needs-Info: index, unpacked, file-info diff --git a/doc/lintian.xml b/doc/lintian.xml index 25ae68f..a3e23d4 100644 --- a/doc/lintian.xml +++ b/doc/lintian.xml @@ -1229,9 +1229,9 @@ foo [!any-i386] binary: some-tag-not-for-i386 optional-extra Check names without a forward slash (e.g. "fields") and names - starting with "lintian/" are reserved for built-in checks. - Vendors are recommended to use their vendor name before the - first slash (e.g. "ubuntu/fields"). + starting with either "lintian/" or "coll/" are reserved for + the Lintian core. Vendors are recommended to use their vendor + name before the first slash (e.g. "ubuntu/fields"). diff --git a/lib/Lintian/CollScript.pm b/lib/Lintian/CollScript.pm index 8f90f39..bd45e3a 100644 --- a/lib/Lintian/CollScript.pm +++ b/lib/Lintian/CollScript.pm @@ -26,7 +26,7 @@ use base 'Class::Accessor'; use Carp qw(croak); use File::Basename qw(dirname); -use Lintian::Util qw(get_dsc_info); +use Lintian::Util qw(fail get_dsc_info); =head1 NAME @@ -77,6 +77,8 @@ sub new { 'version' => $header->{'version'}, 'type-table' => {}, 'auto_remove' => 0, + 'interface' => $header->{'interface'}//'exec', + '_collect_sub' => undef, }; $self->{'script_path'} = dirname ($file) . '/' . $self->{'name'}; $self->{'auto_remove'} = 1 @@ -152,10 +154,33 @@ Returns a truth value if the collection has the "auto-remove" flag. Returns the absolute path to the collection script. +=item interface + +The call interface for this collection script. + +=over 4 + +=item exec + +The collection is run by invoking the script denoted by script_path +with the proper arguments. + +This is the default value. + +=item perl-coll + +The collection is implemented in Perl in such a way that it can be +loaded into perl and run via the L method. + +Collections that have the "perl-coll" can also be run as if they had +the "exec" interface (see above). + +=back + =cut Lintian::CollScript->mk_ro_accessors (qw(name type version auto_remove - script_path + script_path interface )); =item needs_info ([COND]) @@ -204,6 +229,34 @@ sub is_type { return $self->{'type-table'}->{$type}; } +=item collect (PKG, TASK, DIR) + +=cut + +sub collect { + my ($self, $pkg_name, $task, $dir) = @_; + my $collector = $self->{'_collect_sub'}; + unless (defined $collector) { + my $cs_path = $self->script_path; + my $ppkg = $self->name; + + $ppkg =~ s,[-.],_,go; + $ppkg =~ s,/,::,go; + + require $cs_path; + + { + no strict 'refs'; + $collector = \&{'Lintian::coll::' . $ppkg . '::collect'} + if defined &{'Lintian::coll::' . $ppkg . '::collect'}; + } + fail $self->name . ' does not have a collect function' + unless defined $collector; + $self->{'_collect_sub'} = $collector; + } + $collector->($pkg_name, $task, $dir); +} + =back =head1 AUTHOR diff --git a/lib/Lintian/Unpacker.pm b/lib/Lintian/Unpacker.pm index 39507ad..da22397 100644 --- a/lib/Lintian/Unpacker.pm +++ b/lib/Lintian/Unpacker.pm @@ -23,7 +23,9 @@ use warnings; use base 'Class::Accessor'; -use Lintian::Command::Simple qw(background wait_any kill_all); +use POSIX; + +use Lintian::Command::Simple qw(wait_any kill_all); use Lintian::Util qw(fail); =head1 NAME @@ -432,7 +434,24 @@ sub process_tasks { # collect info $cmap->select ($coll); $wlist->{'changed'} = 1; - my $pid = background ($cs->script_path, $pkg_name, $pkg_type, $base); + my $pid = fork//-1; + if (not $pid) { + # child + my $ret = 0; + if ($cs->interface eq 'perl-coll') { + eval { + $cs->collect ($pkg_name, $pkg_type, $base); + }; + if ($@) { + print STDERR $@; + $ret = 2; + } + } else { + exec $cs->script_path, $pkg_name, $pkg_type, $base + or die "exec $cs->script_path: $!"; + } + POSIX::_exit ($ret); + } $coll_hook->($lpkg, 'start', $cs, $pid) if $coll_hook; if ($pid < 0) { # failed - Lets not start any more jobs for this processable -- 1.7.10.4
Reply to: