[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: 2 patches for dpkg



Hi Modestas,

thanks for your patches! Here are some comments:

On Fri, 16 May 2008, Modestas Vainius wrote:
> you will find the proposed patches attached together with the extensive 
> descriptions (feel free to put them into the changelog if you decide to 
> apply). The patches should apply on top of 1.14.19 and should be independant 
> from each other. I would like to see them both in dpkg targetted for lenny, 
> because:

First of all, they won't land up in lenny. It's too late, dpkg is frozen.

> 0001 patch would help adoption of of symbol files for large library packages 
> (e.g. kde4libs. symbols is ~15 MB, gzipped 180 KB)

As you noted, the win is not that big. And to me the real question is
"does it make sense to use symbols files" for C++ libraries when:
- the files are huge and it's difficult to hand-edit since all symbols are
  mangled
- there are (almost) always arch-specific differences which render files
  even more difficult to maintain

Nevertheless, I'd be okay to implement something like that but you should
really rework the patch to support multiple compressions schemes. You can
do that easily by reusing the regex $comp_regex from Dpkg::Compression and
the objects Dpkg::Source::Compressor and/or Dpkg::Source::CompressedFile.

> Subject: [PATCH] Optimize dpkg-shlibdeps by caching symbol file and objdump objects
> 
> This patch optimizes dpkg-shlibdeps by caching parsed symbols files and
> objdump objects. This way neither of the libraries or symbols files are
> parsed more than once. This patch significantly improves performance of
> dpkg-shlibdeps bringing it near to performance levels of << 1.14.8
> dpkg-shlibdeps without loosing any of new functionally at all. Memory
> requirements are reduced too.

Why would it require less memory? Keeping a cache usually increases the
memory requirement... or is there a problem with perl's garbage collector?

> This patch SHOULD NOT change the end result of dpkg-shlibdeps. If it
> does, it is a bug.

But it will do so in a number of corner cases. If you want to cache
the result of some expensive functions, you must make sure that _all_
parameters that influence the output are the same:

>      my ($self, $file, $with_deprecated, $compress) = @_;
>      $compress = "" unless defined $compress;
> diff --git a/scripts/dpkg-shlibdeps.pl b/scripts/dpkg-shlibdeps.pl
> @@ -193,12 +197,23 @@ foreach my $file (keys %exec) {
>  	    my $dpkg_symfile;
>  	    if ($packagetype eq "deb") {
>  		# Use fine-grained dependencies only on real deb
> -		$dpkg_symfile = find_symbols_file($pkg, $soname, $lib);
> -		if (defined $dpkg_symfile) {
> -		    # Load symbol information
> -		    print "Using symbols file $dpkg_symfile for $soname\n" if $debug;
> -		    $symfile->load($dpkg_symfile);
> +		if (exists $dpkg_symfile_cache{$pkg}) {
> +		    if (defined $dpkg_symfile_cache{$pkg}) {
> +			    $dpkg_symfile = $dpkg_symfile_cache{$pkg}{file};
> +			    print "Using symbols file $dpkg_symfile (cached) for $soname\n" if $debug;
> +		    }
> +		} else {
> +		    $dpkg_symfile = find_symbols_file($pkg, $soname, $lib);
> +		    if (defined $dpkg_symfile) {
> +			# Load symbol information
> +			print "Using symbols file $dpkg_symfile for $soname\n" if $debug;
> +			$dpkg_symfile_cache{$pkg} = new Dpkg::Shlibs::SymbolFile();
> +			$dpkg_symfile_cache{$pkg}->load($dpkg_symfile);
> +		    } else {
> +			$dpkg_symfile_cache{$pkg} = undef;
> +		    }
>  		}
> +		$symfile->merge_from_symfile($dpkg_symfile_cache{$pkg}) if (defined($dpkg_symfile));
>  	    }

The output of find_symbols_file depend not only on $pkg but also on
$soname and $lib. You can't assume that you can reuse the same symbols
file simply because a previous call of find_symbols with the same $kg
returned something. The key of %dpkg_symfile_cache should really be
$dpkg_symfile and not $pkg.

>  	    if (defined($dpkg_symfile) && $symfile->has_object($soname)) {
>  		# Initialize dependencies with the smallest minimal version
> @@ -214,13 +229,26 @@ foreach my $file (keys %exec) {
>  		}
>  	    } else {
>  		# No symbol file found, fall back to standard shlibs
> -		my $id = $dumplibs_wo_symfile->parse($lib);
> +		$dpkg_objdump_cache{$pkg} = {} unless (exists $dpkg_objdump_cache{$pkg});
> +		my $id;
> +		my $libobj;
> +		if (exists $dpkg_objdump_cache{$pkg}{$lib}) {
> +		    $libobj = $dpkg_objdump_cache{$pkg}{$lib};
> +		    # We don't want to process the same lib more than once (redundant)
> +		    next if ($dumplibs_wo_symfile->get_object($libobj->get_id()));
> +		    $id = $dumplibs_wo_symfile->add_object($dpkg_objdump_cache{$pkg}{$lib});
> +		    print "Using objdump (cached) for $soname (file $lib)\n" if $debug;
> +		} else {
> +		    $id = $dumplibs_wo_symfile->parse($lib);
> +		    $libobj = $dumplibs_wo_symfile->get_object($id);
> +		    $dpkg_objdump_cache{$pkg}{$lib} = $libobj;
> +		    print "Using objdump for $soname (file $lib)\n" if $debug;
> +		}

Why are you using $pkg and $lib as key for this cache? $lib should be
enough as there's only one objdump output for a given binary file...

Cheers,
-- 
Raphaël Hertzog

Le best-seller français mis à jour pour Debian Etch :
http://www.ouaza.com/livre/admin-debian/


Reply to: