Re: Couple of patches

To: debian-lint-maint@lists.debian.org
Subject: Re: Couple of patches
From: Raphael Geissert <atomo64+debian@gmail.com>
Date: Wed, 11 Feb 2009 01:53:51 -0600
Message-id: <gmu08d$rph$1@ger.gmane.org>
References: <gmb6m7$f0n$1@ger.gmane.org> <87wsc2svuo.fsf@windlord.stanford.edu> <gml6bl$eju$1@ger.gmane.org> <87prhsz2u5.fsf@windlord.stanford.edu>

Russ Allbery wrote:

> Raphael Geissert writes:
>> Russ Allbery wrote:
[...]
>> I remember seeing the other day some unneeded files being generated at
>> unpack level 1 of binary packages.
> 
> More detail would be good.  The only things that level 1 unpack of binary
> packages generates are the control directory and index, the file indices,
> and the breakdown of the package control information, plus a symlink.
> 

The fields don't seem to be used by any collection script except for
diffstat. All the other collection scripts "use" them only to make sure
they were run on the right directory; I believe running them on the wrong
one is very unlikely, and it would be easy to notice if it did happen.

The generation of index and index-owner-id is a bit suboptimal: only one
call to dpkg-deb is needed. The tar file could be stored on disk (or kept
in memory, but the idea is to avoid the extra call to dpkg-deb, which
spawns at least two more processes). That should perform better because
the file would remain in the cache. The level 1 unpack script could then
be told whether the level 2 unpack script will be run afterwards, so that
the tarball is left in place and later deleted by the level 2 script.
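
To illustrate the single-pass idea, here is a rough sketch in Python (not
Lintian's Perl, and not the actual unpack script). It assumes the tarball
has already been obtained once and shows both listings being built from
the same read. The function names and the line format are made up for the
example; the real index files follow tar's verbose listing format.

```python
import io
import tarfile


def build_indices(tar_bytes):
    """Return (index, index_owner_id) lines from one pass over a tarball.

    The line layout is a simplified, hypothetical stand-in for the
    real listings.
    """
    index, index_owner_id = [], []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            # Each entry feeds both listings, so the archive is read once.
            index.append(
                f"{member.uname}/{member.gname} {member.size} {member.name}")
            index_owner_id.append(
                f"{member.uid}/{member.gid} {member.size} {member.name}")
    return index, index_owner_id


def make_sample_tar():
    """Build a tiny in-memory tarball standing in for the package data."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo(name="usr/share/doc/demo/README")
        info.size = len(data)
        info.uid, info.gid = 0, 0
        info.uname, info.gname = "root", "root"
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


index, index_owner_id = build_indices(make_sample_tar())
```

Whether keeping the tarball on disk or in memory wins in practice would
still have to be measured.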

> It's worth remembering that almost no one runs Lintian with restricted
> checks or with anything other than the default settings (thus unpacking
> everything to level two).  That doesn't mean we should completely ignore
> those methods of running Lintian, but they shouldn't be a high priority
> for development work, and I'm not sure trying to optimize that is a
> particularly good use of time.

I know, and agree.

> 
>> I personally would prefer dropping unpack and doing all that stuff in
>> collection with a proper and simple dependency resolver (by simple I
>> mean keeping a low complexity level by not introducing things like
>> Conflicts). I see no reason to keep the unpack scripts, as everything
>> they do, IMO, fits perfectly in the collection/ concept, and both try
>> to do the same thing. An example of this is the file-info collection
>> script, because what it generates is an index of all the files with the
>> file information.
> 
> If we were designing Lintian from the start, I'm not sure I see a point in
> the unpack scripts and I think that sounds like a basically sound model.
> Eliminating them is a lot of work, though.  I guess I'm not feeling
> particularly inspired to do the work.  I could review it if you or someone
> else does it, since in the long run having one fewer set of scripts will
> probably save on some maintenance burden.  But there's a lot of
> backward-compatibility implications, and if you want my input on priority,
> I think there are other things to work on that would be a lot more useful.
> 

I could probably spend some time on this. Also, as I demonstrated above, it
could be better if the two unpack scripts were merged to optimise the whole
process. Would you, or anyone else, have any objection to moving towards
this idea? If there are none, I could do it in a couple of hours, including
testing.
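
For reference, the "simple dependency resolver" from the quoted proposal
amounts to a topological sort with no Conflicts-style relations. A minimal
sketch in Python (again, not Lintian code) follows. The dependency table is
hypothetical: "file-info" and "diffstat" are real collection scripts, but
the dependencies listed for them here are invented for the example.

```python
def resolve_order(needs):
    """Order scripts so that each runs after everything it needs.

    needs maps a script name to the list of scripts it depends on.
    Raises ValueError on a dependency cycle.
    """
    order = []
    state = {}  # 1 = currently being visited, 2 = already ordered

    def visit(name):
        if state.get(name) == 2:
            return
        if state.get(name) == 1:
            raise ValueError(f"dependency cycle involving {name}")
        state[name] = 1
        for dep in needs.get(name, []):
            visit(dep)
        state[name] = 2
        order.append(name)

    # Sort the names so the resulting order is deterministic.
    for name in sorted(needs):
        visit(name)
    return order


# Hypothetical dependency table, for illustration only.
needs = {
    "unpacked": [],
    "index": [],
    "file-info": ["unpacked", "index"],
    "diffstat": [],
}
order = resolve_order(needs)
```

With only a "needs" relation, the resolver stays this small; adding
something like Conflicts is what would make it complicated.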

>> Yeah, benchmarking/profiling is required.
> 
> Does anyone know of a good Perl profiling method?  I did some cursory
> searches for modules and didn't come up with anything that looked horribly
> promising.
> 

http://perldoc.perl.org/perlfaq3.html#How-do-I-profile-my-Perl-programs%3f
http://perl.apache.org/docs/1.0/guide/performance.html#Code_Profiling_Techniques

I just started to optimise the code, reduce the number of calls, etc.,
based on the profiling data (by moving some code I eliminated over 2k
function calls). Will send some patches tomorrow.

I've also been reading about Perl and memory, and I think I now have a
better idea of where to start looking to reduce memory consumption.

Very interesting and helpful reading:
http://perl.apache.org/docs/1.0/guide/performance.html

Cheers,
-- 
Raphael Geissert - Debian Maintainer
www.debian.org - get.debian.net
