Re: Restructuring check scripts

To: debian-lint-maint@lists.debian.org
Subject: Re: Restructuring check scripts
From: Raphael Geissert <atomo64+debian@gmail.com>
Date: Sun, 17 May 2009 01:49:26 -0500
Message-id: <guoc00$b2o$1@ger.gmane.org>
References: <gunt25$ie2$1@ger.gmane.org> <87hbzkmm5v.fsf@windlord.stanford.edu>

Russ Allbery wrote:
> Raphael Geissert writes:
[...]
> Some of the checks may not break apart into separate functions very
> easily, but there doesn't seem to be any drawback.

Indeed, most of those are file-based checks, such as
checks/{files,shared-objects,scripts}.

> 
> We could even start adding POD documentation explaining what checks are
> trying to do, and then use that to generate a more detailed manual....
> 

I hadn't thought of that; it sounds like an excellent idea.

>> On another different but not too distant topic, I'd like to propose
>> adding per-tag needs-info. Of course a global needs-info would still
>> be allowed to declare collection scripts needed by most/all the tags.
> 
> The concern I have with this is that it seems really difficult to
> maintain and test.  We already have a problem with people forgetting to
> add needs-info to whole check scripts, leading to hidden bugs later.
> This multiplies that problem by a lot, and testing it would require some
> fairly massive expansion of our test suite (and would be really slow).

I wouldn't consider it much of a problem, as long as the test and the
changes I made to record that information in every Lintian::Collect::*
method are merged, and Lintian::Collect becomes the canonical way to
access package data.
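
To make that concrete, here is a minimal Python sketch (Lintian itself is
Perl, and every name below, including the accessor-to-collection mapping,
is hypothetical rather than the real Lintian::Collect API): each accessor
records the collection script it depends on, and a test-suite helper
reports any collection a check reads without declaring it in needs-info.

```python
# Hypothetical sketch; not the real Lintian::Collect API (which is Perl).
from typing import Callable, Dict, Set

# Accessor name -> collection script it depends on (illustrative only).
ACCESSOR_NEEDS: Dict[str, str] = {}


def needs(collection: str) -> Callable:
    """Decorator that records which collection script an accessor reads."""
    def wrap(func: Callable) -> Callable:
        ACCESSOR_NEEDS[func.__name__] = collection
        return func
    return wrap


class Collect:
    """Stand-in for a Lintian::Collect-style data accessor."""

    @needs("strings")
    def strings(self, path: str) -> str:
        raise NotImplementedError  # would read the 'strings' collection output

    @needs("file-info")
    def file_info(self, path: str) -> str:
        raise NotImplementedError  # would read the 'file-info' collection output


def missing_needs_info(declared: Set[str], accessors_used: Set[str]) -> Set[str]:
    """Collections a check uses through accessors but did not declare."""
    required = {ACCESSOR_NEEDS[name] for name in accessors_used}
    return required - declared


# A check declaring only 'file-info' but calling both accessors is flagged.
print(sorted(missing_needs_info({"file-info"}, {"strings", "file_info"})))
# prints ['strings']
```

How the test learns which accessors a check calls (static scan or runtime
tracing) is left open here; the point is only that the requirement lives
next to the accessor, so a missing needs-info becomes a test failure
rather than a hidden bug.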

> 
>> The idea is to later introduce an easy-to-use method to Tags that
>> would allow a check script to know whether a given tag would ever be
>> printed. If it is never going to be printed, why care about processing
>> some data? why care to collect unused information?
> 
> I think the amount of time we're saving here isn't worth the complexity.
> 

I'm hesitant about this. Something similar is already done in
frontend/lintian, which skips complete check scripts when none of their
tags would be printed; that is a good idea in general, but a performance
killer on ordinary runs.

>> A perfect example for this is spelling-error-in-binary, which needs -I
>> and -E to be displayed. If the tag would never be displayed, and it is
>> the only one requiring the 'strings' collection script (oops, it ain't
>> the best example after all, since we now have embedded-zlib) then that
>> collection script is not run and therefore the check script doesn't
>> spend time on it (which is the only benefit it would gain in this
>> case.)
> 
> Right, remember 95% of our optimization work should be on making running
> the full set of checks faster, since that's almost the only way that
> lintian is called in practice.  If we can make other things fast in the
> process, that's nice, but not particularly important.
> 

The case I'm describing is actually the common one: not everyone runs
lintian with -I or -E, and far fewer run it with --pedantic.
I've been turning this idea over in my mind for a while, and I always end
up with the same dilemma: how to determine which is less expensive,
running some code or determining whether that code should be run at all.
I always thought some sort of complementary Weight field could help, but
again, evaluating the weight of a tag could be more expensive than running
the code that would produce the tag.
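
A minimal Python sketch of per-tag needs-info, assuming a simplified tag
model: the `kind` values, the DisplayOptions flags standing in for -I, -E
and --pedantic, the "unpacked" global collection and the tag
"hypothetical-visible-tag" are all hypothetical, not Lintian's real data
structures. Only collection scripts required globally, or by a tag that
would actually be displayed, get scheduled.

```python
# Hypothetical sketch of per-tag needs-info; names and tag model are simplified.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Tag:
    name: str
    kind: str  # simplified: "error", "warning", "info" or "pedantic"
    experimental: bool = False
    needs_info: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class DisplayOptions:
    show_info: bool = False          # stands in for -I
    show_experimental: bool = False  # stands in for -E
    pedantic: bool = False           # stands in for --pedantic


def would_display(tag: Tag, opts: DisplayOptions) -> bool:
    """Would this tag ever be printed under the given options?"""
    if tag.experimental and not opts.show_experimental:
        return False
    if tag.kind == "info":
        return opts.show_info
    if tag.kind == "pedantic":
        return opts.pedantic
    return True


def collections_to_run(global_needs: Iterable[str], tags: Iterable[Tag],
                       opts: DisplayOptions) -> Set[str]:
    """Global needs-info plus the needs-info of every tag that could be printed."""
    needed = set(global_needs)
    for tag in tags:
        if would_display(tag, opts):
            needed |= tag.needs_info
    return needed


# spelling-error-in-binary needs both -I and -E to be displayed.
spelling = Tag("spelling-error-in-binary", "info", experimental=True,
               needs_info=frozenset({"strings"}))
print(collections_to_run({"unpacked"}, [spelling], DisplayOptions()))
# prints {'unpacked'}
print(sorted(collections_to_run({"unpacked"}, [spelling],
                                DisplayOptions(show_info=True, show_experimental=True))))
# prints ['strings', 'unpacked']
```

If another visible tag also needs "strings" (as with embedded-zlib in the
message), the collection still runs, so nothing is saved. Note also that
would_display() runs once per tag; the sketch only pays off when that
check is cheaper than the collection script it avoids, which is the
dilemma described in this message.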

Cheers,
-- 
Raphael Geissert - Debian Maintainer
www.debian.org - get.debian.net
