
Re: Lintian as a static analysis framework



On 2011-07-11 11:53, Stefano Zacchiroli wrote:
> On Sun, Jul 10, 2011 at 01:08:23PM +0200, Niels Thykier wrote:
>>[...]
>>
>> I think I was unclear here; my intention for the question was more along
>> the lines of how will you access the data (e.g. the root dir)?  Do you
>> plan on doing all the processing when sync'ing or will you also need to
>> do queries in-between syncs?
>>   As I recall, we (mostly?) use the former pattern on lintian.d.o, so if
>> you need the other type of queries we may need to make a few changes to
>> our code to accommodate your needs. :)
> 
> Ah, I see.
> 
> In the Coccinelle use case, everything is asynchronous with respect to
> the sync. This is so because one full run of Coccinelle on the whole
> source archive takes at present about 25 hours (for the curious lurker
> this is so using 36 cores running in parallel). Of course we're not
> doing full runs every time something changes, but it might happen that
> we need to do a full run, for instance upon changes to the Coccinelle
> patterns used to find bugs.
> 
> The sources.d.o use case runs much faster and can be done at sync time,
> even when starting from an empty sources.d.o repository.
> 
> I guess that from an API point of view, all that's needed to support both
> use cases is:
> 
> 1) an interface to define hooks that are executed at sync time (to be
>    used for the "quick" use cases)
> 2) an interface to query the current content of the lintian lab
> 
> In fact, (1) is optional and one can do everything using (2), although
> it might be a bit more cumbersome.
> 
> Another issue to be faced is race conditions between querying the lab
> and using the retrieved information.
> 
> [...]

I spent a bit of time figuring out how things are currently done.
Unfortunately, there seems to have been a tendency to access the
internals of the Lab directly rather than through an API.

API-wise, for the Lab we have something like:
  * new instance
  * create_lab
  * delete_lab
  * is_created (does it look like a valid lab)
  * get_lab_package

The last one is sort of a getter, except that you have to know and
provide information that only makes sense if you are creating a new entry.

The API for the entries just got a little better, but it is not ready.
For now they can be created, removed and checked for existence.  I also
linked them with the Lintian::Collect interfaces that provide access to
the data in the entry (such as "does the package have file XYZ" or "what
type is file ZYX").
  But unfortunately we are still lacking the part where you can ask for
data to be collected.  Technically this can be done with:

 $ lintian --unpack [--keep-lab] $pkg

But I find it sub-optimal.

As I see it, a number of things are needed to finish that:

  * An interface to the collections; mostly something that handles
    loading them so people do not have to parse desc files and
    find the scripts manually.
  * An API to order/schedule them according to their dependencies.
    - I would prefer not having to expose people to the DepMap
      stuff we are doing in frontend/lintian.  That being said, it
      could be a simple wrapper around the DepMap module.
  * Extend the lab entry API to run the collections.
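
To illustrate the ordering/scheduling item above, here is a minimal sketch
in Python.  It is not Lintian code and does not use the DepMap module; the
collection names and their dependencies are made up for the example (in
Lintian they would be read from the desc files), and it simply uses a
topological sort from the Python standard library (graphlib, Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical collections mapped to the collections they depend on.
# These names and edges are assumptions made for this example only.
COLLECTION_DEPS = {
    "unpacked": set(),
    "index": set(),
    "file-info": {"unpacked"},
    "objdump-info": {"unpacked", "file-info"},
}


def schedule(deps):
    """Return a list of batches in dependency order.

    All collections in one batch have their dependencies satisfied by
    earlier batches, so they could be run in parallel.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(schedule(COLLECTION_DEPS), 1):
        print(number, ", ".join(batch))
```

A caller would only see schedule() and never the dependency bookkeeping,
which is roughly the kind of thin wrapper meant above.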

With that the Lab entry code would be fairly useful.

I will continue to work on this and hopefully have something ready for
the Lintian BoF at DebConf.

~Niels

[--keep-lab] The --keep-lab option stops Lintian from auto-removing the
unpacked directory.

