Re: RFC: further parallelisation (dependency-based collection and check scripts)



Raphael Geissert <geissert@debian.org> writes:

> Over the last week I found some free time to finally finish the work on
> the initial implementation of a dependency-based lintian.  Following
> the suggestions made by Russ on my previous, partial, implementation, I
> got rid of the co-dependencies and avoided the ->export() stuff among
> other changes. I also extended the test suite to cover some extra cases
> and other, at the moment unused, features of Lintian::DepMap.

Excellent!
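For readers new to the thread, here is a minimal Python sketch of the dependency-based scheduling idea. The `DepMap` class, its `add`/`selectable`/`satisfy` methods and the `run_in_waves` helper are hypothetical. They are only loosely modelled on the description above and are not the Perl API of Lintian::DepMap. A node becomes runnable once every prerequisite it was added with has been satisfied, so independent collection scripts can run side by side.

```python
from collections import defaultdict


class DepMap:
    """Hypothetical dependency map: a node becomes selectable once every
    prerequisite it was added with has been satisfied."""

    def __init__(self):
        self._pending = {}                # node -> unsatisfied prerequisites
        self._waiting = defaultdict(set)  # prerequisite -> nodes waiting on it
        self._done = set()

    def add(self, node, *prereqs):
        missing = {p for p in prereqs if p not in self._done}
        self._pending[node] = missing
        for p in missing:
            self._waiting[p].add(node)

    def selectable(self):
        """Nodes whose prerequisites are all satisfied (sorted for stable output)."""
        return sorted(n for n, deps in self._pending.items() if not deps)

    def satisfy(self, node):
        """Mark node as finished and release the nodes waiting on it."""
        self._pending.pop(node, None)
        self._done.add(node)
        for dependent in self._waiting.pop(node, set()):
            if dependent in self._pending:
                self._pending[dependent].discard(node)

    def pending(self):
        return bool(self._pending)


def run_in_waves(depmap):
    """Return, in order, the batches of nodes that could run in parallel."""
    waves = []
    while depmap.pending():
        ready = depmap.selectable()
        if not ready:
            raise RuntimeError("circular or unknown dependency")
        waves.append(ready)
        for node in ready:  # a real frontend could run these as parallel jobs
            depmap.satisfy(node)
    return waves


deps = DepMap()
deps.add("coll:unpacked")
deps.add("coll:file-info", "coll:unpacked")
deps.add("check:binaries", "coll:unpacked", "coll:file-info")
print(run_in_waves(deps))
```

The node names are illustrative only. Running in whole waves is a simplification: a real scheduler would start each job as soon as it becomes selectable instead of waiting for the rest of its wave to finish.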

> Regressions:
> I must say I'm not happy with the resulting speed, as in most cases
> lintian will now take _longer_ to process the packages (although it
> provides some visual feedback sooner, which might make it look a bit
> faster).  I haven't done any sort of profiling, but it looks like
> lintian is now consuming a lot of CPU trying to reap the collection
> script jobs, CPU time that could otherwise be used by the collection
> scripts themselves. This should be addressed by not using IPC::Run to
> start the collection scripts[1], as we previously discussed on the
> other thread, and it is something I'll start working on soon.

> [1] Like I said on a previous email, using wait() to, duh, wait for
> IPC::Run-started jobs causes trouble. And sleeping between calls to
> reap_collection_jobs is really not the right way to do it.

Yeah, that's pretty much what I was expecting.  And agreed, sleep isn't
the right solution.
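The difference between polling and blocking can be shown with a small Python sketch (Unix-only, Python 3.8+ for `os.posix_spawnp`). It starts each job directly rather than through a wrapper such as IPC::Run, then blocks in `os.wait()` until some child exits, so the parent uses no CPU while it waits. The `run_jobs` helper and the job names are hypothetical. Note that `os.wait()` reaps *any* child of the process, which is presumably why it conflicts with a library that expects to reap its own children.

```python
import os


def run_jobs(commands):
    """Start every command as a child process, then reap them with a
    blocking os.wait() instead of sleeping between polls.
    `commands` maps a job name to an argv list; returns {name: exit_code}."""
    running = {}  # pid -> job name
    for name, argv in commands.items():
        pid = os.posix_spawnp(argv[0], argv, os.environ)
        running[pid] = name

    results = {}
    while running:
        # Blocks without using CPU until any child terminates.
        pid, status = os.wait()
        name = running.pop(pid, None)
        if name is None:
            continue  # some other child of this process; not one of ours
        if os.WIFEXITED(status):
            results[name] = os.WEXITSTATUS(status)
        else:
            results[name] = -os.WTERMSIG(status)  # killed by a signal
        # A dependency-driven frontend would mark `name` as satisfied here
        # and start whatever just became runnable.
    return results


print(run_jobs({"ok": ["true"], "fail": ["false"]}))
```

The order in which results arrive depends on which child finishes first, so nothing beyond the returned mapping should be relied on.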

> Short-term TODO:
> * Finish POD documentation.
> * Don't use IPC::Run to start the collection scripts, to reduce the
> extra overhead and to allow us to use wait().
> * Cleanup frontend/lintian.

Sounds great.

> Long-term TODO:
> * Split some of the check scripts so that they require fewer collection
> scripts.
> * Split and add more collection scripts.

Also sounds great.  :)

> Other things to consider:
> * At the moment it is impossible to run the checks as soon as they
> could be run, because the overrides are usually not loaded by then. The
> current workaround is to wait until the override file has been collected
> and loaded, but it would be better if the Tags module knew when it is
> ready to start printing the results and used a cache in the meantime[2].

Oh, yes, that's a good idea.  I don't think that Lintian::Tags should poll
the collect area directly, but I think it makes a lot of sense to cache
all the tags until the frontend (or, later, the relevant module) calls
file_overrides to load the overrides for that file, or file_end to say
that we're done with that file without processing any overrides.

(The logic of Lintian::Tags will get simpler when we promote changes files
to a first-class checkable object with its own check scripts.)
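The caching behaviour described above could look roughly like the following Python sketch. The method names `file_overrides` and `file_end` come from the paragraph above, but the `TagBuffer` class, its `tag` method and all signatures are hypothetical. Overrides are simplified to a set of bare tag names; real Lintian overrides can also carry extra information.

```python
from collections import defaultdict


class TagBuffer:
    """Hypothetical sketch: hold emitted tags per file until the frontend
    reports whether overrides apply, then flush them."""

    def __init__(self, emit=print):
        self._emit = emit
        self._held = defaultdict(list)  # file -> [(tag, extra), ...]

    def tag(self, file, tag, extra=""):
        # Checks may run before overrides are loaded, so only buffer here.
        self._held[file].append((tag, extra))

    def file_overrides(self, file, overrides):
        """Flush held tags for `file`, dropping any tag named in `overrides`."""
        for tag, extra in self._held.pop(file, []):
            if tag in overrides:
                continue
            self._emit(f"{file}: {tag} {extra}".rstrip())

    def file_end(self, file):
        """Done with `file` and it has no overrides: flush everything held."""
        self.file_overrides(file, set())


buf = TagBuffer()
buf.tag("foo.deb", "binary-without-manpage", "usr/bin/foo")
buf.tag("foo.deb", "no-copyright-file")
buf.file_overrides("foo.deb", {"binary-without-manpage"})
```

The tag names are examples only. Passing the output function in as `emit` keeps the buffer independent of how or where tags are eventually printed.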

> * I'm not entirely happy with the need to prefix the collection scripts
> and checks before they are added to the dependency tree, but that's
> needed because of the name conflicts between them.

We could rename them so that there are no naming conflicts, although I
don't mind tagging them explicitly.  (Hm, I wonder if it would be
worthwhile to have DepMap or PDepMap handle that internally -- take a type
as well as a name and internally munge that into a unique identifier.)
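Handling the type internally could be sketched in Python as below. `TypedDepMap`, `_key`, `known` and `prerequisites` are hypothetical names, and the shared name `md5sums` is used purely for illustration. In Python a `(kind, name)` tuple is already a unique key; a Perl implementation would more likely join the two parts into a string using a separator that cannot appear in either.

```python
class TypedDepMap:
    """Hypothetical wrapper: callers pass (kind, name) pairs and the map
    derives the unique internal key, so a collection script and a check
    that share a name no longer collide."""

    def __init__(self):
        self._prereqs = {}  # (kind, name) -> set of (kind, name)

    @staticmethod
    def _key(kind, name):
        # The "munging" step: the pair itself is unique per kind and name.
        return (kind, name)

    def add(self, kind, name, *prereqs):
        """Register a node; each prerequisite is a (kind, name) pair."""
        self._prereqs[self._key(kind, name)] = {self._key(k, n) for k, n in prereqs}

    def known(self, kind, name):
        return self._key(kind, name) in self._prereqs

    def prerequisites(self, kind, name):
        return sorted(self._prereqs[self._key(kind, name)])


m = TypedDepMap()
m.add("collection", "md5sums")
m.add("check", "md5sums", ("collection", "md5sums"))
print(m.prerequisites("check", "md5sums"))
```

Callers then never build prefixed names themselves; the prefixing lives in one place inside the map.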

> * Probably reconsider the name of Lintian::DepMap; after all, it creates
> dependencies trees (the original name was based on the idea of
> supporting more complex kinds of relationships which could make a graph
> look more like a map than a tree).

I'm good either way.

> [2] This cache could even be used to store the results of the whole
> package check so that the tags could be printed in order of severity, as
> some people have suggested (at least on IRC).

Hm, yeah, we could do that, although it would make Lintian appear slower
since it would have to hold all tags until the processing of the file
completes.
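Once all tags for a package are held, ordering them by severity is a simple stable sort, as in this Python sketch. The `SEVERITY_ORDER` ranking over single-letter codes and the `sort_by_severity` helper are hypothetical assumptions, not Lintian's actual classification. The sketch also makes the cost mentioned above visible: nothing can be printed until the whole list exists.

```python
# Hypothetical severity ranking (lower sorts first); unknown codes go last.
SEVERITY_ORDER = {"E": 0, "W": 1, "I": 2}


def sort_by_severity(tags):
    """`tags` is a list of (code, tag_name) pairs held for one package.
    sorted() is stable, so tags of equal severity keep their emission order."""
    return sorted(tags, key=lambda t: SEVERITY_ORDER.get(t[0], len(SEVERITY_ORDER)))


held = [("W", "a"), ("I", "b"), ("E", "c"), ("W", "d")]
for code, name in sort_by_severity(held):
    print(f"{code}: {name}")
```

Relying on sort stability means that, within a severity level, output still follows the order in which the checks produced the tags.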

> Feedback is very much welcomed.

The code looks basically good to me after a once-over.  I think it's
certainly good enough to commit and we can sort out other things later,
although it would be nice to fix the Lintian::Command bits first.

A few minor stylistic notes: Lintian currently fairly uniformly uses
underscores_between_words rather than studlyCaps for methods, and I'd like
to stick with that (since it's also the perlstyle recommendation).  And
Lintian::PDepMap, since it's a subclass, should probably use the subclass
namespace (Lintian::DepMap::Properties or something like that, of course
with changes if you rename DepMap to something else).

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

