
RFC: further parallelisation (dependency-based collection and check scripts)



Hi everyone,

Over the last week I found some free time to finally finish the work on the
initial implementation of a dependency-based lintian.
Following the suggestions Russ made on my previous, partial
implementation, I got rid of the co-dependencies and avoided the ->export()
stuff, among other changes. I also extended the test suite to cover some
extra cases and other, currently unused, features of Lintian::DepMap.

With a few small changes (writing the documentation of
Lintian::PDepMap::addp, for example) this initial implementation could be
merged into main lintian. The only commit that could (and should) be merged
without waiting for the rest is c6ef22.

Test suite-based status: except for the missing POD documentation, which is
intentional (it's a "hey, this should not be merged right away" marker), the
rest of the test suite passes.

Regressions:
I must say I'm not happy with the resulting speed: in most cases lintian
will now take _longer_ to process the packages (although it provides some
visual feedback sooner, which might make it look a bit faster).
I haven't done any profiling, but it looks like lintian is now
consuming a lot of CPU trying to reap the collection script jobs; CPU
that could be used by the collection scripts themselves. This should be
addressed by not using IPC::Run to start the collection scripts[1], as we
previously discussed on the other thread, and it is something I'll start
working on soon.

[1] As I said in a previous email, using wait() to, duh, wait for
IPC::Run-started jobs causes trouble. And sleeping between calls to
reap_collection_jobs is really not the right way to do it.
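
To make the wait() point concrete, here is a minimal sketch of the idea in
Python rather than lintian's Perl. It is not the lintian code: run_jobs and
the job names are made up for illustration. The parent starts each job itself
and then blocks in wait(), so it uses no CPU while the children run and needs
no sleep-and-poll loop.

```python
import os
import sys


def run_jobs(commands):
    """Start every command, then block in os.wait() until each one exits.

    `commands` maps a (hypothetical) job name to its argv list.
    Returns a dict mapping job name to exit code.
    """
    running = {}
    for name, argv in commands.items():
        # P_NOWAIT returns the child's pid immediately instead of waiting.
        pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
        running[pid] = name

    finished = {}
    while running:
        # Blocks until *any* child exits; no busy polling, no sleep().
        pid, status = os.wait()
        name = running.pop(pid, None)
        if name is None:
            continue  # a child we did not start here; ignore it
        if os.WIFEXITED(status):
            finished[name] = os.WEXITSTATUS(status)
        else:
            finished[name] = -os.WTERMSIG(status)
    return finished


if __name__ == "__main__":
    jobs = {
        "ok-job": [sys.executable, "-c", "pass"],
        "failing-job": [sys.executable, "-c", "import sys; sys.exit(3)"],
    }
    print(run_jobs(jobs))
```

This only works cleanly when the parent owns the child pids directly, which
is the reason for starting the jobs without a wrapper in between.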

Short-term TODO:
* Finish POD documentation.
* Don't use IPC::Run to start the collection scripts, both to reduce the
extra overhead and to allow us to use wait().
* Clean up frontend/lintian.

Long-term TODO:
* Split some of the check scripts so that they require fewer collection
scripts.
* Split and add more collection scripts.

Other things to consider:
* At the moment it is impossible to run the checks as soon as they could be
run, because the overrides are usually not loaded by then. The current
workaround is to wait until the override file has been collected and
loaded, but it would be better if the Tags module knew when it is ready to
start printing the results and used a cache in the meantime[2].
* I'm not entirely happy with the need to prefix the collection and check
names before they are added to the dependency tree, but that's needed
because of the name conflicts between them.
* Probably reconsider the name of Lintian::DepMap; after all, it creates
dependency trees (the original name was based on the idea of supporting
more complex kinds of relationships which could make a graph look more like
a map than a tree).
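
For anyone who has not looked at the code, the scheduling idea and the
prefixing can be sketched roughly as follows. This is Python for brevity, not
the Perl in the branch; DepTracker, its method names, the "coll-"/"check-"
prefixes and the script names are all assumptions chosen for illustration and
do not describe the actual Lintian::DepMap API.

```python
class DepTracker:
    """Tracks which nodes are ready to run given their prerequisites."""

    def __init__(self):
        self._deps = {}     # node -> set of still-unsatisfied prerequisites
        self._done = set()  # nodes that have already been satisfied

    def add(self, node, *prereqs):
        pending = self._deps.setdefault(node, set())
        for prereq in prereqs:
            self._deps.setdefault(prereq, set())
            if prereq not in self._done:
                pending.add(prereq)

    def selectable(self):
        """Nodes with no unsatisfied prerequisites that have not run yet."""
        return sorted(n for n, deps in self._deps.items()
                      if not deps and n not in self._done)

    def satisfy(self, node):
        if node not in self._deps:
            raise KeyError(node)
        if self._deps[node]:
            raise ValueError(f"{node} still has unsatisfied prerequisites")
        self._done.add(node)
        for deps in self._deps.values():
            deps.discard(node)

    def pending(self):
        return len(self._deps) > len(self._done)


# A collection and a check may share a name, so each kind gets its own
# (hypothetical) prefix before it goes into the tree.
def coll(name):
    return "coll-" + name


def check(name):
    return "check-" + name


def schedule(tracker):
    """Return batches of nodes; everything in one batch could run in parallel."""
    batches = []
    while tracker.pending():
        ready = tracker.selectable()
        if not ready:
            raise RuntimeError("circular dependency")
        batches.append(ready)
        for node in ready:
            tracker.satisfy(node)
    return batches


deps = DepTracker()
deps.add(coll("unpacked"))
deps.add(coll("changelog-file"), coll("unpacked"))
deps.add(check("changelog-file"), coll("changelog-file"))
batches = schedule(deps)
print(batches)
```

Without the prefixes, the collection and the check named "changelog-file" in
this example would collapse into a single node that depends on itself.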

[2] This cache could even be used to store the results of the whole package
check so that the tags could be printed in order of severity, as some
people have suggested (at least on IRC).
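
A rough sketch of what such a cache could look like, again in Python and not
based on the current Tags module: TagBuffer, its methods, the severity
ordering and the tag names are all assumptions. Tags are held until the
overrides are known, then filtered and printed by severity.

```python
# Assumed ordering: errors first, then warnings, then info.
SEVERITY_ORDER = {"E": 0, "W": 1, "I": 2}


class TagBuffer:
    """Holds emitted tags until the overrides have been loaded."""

    def __init__(self):
        self._overrides = None  # None means "not loaded yet"
        self._held = []

    def emit(self, severity, tag):
        # Checks can call this at any time, even before overrides exist.
        self._held.append((severity, tag))

    def load_overrides(self, overridden_tags):
        self._overrides = set(overridden_tags)

    def flush(self):
        """Return the non-overridden tags as lines, ordered by severity."""
        if self._overrides is None:
            raise RuntimeError("overrides not loaded yet")
        visible = [(s, t) for s, t in self._held if t not in self._overrides]
        # sorted() is stable, so tags of equal severity keep emission order.
        visible = sorted(visible, key=lambda item: SEVERITY_ORDER[item[0]])
        self._held.clear()
        return [f"{s}: {t}" for s, t in visible]


buf = TagBuffer()
buf.emit("W", "binary-without-manpage")
buf.emit("E", "no-copyright-file")
buf.emit("W", "some-overridden-tag")
buf.load_overrides(["some-overridden-tag"])
lines = buf.flush()
print("\n".join(lines))
```

The checks never need to know whether the overrides are available; only the
flush has to wait for them.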

Feedback is very much welcome.

Cheers,
-- 
Raphael Geissert - Debian Developer
www.debian.org - get.debian.net

Attachment: lintian-depbased.mbox
Description: application/mbox

