Re: Dependency-based running of collection scripts

To: debian-lint-maint@lists.debian.org
Subject: Re: Dependency-based running of collection scripts
From: Raphael Geissert <atomo64+debian@gmail.com>
Date: Sun, 17 May 2009 01:31:01 -0500
Message-id: <[🔎] guoatf$96m$1@ger.gmane.org>
References: <grrvv4$1s5$1@ger.gmane.org> <[🔎] 874ovto5co.fsf@windlord.stanford.edu> <[🔎] gu5uto$kik$1@ger.gmane.org> <[🔎] 87bpq1l6xh.fsf@windlord.stanford.edu> <[🔎] gunroe$ft3$1@ger.gmane.org> <[🔎] 87ljowmmni.fsf@windlord.stanford.edu>

Russ Allbery wrote:

> Raphael Geissert writes:
>> Russ Allbery wrote:
> 
>>> Checks are done in the main Lintian process.  Collection scripts are
>>> done in the background.  If we've unblocked a check, the main Lintian
>>> process can just go do all unblocked checks right away.  When it
>>> comes back, it can see if any collection scripts have finished and
>>> therefore any more checks are unblocked.
> 
>> The idea is:
>> 1.- we are running nothing, kick off the first set of scripts (in the
>> future this should be the unpack scripts, and since all the checks and
>> collection scripts require at least unpack level one it blocks)
>> 2.- so now we can start a bunch of collection scripts, let's do that.
>> 3.- one of those collection scripts is done, it made available: 1 check,
>> 0 collection scripts. Let's work on that check.
>> 4.- We are done with that check, we come back and look at the list of
>> running jobs and we see that one is done, it made available: 1 check
>> script, 2 collection scripts; let's first kick off those collection
>> scripts and get back to the check script when we are sure *there's
>> something else running it parallel*.
>> 5.- Repeat 3 and 4 over and over again.
> 
> Right, that's what I just said...?  I think.  I'm not sure that I
> followed your explanation, but it sounds to me like the same thing that
> I described.

I tried to stress the point of starting now selectable() collection scripts
prior running any check script, which isn't exactly what you are saying
with:

> Why do we need to do that?
>
> Checks are done in the main Lintian process.  Collection scripts are
> done in the background.  If we've unblocked a check, the main Lintian
> process can just go do all unblocked checks right away.  When it comes
> back, it can see if any collection scripts have finished and therefore
> any more checks are unblocked.

> 
>>> But that's exactly what the next_nodes function in my example gives
>>> you.
> 
>> Yes, but what I mean is: I don't see the point in doing, basically,
>> the same thing DepMap is already doing (and is already coded.)
> 
> The point for me is that it's considerably shorter and much less code,
> which will make it easier to understand and maintain in the long run.
> 

All the code is documented and tested, which provides a working example of
the intended use. Maintenance of the code should really be minimal.

The select()/satisfy() approach adds the possibility to keep partial states,
so that if for any reason one does not proceed in the linear way as
proposed by your approach it is possible to stop and continue later, or
work at the same time.

I'm afraid merging those two states doesn't convince me.

>>>>>     # start a bunch of children storing PIDs in %children
>>>>>     while (%children) {
>>>>>         my $child = wait;
> 
>>>> The problem I see here is that we are again 'wait'ing, which in
>>>> other words means: blocking.
> 
>>> It will only block if no children are finished, which is the one case
>>> where we *do* have to block because we can't do anything else.
> 
>> But that doesn't play nice with the idea of running checks in the
>> meanwhile on the main process.
> 
> Sure it does....  We only call wait if we can't proceed on the basis of
> what we know is currently finished.  If we have anything else we can do,
> we do that.  Otherwise, we call wait, which will only block if none of
> our children have exited (which means that we *can't* do anything else
> since we're still blocked).

I see what you mean. Maybe I got distracted by the fact that wait only
applies to the forked processes, but doesn't apply to check scripts when I
first read it. The latter would be handled somewhere else.

> 
>> Sure, IPC::Run might not be the best, but it works and its
>> implementation is irrelevant to the dependencies-driven processing
>> change.
> 
> We seem to be talking past each other....  IPC::Run doesn't provide the
> interface that we need to do this efficiently, which is why you have a
> bunch of code in your patch for polling child processes rather than just
> calling wait, which does the same thing in much less code.

No. What I was saying is that that part is specific to the Lintian::Command
code, and its implementation should not interfer with the rest of the code.
That is, no matter whether we use IPC::Run or fork, the interface should be
the same and the rest of the code should therefore need no change.

> 
>> By the way, could you please elaborate the following a bit more?
> 
>>> But I'm concerned it may also be overkill and pretty complicated for
>>> the problem we're solving.  I'm not sure that this is the right
>>> approach, or at least I think it could be simplified a lot.
> 
>> If I remove the co-dependencies stuff would, in your opinion, make it
>> less complicated?
> 
> Yes, but I think removing that and selectable reduces it to basically
> the two functions in my previous message, unless I'm missing something.
> 

Like I mentioned above, it leaves the two states functions (select,
satisfy), which since you merged them together it looked like you
considered them to be an overkill. Is that right?

>> Only make sure that no matter in which order nodes are added we always
>> get a consistent state. This eliminates the need of first gathering
>> all the information about collection and checks and later carefully
>> enter them (which is not only a waste of time, but complicated to
>> achieve, since you would actually need a dependency resolver to enter
>> the data in the right order).
> 
> I'm pretty sure the code in my previous message also does that.
> 

Of course, I was just justifying.

Summarising:
* Co-dependencies will be replaced by another layer between the frontend and
the dependencies resolver (by the way, you didn't comment on my points, may
I imply you are ok with them?.)
* Lintian::Command needs to be refactored so that it is easier to start
multiple jobs where pipes are not needed.
* export() is to be removed, and its cloning functionality replaced by
on-the-fly map regeneration
* an initialise() method will ensure the resolver's state, eliminating
cloning
Other changes I think they should be made:
* Remove the dependency on oneself in add()
* Add a test that creates the map just like the frontend would and looks for
missing and circular[2] dependencies (this would in turn require some
refactoring to avoid code duplication between the frontend and the test,
and would deprecate the needs-info test[1].)
* select() should return 0 instead of dying if the given node is already
selected

[1] I just noticed that the needs-info test is incorrectly assuming that the
name of the collection script is always the same as the value of
Collector-Script. The frontend uses the latter, and so the test should do.

[2] A deep circular dependencies finder could probably be implemented,
basically trying to resolve the map until the point where %map is empty
and %nodes is either empty (representing a fully resolvable map) or not
(representing an unresolvable situation).

Comments?

Cheers,
-- 
Raphael Geissert - Debian Maintainer
www.debian.org - get.debian.net

Reply to:

References:
- Re: Dependency-based running of collection scripts
  - From: Russ Allbery <rra@debian.org>
- Re: Dependency-based running of collection scripts
  - From: Raphael Geissert <atomo64+debian@gmail.com>
- Re: Dependency-based running of collection scripts
  - From: Russ Allbery <rra@debian.org>
- Re: Dependency-based running of collection scripts
  - From: Raphael Geissert <atomo64+debian@gmail.com>
- Re: Dependency-based running of collection scripts
  - From: Russ Allbery <rra@debian.org>

Prev by Date: Re: Restructuring check scripts
Next by Date: Re: Restructuring check scripts
Previous by thread: Re: Dependency-based running of collection scripts
Next by thread: [SCM] Debian package checker branch, master, updated. 2.2.10-30-g417acdd
Index(es):
- Date
- Thread