Re: RFC: further parallelisation (dependency-based collection and check scripts)

To: debian-lint-maint@lists.debian.org
Subject: Re: RFC: further parallelisation (dependency-based collection and check scripts)
From: Raphael Geissert <geissert@debian.org>
Date: Thu, 31 Dec 2009 01:01:55 -0600
Message-id: <hhhi8u$df$1@ger.gmane.org>
References: <h9er7b$nse$1@ger.gmane.org> <878wcoonc6.fsf@windlord.stanford.edu> <hh76be$pp8$1@ger.gmane.org> <878wcono3c.fsf@windlord.stanford.edu>

Russ Allbery wrote:
> When you get a chance to look it over, I'd be curious about your thoughts
> on the best module architecture.  I'm struggling to figure out what makes
> the most sense.  I think combining the tag, pkg_start, and pkg_end methods
> in with Lintian::Output would make sense, along with the configuration for
> which tags are displayed.


> I'm not sure how to handle overrides.  I wonder 
> if we should have a Lintian::Tag::Overrides class that parses override
> files and answers questions about them, such as whether a tag is
> overridden, but I'm not sure where the code looking for unused overrides
> should live or how it should work.

I was thinking we should have something like this:

* A tags container and manager (Lintian::Tags):
+ It is a singleton.
+ It would be responsible for creating tags (create_tag()) and setting their
properties.
- Override parsing should be moved elsewhere, into Lintian::Tag::Overrides.
+ It knows about two contexts, the container context (i.e. the check script)
and the tags context (i.e. $package-$type).
+ Calling new() with a tags context destroys its Lintian::Tag::Overrides
reference.
+ The tags context is shared with Lintian::Output when it interacts with it.
+ When tags are created with a tags context, the override property of
Lintian::Tag::Single is set by checking
Lintian::Tag::Overrides::matches($tag), where $tag is a
Lintian::Tag::Single including $extra information.
+ Tags are created as they are issued. Never issued -> never created.
Suppressed -> never created.
- Lintian::Tags should not know about files, versions, architecture, etc.
+ The container context descriptor (.desc file) should only be read when
requested. Issuing a tag() loads it.
+ The check script runner (Lintian::Check atm) should switch the "context()" of
the Lintian::Tags object, so it knows which .desc file it should be looking
at.
+ displayed() doesn't know about overrides but does know about suppressed
tags (i.e. tag() won't create a ::Single object if !displayed()).
+ displayed() performs its checks by using a tag from create_tag() without
context.
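
To make the list above more concrete, here is a rough sketch of how the manager could behave. It is written in Python purely for illustration (Lintian itself is Perl), it does not enforce the singleton, and every name in it (Tags, new_tags_context(), load_desc, the tag and check names) is an assumption made for the example, not existing Lintian API.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """Stand-in for Lintian::Tag::Single (hypothetical fields)."""
    name: str
    extra: str = ""
    overridden: bool = False


class Tags:
    """Stand-in for the proposed Lintian::Tags container and manager."""

    def __init__(self, suppressed=(), load_desc=None):
        self.suppressed = set(suppressed)
        # Hypothetical loader: check name -> set of tag names from its .desc.
        self.load_desc = load_desc or (lambda check: set())
        self.check = None        # container context (the check script)
        self.package = None      # tags context ($package-$type)
        self.overrides = None    # override matcher for the tags context
        self.desc_cache = {}     # .desc data, filled in lazily
        self.issued = []

    def context(self, check):
        """Switch the container context; the .desc file is not read yet."""
        self.check = check

    def new_tags_context(self, package, overrides=None):
        """Start a new $package-$type; the old overrides reference is dropped."""
        self.package = package
        self.overrides = overrides
        self.issued = []

    def create_tag(self, name, extra=""):
        return Tag(name, extra)

    def displayed(self, name):
        """Knows about suppressed tags only, never about overrides."""
        return self.create_tag(name).name not in self.suppressed

    def tag(self, name, extra=""):
        """Issue a tag: load the .desc on demand, create the object lazily."""
        if self.check not in self.desc_cache:
            self.desc_cache[self.check] = self.load_desc(self.check)
        if name not in self.desc_cache[self.check]:
            raise KeyError(f"{name} is not declared by {self.check}")
        if not self.displayed(name):
            return None          # suppressed -> never created
        tag = self.create_tag(name, extra)
        if self.overrides is not None:
            tag.overridden = self.overrides(tag)
        self.issued.append(tag)
        return tag


# Illustrative usage with made-up data.
loads = []

def load_desc(check):
    loads.append(check)
    return {"binary-without-manpage", "no-copyright-file"}

tags = Tags(suppressed={"no-copyright-file"}, load_desc=load_desc)
tags.context("manpages")
tags.new_tags_context("foo-binary", overrides=lambda t: t.extra == "usr/bin/foo")
a = tags.tag("binary-without-manpage", "usr/bin/foo")
b = tags.tag("no-copyright-file")
```

In this sketch the .desc data is read once, on the first tag() call, and the suppressed tag never becomes an object, which is the behaviour the list asks for.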

* An overrides parser (Lintian::Tag::Overrides):
+ One or multiple files can be loaded()
+ Since it knows how to parse the files, it should know when a tag matches()
an entry.
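
As an illustration of the parser idea, a minimal sketch follows, again in Python rather than Perl. It assumes a deliberately simplified override format of one "tag [extra]" entry per line with "#" comments, which is not the full Lintian syntax, and the method names load() and matches() are only my reading of the two points above.

```python
import io


class Overrides:
    """Stand-in for the proposed Lintian::Tag::Overrides (simplified format)."""

    def __init__(self):
        self.entries = []  # list of (tag name, extra or None)

    def load(self, lines):
        """Parse one file-like object or iterable of lines; may be called repeatedly."""
        for raw in lines:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            name, _, extra = line.partition(" ")
            self.entries.append((name, extra.strip() or None))

    def matches(self, name, extra=""):
        """True if some entry overrides this tag; an entry without extra matches any."""
        return any(n == name and (e is None or e == extra)
                   for n, e in self.entries)


# Illustrative usage: two in-memory "files" loaded into one parser.
overrides = Overrides()
overrides.load(io.StringIO("# comment\nbinary-without-manpage usr/bin/foo\n"))
overrides.load(io.StringIO("no-copyright-file\n"))
```

Keeping the matching logic next to the parsing logic means nothing outside this class needs to know the file syntax.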

* A single tag (Lintian::Tag::Single):
+ Based on the Lintian::Tag::Info code but without doing anything related to
references.
+ They are per-tag singletons.
+ It has an internal counter; every call to new() with a context resets it.
+ The extra and overridden properties are cleared on every call to
new(), but only if a context is defined.
+ When all of their properties except extra and overridden are set, they are
said to be loaded().
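
The per-tag singleton behaviour could look roughly like the sketch below. It is Python for illustration only; the severity and certainty fields stand in for "all of their properties", and the class layout is an assumption, not a port of Lintian::Tag::Info.

```python
class Single:
    """Stand-in for the proposed Lintian::Tag::Single: one object per tag name."""

    _instances = {}

    def __new__(cls, name, context=None):
        self = cls._instances.get(name)
        if self is None:
            self = super().__new__(cls)
            self.name = name
            self.severity = None     # illustrative .desc properties
            self.certainty = None
            self.count = 0
            self.extra = None
            self.overridden = False
            cls._instances[name] = self
        if context is not None:
            # A new context resets the counter and the per-issue properties.
            self.count = 0
            self.extra = None
            self.overridden = False
        return self

    def loaded(self):
        """True once every property other than extra and overridden is set."""
        return self.severity is not None and self.certainty is not None


# Illustrative usage.
first = Single("binary-without-manpage", context="foo-binary")
first.count += 2
first.extra = "usr/bin/foo"
same = Single("binary-without-manpage")            # no context: state is kept
kept_count = same.count
reset = Single("binary-without-manpage", context="bar-binary")
```

Calling the constructor without a context returns the shared object untouched, while passing a context clears only the per-package state.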

* Lintian::Tag::Info should then be moved to the output layer.

> Raphael Geissert writes:
>> Russ Allbery wrote:
>>> (Hm, I wonder if it would be worthwhile to have DepMap or PDepMap
>>> handle that internally -- take a type as well as a name and internally
>>> munge that into a unique identifier.)
> 
>> It would be PDepMap the one that would handle that.  My only, soft,
>> objection is that the idea behind PDepMap is to let the application
>> layer add whatever as a property of a node. Mostly to avoid extra data
>> structures that hold information about a given node.
> 
>> It should also be possible to make PDepMap take an, optional, reference
>> to a function that operates on the node properties and returns a node
>> name that should be used instead (in the 'sort' spirit).  The only
>> problem I see with this approach is that the application layer still
>> needs to know how to refer to a given node.
> 
> Yeah, the ideal would be for the application to refer to a node in all
> situations as a type/name pair, and have any other representation be
> strictly internal, but that may be more work than it's worth.

It actually sounds like a good idea. PDepMap could then take only a set of
properties, never a node name.
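
A small sketch of what that could look like, in Python for illustration: the map only ever receives property sets, and an optional key function (in the 'sort' spirit) derives the internal node identifier, here a type/name pair. The class and method names (PropertyDepMap, add, satisfy, selectable) are hypothetical and not the actual Lintian::DepMap interface.

```python
class PropertyDepMap:
    """Hypothetical properties-only dependency map."""

    def __init__(self, key=lambda props: (props["type"], props["name"])):
        self.key = key        # derives the internal node id from the properties
        self.props = {}       # node id -> properties
        self.parents = {}     # node id -> ids it depends on
        self.done = set()     # ids already satisfied

    def add(self, props, depends_on=()):
        node = self.key(props)
        self.props[node] = dict(props)
        deps = self.parents.setdefault(node, set())
        deps.update(self.key(p) for p in depends_on)
        return node

    def satisfy(self, props):
        self.done.add(self.key(props))

    def selectable(self):
        """Nodes that are not done and whose dependencies are all done."""
        return sorted(node for node, deps in self.parents.items()
                      if node not in self.done and deps <= self.done)


# Illustrative usage: a collection and a check that share a name.
collection = {"type": "collection", "name": "scripts"}
check = {"type": "check", "name": "scripts"}

depmap = PropertyDepMap()
depmap.add(collection)
depmap.add(check, depends_on=[collection])
before = depmap.selectable()
depmap.satisfy(collection)
after = depmap.selectable()
```

The application layer never builds a node name itself, so a collection and a check with the same name cannot collide.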

> 
>>>> * Probably reconsider the name of Lintian::DepMap; after all, it
>>>> creates dependency trees (the original name was based on the idea of
>>>> supporting more complex kinds of relationships which could make a
>>>> graph look more like a map than a tree).
> 
>>> I'm good either way.
> 
>> Do you have any suggestion for a new name?
> 
> How about Lintian::Order, since what it's doing is creating and
> manipulating a partial order?

Could be, yes. Will think about it.

> 
>>> Hm, yeah, we could do that, although it would make Lintian appear
>>> slower since it would have to hold all tags until the processing of the
>>> file completes.
> 
>> It could be added as an option.
> 
> Good point, and it would be fairly easy to implement as something passed
> into the Lintian::Tags layer.
> 

Using the architecture I described above, this could easily be done by
caching all data until the tags context is changed, at which point it would
go ahead and communicate with Lintian::Output.
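
Roughly, and only as a Python illustration with made-up names (BufferedTags, switch_context, and an emit callable standing in for Lintian::Output), the optional buffering could work like this:

```python
class BufferedTags:
    """Hold issued tags until the tags context changes, then hand them over."""

    def __init__(self, emit, buffered=True):
        self.emit = emit          # callable standing in for Lintian::Output
        self.buffered = buffered  # the behaviour is optional
        self.context = None
        self.pending = []

    def switch_context(self, context):
        """Changing the tags context flushes everything cached so far."""
        self.flush()
        self.context = context

    def tag(self, name, extra=""):
        record = (self.context, name, extra)
        if self.buffered:
            self.pending.append(record)
        else:
            self.emit(record)

    def flush(self):
        for record in self.pending:
            self.emit(record)
        self.pending.clear()


# Illustrative usage: output only appears once the context changes.
output = []
tags = BufferedTags(output.append)
tags.switch_context("foo-binary")
tags.tag("binary-without-manpage", "usr/bin/foo")
seen_before_switch = list(output)
tags.switch_context("bar-binary")
seen_after_switch = list(output)
```

With buffered=False each tag goes straight to the output, so the current behaviour stays available.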


What do you think?

P.S. have a happy new year :)

Cheers,
-- 
Raphael Geissert - Debian Developer
www.debian.org - get.debian.net
