Bug#740607: lintian: Please support build-profiles

To: Johannes Schauer <j.schauer@email.de>
Cc: 740607@bugs.debian.org
Subject: Bug#740607: lintian: Please support build-profiles
From: Niels Thykier <niels@thykier.net>
Date: Tue, 04 Mar 2014 22:51:12 +0100
Message-id: <[🔎] 53164AD0.9020301@thykier.net>
Reply-to: Niels Thykier <niels@thykier.net>, 740607@bugs.debian.org
In-reply-to: <[🔎] 20140304093437.21951.24587@hoothoot>
References: <[🔎] 20140303150916.16478.36959.reportbug@hoothoot> <[🔎] 5314EE9B.6040801@thykier.net> <[🔎] 20140304093437.21951.24587@hoothoot>

On 2014-03-04 10:34, Johannes Schauer wrote:
> Hi,
> 
> Quoting Niels Thykier (2014-03-03 22:05:31)
>> Thanks for looking into this.
> 
> and thanks for responding so quickly :)
> 

If only I could keep it up :)

> [...]
>> Usually, we don't bother doing "conflict" testing.  At least, we skip it
>> for regular debhelper dependencies and many other cases.
> 
> I noticed. Does that mean that you want me to remove the conflict checking?
> Since it's already done now it does not create more work for me to leave it
> there. Maybe you want me to combine dependency and conflict testing into one
> single tag each?
> 

Personally, I don't care much either way right now.  Lets make it a "for
future reference"-note.

>>> diff --git a/checks/fields.pm b/checks/fields.pm
>>> index aeee252..2cae275 100644
>>> --- a/checks/fields.pm
>>> +++ b/checks/fields.pm
>>> @@ -730,7 +730,7 @@ sub run {
>>>                      && $known_obsolete_emacs{$alternatives[0]->[0]});
>>>  
>>>                  for my $part_d (@alternatives) {
>>> -                    my ($d_pkg, $d_march, $d_version, $d_arch, $rest,
>>> +                    my ($d_pkg, $d_march, $d_version, $d_arch, $d_restr, $rest,
>>>                          $part_d_orig)
>>>                        = @$part_d;
>>>  
>>
>> If $d_restr is not used in the scope, please consider replacing it with
>> "undef" to signal that.  (The diff suggests it is unused).
> 
> Correct, just as architecture restrictions, restriction lists are not found in
> binary package stanzas. I usually copy the implementation of architecture
> restrictions when I prepare patches for restriction lists. I set $d_restr to
> undef but note, that then (for consistency) $d_arch should be set to undef too
> because it is also not used.
> 

Thanks for confirming my assertion.  Sadly, I see I phrased myself
poorly.  You implemented what I said, but not what I wanted.  What I
meant was to remove the assignment to $d_restr completely.  I.e.

  my ($d_pkg, $d_march, $d_version, $d_arch, undef, $rest,
      $part_d_orig)
    = @$part_d;

It is a little Perl feature (similar to the use of "_" in some high
level programming languages to signal "I don't care about this
argument/value").  It has the advantage of ensuring that $d_restr is not
declared in the scope and make any use of it a fatal error (as a
compile-time check \o/).

>>> +                        for my $restr (@{$d_restr})
>>> { +                            my $dotcount = () = $restr =~ /\./g;
>>
>> You probably want:
>>   my $dotcount = $restr =~ tr/.//;
> 
> Is there a difference between the former and the latter? I replaced it by the
> latter as you suggested as it seems to do the same.
> 

The approaches differ in how they are done.  Neither are "truly" optimal
in my eyes, but in the given case both will do.

On the implementation front, "$x =~ tr/.//" means "replace all periods
in $x with periods". But as a side-effect, tr/// returns the number of
"changes" it made.  In this case, it effectively counts the number of
periods in $x.
  When I said this was "not optimal in my eyes", it is because in
Perl5.20 we are getting "COW" (i.e. Copy-On-Write) support.  I fear this
will cause $x to be copied (undermining the advantages of COW).  But
then, the string is possibly too small to be COW'ed in the first place
(or tr could be smart enough to not change the string in this case).

For the "while (m//)" one, its a RO operation.  However, the while +
regex operation is more likely more expensive than just copying the
string, making the change and throwing it away again... sadly.  But I
digress.

> [...]
>> Secondly, I got a feeling that we want to move this to a data file (if
>> not now, then eventually).
> 
> You probably mean somewhere in ./data/fields? But that would mean one file per
> namespace, right?
> 

Not necessarily.  We can have "multi-level" data files.  I suspect that
commit 6bca8f4830f95e27657c70d45b7ec6003dbe3673 could be an example to
get you started.  Alternatively, look at $MENU_SECTIONS in
checks/menu-format.pm

I am thinking something simple like:

data/fields/known-build-profiles:
  """
  profile.stage1
  profile.notest
  ...
  """

and a:

sub _parse_build_profiles_data{
    my ($key, $val, $pval) = @_;
    [... see commit 6bca8f4 for inspiration here ...]
}

my $KNOWN_BUILD_PROFILES = Lintian::Data->new(
  'fields/known-build-profiles', qr/\./,
  \&_parse_build_profiles_data);

>> As mentioned earlier; we tend not to do the "conflict" tests.  Feel free
>> to skip them entirely.
> 
> Maybe there should be a function similar to implies() which takes depends as
> well as conflicts into account?
> 

Problem being that a Lintian::Relation tend to only work on a single or
two fields with the same (basic) "meaning".  It has no understanding of
Dependencies vs. Conflicts/Breaks.

>> [...]
> 
> Should I extend new_noarch to also strip profiles? In that case it should
> probably also be renamed to match the new functionality? It seems that in all
> cases where new_noarch is used right now it would also make sense to strip the
> restriction list. So maybe new_noarch should become new_noarch_norestr or
> new_norestriction?
> 

I suppose it would make sense to rename it, but I am not quite convinced
it is worth the hassle.  If anything, add the new constructor and make
new_noarch an alias of that.  I believe the syntax is something like:
  *new_noarch = \&new_norestriction

(see Lintian::Util where we do it with open_gz).

>>> [...]
>>
>> O(n^2) checking? Ugh.  I suppose we could "normalise" the order before
>> the comparison for slightly more accuracy, but still... ugh.
>>
>> Actually, if we (for a moment) ignore negative profiles (e.g. imagine a
>> world without "!profile.stage1"), wouldn't this just be checking that
>> the "larger" of the two is a subset of the other (assuming both are
>> defined)?
> 
> How would this assumption help us?
> 
> Do you mean because in the special case that all restrictions are positive or
> negative, the order of the restriction list does not matter anymore?
> 
> Yes, if a restriction list is all positive and all negative then order does not
> matter and set comparisons as done with architectures are possible.
> 

I believe that we are looking at "implies_element", which is comparing
one single "literal" (atom?) from one relation with another "literal"
from another relation.  The implication is only computed for atoms and
based on that, we either move on to the next atom or reject the
implication (I think that gives an O(n*m) comparison, assuming O(1) for
atom-implication tests).

So, given a comparison between two literals/atoms, can we do this test
faster than "O(n^2)"?  From what I can tell, we ought to as long as all
restrictions are either negative or positive.  In fact, I suspect it is
trivially as long as just one of the literals have an "all positive" or
"all negative" restriction list.

For an all positive / negative list, we should be able to do it in
O(n+m) [assuming no/few hash collisions] per [1].

http://my.safaribooksonline.com/book/programming/perl/1565922433/arrays/ch04-13496

>> Can you give me an example of where this O(n^2) kicks in?
> 
> Maybe I'm misunderstanding how that function is supposed to work. Whether or
> not the restriction list influences dependency resolution currently depends
> which profiles (if any) are currently activated. Lintian can't know which
> profiles are activated at build time so (unless there exists a more clever
> solution) it must check for all possible combinations of activated or
> deactivated build profiles to be able to say something about dependency
> implications.
> 

We are generally working in "pure logic" (I think there is a hack or two
somewhere, for making testing for duplicate clauses work, but that is
the only "violation" of "pure logic" and I believe it has extensive
explanation in the method).

Its been a while since I worked with these terms, so I might be
misapplying the concept.  Anyhow, I believe we can describe this like a
complete lattice (or "partially ordered set").  E.g.

  A (>= 1.0) implies A, but A does not imply A (>= 1.0)
  A implies A <profile.stage1>, but not vice versa.

With the restriction set being "another" complete lattice.

  <profile.stage1> implies <profile.stage1 profile.notest>
   (I assumed profiles were "OR'ed" like archs. If "AND'ed", then swap
   the parts around the "implies".  If neither, then please apply
   liberal cluebatting)

Anyway, I appear to have misread "O(n^2)" as "O(2^n)".  I find "O(n^2)"
vastly more reasonably than "O(2^n)" - although O(n) or O(n+m) is
obvious to be preferred.  If we are indeed talking O(n^2) = O(n*n) i.e.
polynomial time, then the comment in your patch might need rewording:

"""
+    # With n being the number of possible build profiles, 2^n checks would
+    # have to be done. We decide not to do that (yet).
"""

(this is where I got the O(2^n) from).

> The O(n^2) kicks in because of this:
> 
> Imagine that the two dependencies you want to compare only use one single
> profile (lets say stage1), then the only two possibilities are
> 
>  - no profile                   # column 2
>  - one profile (stage1)         # column 3
> 
> [...]
> 
> And we see pascal's triangle. The sum of the N-th row of pascal's triangle is
> N^2 which corresponds to the last column above.
> 

I am not quite sure I could explain the argument with my own words (i.e.
I am not sure I quite get it).  But if we are talking O(n^2), I'd would
have taken it for granted if not for:

> Though now that I look at the logic, there might be a simple way to test for
> implication without checking N^2 possibilities. I'll look into it.
> 

Sounds good! \o/  Anyhow, this part can certainly come in a later patch.

>> Please add the new tags to the "Test-For" field in
>>    t/tests/fields-build-depends-general/desc
>> (Please keep the field sorted by name of the tags)
> 
> Ah yes, I saw that file but forgot to make changes. Fixed now.
> 

Thanks

>> If you have done it right, you should be able to run
>> "private/update-coverage" and t/COVERAGE will *not* list any of your new
>> tags.  Feel free to omit the changes to t/COVERAGE though.
> 
> ooh, t/COVERAGE is a handy thing!
> 
> Thanks for helping me with this!
> 
> cheers, josch
> 

You are welcome and thanks for helping with implementing it. :)

~Niels

Reply to:

Follow-Ups:
- Bug#740607: lintian: Please support build-profiles
  - From: Johannes Schauer <j.schauer@email.de>

References:
- Bug#740607: lintian: Please support build-profiles
  - From: Johannes Schauer <j.schauer@email.de>
- Bug#740607: lintian: Please support build-profiles
  - From: Niels Thykier <niels@thykier.net>
- Bug#740607: lintian: Please support build-profiles
  - From: Johannes Schauer <j.schauer@email.de>

Prev by Date: Bug#740607: lintian: Please support build-profiles
Next by Date: Bug#739671: phppear check randomly checks one binary
Previous by thread: Bug#740607: lintian: Please support build-profiles
Next by thread: Bug#740607: lintian: Please support build-profiles
Index(es):
- Date
- Thread