Re: Lintian test suite proposal

To: niels@thykier.net
Cc: debian-lint-maint@lists.debian.org
Subject: Re: Lintian test suite proposal
From: Felix Lechner <felix.lechner@lease-up.com>
Date: Fri, 10 Aug 2018 14:53:28 -0700
Message-id: <[🔎] CAFHYt55s4tnrSBPWAyd4tfmNBx_zTtW1i0eW30qOya6eDMxe3A@mail.gmail.com>
In-reply-to: <[🔎] aa07c5a1-ba23-f930-ad37-6d26f6e5d1d5@thykier.net>
References: <[🔎] aa07c5a1-ba23-f930-ad37-6d26f6e5d1d5@thykier.net>
Hi Niels,

Thank you for your thoughtful comments. Your input weighs heavily in
Debian QA. I will address your points one by one, but will mostly
disregard your comments regarding style. Some of it may be my lack of
experience. Perhaps it is my walk of life. I probably dislike the
passive language that is so common in technical documents. Either way,
I am not tied to any wording. Let's begin:

> As I read it, there are two orthogonal but interesting ideas.
>
>  1. The tags.d layout
>
>  2. The Abbreviated-Tags proposal pigging backed on top of tags.d.

The two ideas are not as unrelated as they seem. Both (1) and (2)
dissociate the tag names from the test context. Keeping that
information together as we currently do is not only redundant but also
makes it hard to refactor and reorganize our tests. Given your
positive reaction to 'Abbreviated-Tags' I won't dwell on its
advantages, but it is the less powerful idea of the two. Sure, it's
nice you don't have to mention the test name in 'tags' any more, but
you still have to deal with nuisance tags from unrelated checks. Many
times, they are a pointless obstacle: Who cares about a watch file,
for example, when you are testing lintian's reaction to changelog
entries? I think ultimately the option 'Abbreviated-Tags' only makes
it easier to rename tests. It does not help with unit testing the
lintian checks, which is the core proposal.

The tags directory proposal suppresses tag noise from other checks. I
also makes it easy to reuse tag files in various test contexts as long
as you are testing the same basic features. Another great advantage,
and a major motivation, is the easier reordering of git commits
containing newly proposed Lintian tags. If any test validates more
than one check you are working on, you can spend a lot of time with
'git rebase -i' and 'git add -i' trying to swap the commit order. With
tags.d it is almost guaranteed to be conflict-free.

You may want to add a third point. My proposal also abolishes the tag order.

As a side note, I actually expected greater opposition to the bundling
of all tags in one check as a testable entity, as opposed to viewing
single tags as the proper entity under test.

The idea for tags.d arose when I read the following portions of the
README (emphasis marked with >>> and <<<):

/ Please keep each test case focused.  One of the problems that
/ developed with the old test suite is that each test was serving many
/ separate purposes and testing large swaths of Lintian, which made it
 / difficult to know what could be changed and what would destroy some
/ other useful test.  >>> Test cases should only test a set of closely
/ related tags and new tests should be added for new issues that aren't
/ part of that closely-related set. <<<

/ Test cases should be as Lintian-clean as possible except for the tags
/ that they're testing for.  The template is intended to help with this.
/ It generates a Lintian-clean basic package for you to start with.  You
/ should override only the minimal required to trigger your test, and
/ try to fix any unrelated problems. >>> Sometimes this won't be possible
/ and the only way to trigger a tag is to also trigger another tag, and
/ that's fine, but it shouldn't be the normal case. <<<


> As I read it, these proposals should be feasible implement separately
> (i.e. Abbreviated-Tags could be retrofitted into the existing framework
> without adding tags.d).  This is mostly relevant if one of the proposals
> turn out to be controversial and the other is not - in that case, we can
> move forward with the uncontroversial one.

Good point. I originally implemented them separately but found the
combination more powerful. Please note that either way the use of
directory tags is optional. People can still use the regular 'tags'
file.

> Personally, I feel I understand Abbreviated-Tags good enough that I can
> say I would support it and I would recommend it was made the default for
> new tests (modulo a few minor remarks).
>   On the tags.d layout, I can mostly see where you would like to go but
> it is not clear to me when the test runner will accept vs. fail a test.

The test runner fails if a check being tested did not issue all tags
specified, or if the check issued tags that weren't specified. That
behavior is similar to what happens now, except everything happens on
a per-check basis.

> > New tags.d layout
> > -----------------
>
> General remarks about the text (assuming it is for t/tests/README):
>
>  * I think we should avoid the "New" / "Old" in this document if it is
>    an addition to t/tests/README.  The problem with using "new" for
>    everything is that in a year it will no longer be new and at some
>    point it could even be superseded by a different format/layout as we
>    improve ourselves.

Let's cooperate on wording if you are available to give advice, please.

>  * The text sometimes stops being a README document and starts being
>    sales-pitch and then jumps back to being a README a bit later only
>    to become a TODO list in the end.  I have highlighted a few of these
>    but I stopped when I realised it was a general problem.
>    - Note there is nothing wrong with the sales-pitch/rationale for the
>      proposals.  The "issue" is that I was told to read this as a
>      t/tests/README change and some of the text need tweaking to fit
>      there.

You are right that the document is a pitch, but there won't be any sales. :)

> >> Each file is named after a lintian check. It contains all tags issued
> > by a particular check. The tags can be in any order. Tags from checks
> > are ignored and will no longer interfere.
>
> The last sentence is a bit unclear to me.  I suspect you mean that tags
> are ignored if they do not belong to one of the checks in tags.d?

Yes. Sorry, the text was supposed to read "Tags from other checks are ignored".

> Assuming there is tags.d/X file and it contains the tag "Y".  Will the
> test fail if the check X emits the tag "Z" in addition to "Y"?

Yes, it will, if tags.d/X contains only Y but not Z.

> > This decoupling has several
> > advantages. For example, one can now enable '--pedantic' without
> > having to deal with any noise from other checks. Instead, one can
> > conveniently test just the tags expected from a particular check.
> >
> > The tags.d directory also contains a file named 'options' which gives
> > the user a way to restore the old behavior.
>
> How come you are going for a new "options" file for these options
> instead of adding them to the per-test desc file?

The options are only needed when the directory tags.d/ exists. I can
imagine many more options related solely to tag format, so a separate
file seemed sensible. I am fine either way.

> > When set, 'Exhaustive-Set:
> > yes" will cause any extra tags in the output to fail the test. Since
> > checks generally have little correlation, this option will not be
> > useful in many cases. For most tests, the default setting of
> > 'Exhaustive-Set: no' will be a great relief.
>
> I could do with some examples of when the test succeeds and fails given
> a given tags.d folder/content, different Exhaustive-Set values and
> different lintian output.

Please take a look at an abstract example for now:

Check A issues tags from set TA. The test triggers TA+ but not TA-.
Check B issues tags from set TB. The test triggers TB+ but not TB-.
The test triggers no other tags.
The tags.d directory contains only A.tags (which corresponds to TA+)

With Exhaustive-Set: no, the test passes. If check A issues additional
tags, the test fails. This effectively unit-tests check A.

With Exhaustive-Set: yes, the test fails. If you specify tags.d/B.tags
(which corresponds to TB+), the test passes. It effectively tests the
for the combined set (TA+, TB+) and against (TA-, TB-). The last bit
means neither check may issue tags not specified.

A good mnemonic is "the files in tags.d are an Exhaustive-Set," if
that works for you.

> > The lintian option
> > '--pedantic' might even become the default setting for the test suite
> > going forward.
> >
>
> The last sentence (about making --pedantic default in tests) makes sense
> in the proposal as it is part of the reason why this proposal might be
> useful.  However, I do not think it would belong in t/tests/README.

When pedantic is enabled, the sentence won't be needed anymore anyway. :)

> > For more flexibility, there is another option 'Abbreviated-Tags: yes'.
> > It allows expected tags to be listed in a shorter and possibly
> > more convenient format. First, the letter code at the beginning of
> > each line is dropped and reconstructed automatically. Tags are only
> > ever issued with one of these letters, so the chance of an internal
> > error is small. Instead of worrying about a tag's severity and
> > certainty, you can focus on which tags are being issued and why.
> >
>
> Note that AFAICT, the Abbreviated-Tags could be added to the existing
> framework without the new layout (after resolving the feature
> interaction with e.g. "Sort: yes").

The letter code is easy to reconstruct. Unfortunately, removing the
package name does not work for packages generating multiple binary
packages. Right now, my favorite approach is tags.d/aliases, which is
not yet part of this proposal.

> > An example for the abbreviated format is:
> >
> > - File: tags.d/changelog-file.tags
> >
> > : epoch-changed-but-upstream-version-did-not-go-backwards 3 >= 0.0.1
> > : no-upstream-changelog
> > : epoch-change-without-comment (none) -> 2
> >
> > Please note the colons at the beginning of each line. They are necessary
> > to distinguish plain lintian tags from other suites, as in:
> >
> > - File: tags.d/watch-file.tags
> >
> > source: debian-watch-does-not-check-gpg-signature
> >
>
> Would it make sense to also let the test runner reconstruct the
> "source:"-bit (etc.) such that the abbreviated format will be:
>
> """
> <tag1>
> <tag2> <extra>
> <tag3>
> """
>
> At least, it seems more natural to me that if the test runner is going
> to assist us, it might as well assist us with all the "boring" parts of
> the output.

I was under the impression that some tags were issued for more than
one test suite, such as for 'debs' and for 'source'. Can the test
suite be determined from the remaining tag line? We may have a
solution if some tags are used in different suites but only appear
together.

> > The changes in the new format work together to dissociate checks, tags
> > and tests. One goal is to make everything easier to refactor in the
> > future.
> >
> > There are many other benefits, as well: With the new format, we
> > automatically confirm for each check that no tags other than those
> > specified were issued. That obsoletes the option Test-Against.
> >
>
> Test-Against is an interesting relic.  Test-Against has 3 purposes:
>
>  1) What you describe, where it is now completely and utterly obsolete
>     because the test would fail before Test-Against is validated.
>     (unless you consider it a "stop gap" to avoid people blindly shoving
>      a false-positive regression in /tags)

Interesting point. My implementation does not offer this 'stop-gap'.

>  2) For calculating t/COVERAGE which you mention later

This is actually not yet automated, but my conversion script looks at
very similar information for each test. I think it is easily
aggregated.

>  3) For selecting all tests that are relevant to a given tag.
>
> I believe the proposal covers 1+2.  But I do not see 3 mentioned in the
> proposal.  I am fine with dropping Test-Against if item 3 is no longer a
> useful feature but I figured we ought to be explicit about it.

I think a test is only relevant to a tag if it issues the tag or
doesn't. We just don't test tags anymore. We test checks. You can have
the same, and in fact more comprehensive, outcome by creating an empty
file for the check. It effectively adds all tags belonging to that
check to Test-Against. Not trying to sound pitchy, but it is also
automatic.

> > When a check is expected to run without issuing tags, please use a
> > file with one or more empty lines. (Git won't commit files that are
> > totally empty.)
>
> Note: We have plenty of completely empty files committed in git; what
> git refuses to track are empty directories.

Sorry, I misspoke. Empty files are fine.

> > Since we now know the names of all checks in a test,
> > this also obsoletes the option Test-For.
> >
> True, but this could also be retrofitted to the current solution with
> Test-For being optional when it is 1:1 with the tags file.  That would
> remove the vast majority of Test-For entries.

Yes, you are right.

> Also, I would recommend avoid using first person ("we" or "I") in the
> README.  It is fine for an informal description or a proposal.

With my language I hoped to engage you, although not on that point.

> > Neither Test-For or Test-Against is needed going forward. They are two
> > fewer things to worry about. Both fields were previously used to
> > calculate the COVERAGE. For tests in the new format, the contribution
> > to coverage is calculated from other data. With fewer opportunities
> > for human error (for example by forgetting to update Test-For), the
> > COVERAGE information will be at least as accurate as before. The
> > coverage may also be wider due to the automatic Test-Against, as
> > explained above. In any event, the COVERAGE information continues to
> > be useful and up-to-date.
> >
>
> I like this aspect of the proposal aiming to reduce redundant
> information.  :)
>
> > Now you may wonder, does it have any drawbacks? The answer is
> > yes.
>
> A bit to informal / speech-like for the README.
>
> > Since unexpected tags are generally ignored, it can be hard to
> > tell what a test could also be good for. It may be best practice to
> > use only narrow tests, but you can get additional recommendations for
> > wider unit testing by running 'runtests' in verbose mode. It will tell
> > you which other checks emit tags, so you can add them if you like.
> >
> > As a special rule, 'runtests' will expect absolutely zero lintian
> > output when no tag files are present. In that case, the option
> > 'Exhaustive-Set' has no effect. This eliminates the absurd and
> > unwanted possibility that no testing takes place. The resulting error
> > messages also make it easier to write unit tests for particular
> > features, one test at a time.
> >
> > The features described presently only work with regular lintian
> > output. For XML or colon-separated output, please continue to use the
> > regular, uniform 'tags' files.
> >
> > There are few other changes: The older format used a script
> > 'test_calibration' to adjust expected tags for corner cases (mainly
> > for release architectures). Such scripts now operate on a per-check
> > basis. They have names like tags.d/${check}.calibrate. Please note
> > that the input and output formats are the same as for the expected
> > tags. If you use abbreviated tags in your test, please make sure your
> > calibration script issues them, too. The scripts get only a filtered
> > lintian output that relates to their particular check.
> >
> > A few tests examine lintian's debug or non-tag output. Those lines
> > are now found in tags.d/messages. You can use a calibration script
> > with them, but there is no abbreviated format.
> >
>
> How is lintian's message output matched with tags.d/messages?  Are both
> sorted first or ...?

Each line is found when an identical line in the output matches. That
line is then removed from the output before further matches are made.

> > Since the expected tags can now be listed in any order, the Sort
> > option is probably obsolete, unless a calibration script depends on
> > it.
> >
>
> I think the Abbreviated-Tags proposal will overshadow the Sort option
> for tags.  As such, Sort will generally only be useful for
> "Abbreviated-Tags: no" tests (but I would not be surprised if most of
> them were also "Sort: no" tests):

Since expected tags can be listed in any order, I cannot imagine any
use for Sort when using tags.d. This true for all tag formats.

Imposing a tag order tests that lintian checks were executed in a
particular order. Upon reflection, I did not find it meaningful.

> I would not keep Sort around for calibration scripts; it is trivial to
> invoke sort in such a script.
>
> > The majority of tests were converted automatically. Tags previously
> > listed in one large 'tags' file are now separated into several smaller
> > files that group tags emitted by the same check. For tags found in
> > Test-Against, we made sure the check they belong is represented in the
> > tags.d directory. Sometimes, that meant adding a blank file. (For all
> > the automation, it should be noted that all tags that were listed in
> > Test-For were being tested; Test-Against cannot be similarly
> > validated.) Some tests used special features that will require more
> > work. They were not converted.
> >
> > Going forward, there are many things you can do to make our test suite
> > better. First of all, it is always a good idea to write more
> > tests. While we already have many, one can rarely have enough. (In the
> > future, we may group them somehow to make them easier to navigate.)
> > For converted tests, please embrace the unit testing paradigm and help
> > with the following:
> >
> > 1. Remove 'Exhaustive-Set: yes' from tags.d/options. (The default is
> > no.)
> >
> > 2. Delete any tag files similar to tags.d/{$check}.tags that have only
> > nuisance tags. Those are tags that appeared from unrelated checks you
> > were not actually interested in testing. (At some point, lintian may
> > also speed things up by skipping checks not present.) Please remove
> > any file your test does not mean to validate. If two tag files appear
> > useful separately, please consider duplicating the test, and place one
> > file with each.
> >
> > 3. If you have the patience to add missing tags, please enable
> > '--pedantic'.
> >
> > Now we have unit testing! Thank you!
>
> Once again, thanks for working on improving lintian.  As I mentioned, I
> am happy to support the Abbreviated-Tags feature with only a minor
> remark on the syntax (possibly retrofitted into the existing layout as
> well).
>   For tags.d, I need some more examples to understand how the parts
> interact.

Thank you for your time in reviewing the proposal. I appreciate your
kind words and your support (partial, at least). I will try to make a
stronger case for the tags.d scheme going forward. Thank you, and have
a good weekend!

Best regards,
Felix

> Thanks,
> ~Niels
Reply to:
References:
- Lintian test suite proposal
  - From: Niels Thykier <niels@thykier.net>
Prev by Date: Processed: your mail
Next by Date: Bug#905881: lintian: detect packages containing X11 fonts that do not run update-fonts-* or do not dep on xfonts-utils
Previous by thread: Lintian test suite proposal
Next by thread: Bug#905843: lintian recognized "allow-stderr" as a non-standard autopkgtest restriction
Index(es):
- Date
- Thread