Bug#851749: autopkgtest: machine-readable sub-tests within an autopkgtest-level test
autopkgtest currently has one level of hierarchy: a test is either an
executable script in debian/tests/ named in debian/tests/control, or a
command in debian/tests/control.
A finer-grained result than that is often available. Because
debian/tests/control is a static file in the source package, many source
packages (including those that use pkg-perl-autopkgtest, and those
that use the GNOME installed-tests convention) have a single autopkgtest
that encapsulates multiple upstream tests. For example, see src:flatpak
(GNOME-style) and src:ikiwiki (Perl-style). There is interest in
reporting the results of these upstream tests individually.
Ian Jackson writes:
> autopkgtest can report individual test failures without "failing the
> whole test suite".
> There is new functionality needed to be able to do this in cases where
> there are many test results run by one upstream script.
> You should help enhance autopkgtest so that a single test script can
> report results of multiple test. This will involve some new protocol
> for those test scripts.
Finer-grained than even that, many test frameworks report individual
assertions within an upstream test. GNOME and Perl both conventionally
do this via TAP <http://testanything.org/>, which has producers and
consumers in multiple languages.
I would like to propose TAP as autopkgtest's protocol for finer-grained
test result reporting, something like this:
* Tests in debian/tests/ may declare "Features: TAP". If they do, their
stdout is expected to be TAP, and the TAP results ("ok" and "not ok"
lines) are treated as sub-tests of the autopkgtest. If they do not,
their stdout is assumed to be unstructured. stderr is always
* Optionally, a TAP test may output sub-tests in the syntax produced by
ok 1 - first test
# the detailed output of the sub-test comes *first* so that
# we can stream incomplete output
    ok 1 - first part of second test
    ok 2 - second part of second test
ok 2 - overall result of second test
ok 3 - third test
(This notation is non-standard but widely supported, for example in
Perl Test::More, node.js node-tap, and the Jenkins TAP consumer.
TAP consumers that do not support it will typically ignore it.
I'm deliberately ignoring the bikeshedding about alternatives on
https://github.com/TestAnything/Specification/issues/2 because the
protocol that Test::More has supported since at least 2009 is
a perfectly reasonable one.)
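
To illustrate how a consumer might read this notation, here is a minimal
Python sketch. It is not autopkgtest code: the function name parse_tap and
the (depth, passed, description) tuple are made up for illustration, and it
assumes that each level of sub-test nesting is exactly 4 spaces of
indentation. It does not interpret plans, "Bail out!" or TODO/SKIP
directives.

```python
import re

# Matches a TAP result line such as "ok 1 - first test" or "not ok 2".
# Leading spaces are captured so that the nesting depth can be derived.
RESULT_RE = re.compile(
    r"^(?P<indent> *)"
    r"(?P<status>ok|not ok)\b"
    r"(?:\s+(?P<num>\d+))?"
    r"(?:\s*-?\s*(?P<desc>.*))?$"
)


def parse_tap(text):
    """Return a list of (depth, passed, description) tuples.

    depth is 0 for top-level results and grows by one for every
    4 spaces of indentation (an assumption of this sketch).
    """
    results = []
    for line in text.splitlines():
        m = RESULT_RE.match(line)
        if m is None:
            continue  # comments, plans and unstructured lines are skipped
        indent = len(m.group("indent"))
        if indent % 4 != 0:
            continue  # not a recognised nesting level; ignore the line
        passed = m.group("status") == "ok"
        results.append((indent // 4, passed, m.group("desc") or ""))
    return results
```

A consumer that does not care about sub-tests could simply keep the
entries with depth 0 and would still see one result per upstream test.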
* A failing TAP autopkgtest must still exit nonzero or write to stderr;
  it is not correct for it to write "not ok" or "Bail out!"
  and subsequently exit 0. In practice most TAP producers seem to
  do this correctly, including Perl and GLib.
    * Optionally, we could permit exiting 0 and relying on TAP
      parsing if the test declares "Restrictions: TAP" (which would be
      short for "requires TAP parsing for correctness").
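
The rule above could be checked roughly as in the following Python sketch.
Everything here is hypothetical: the function name, the tap_required flag
(standing in for the proposed "Restrictions: TAP") and the "INCONSISTENT"
verdict are assumptions, since the proposal does not say what a runner
should do with a test that prints "not ok" and then exits 0. It also
assumes that any stderr output counts as failure, only looks at top-level
lines, and does not treat "# TODO" results specially.

```python
def verdict(exit_status, stderr_output, tap_stdout, tap_required=False):
    """Combine process status and top-level TAP results into a verdict.

    Returns "PASS", "FAIL" or "INCONSISTENT" (hypothetical labels).
    """
    # Only unindented lines are top-level results; sub-tests are ignored.
    tap_failed = any(
        line.startswith("not ok") or line.startswith("Bail out!")
        for line in tap_stdout.splitlines()
    )
    process_failed = exit_status != 0 or bool(stderr_output)

    if process_failed:
        return "FAIL"
    if tap_failed:
        # Exit 0 and silent stderr, but TAP says something failed.
        # With the optional restriction we trust the TAP output;
        # without it the test is violating the rule above.
        return "FAIL" if tap_required else "INCONSISTENT"
    return "PASS"
```

For example, verdict(0, "", "not ok 1 - broken\n") reports
"INCONSISTENT", while the same call with tap_required=True reports "FAIL".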
* Optionally, the autopkgtest runner could have a mode to output
TAP itself. It would have to indent TAP tests' output by 4 spaces
to make them into sub-tests, and escape non-TAP tests' output
(by either prepending "#" or writing it to autopkgtest's stderr)
to avoid it invalidating the structured syntax on stdout.
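
As a rough sketch of that wrapping step, the following Python function
turns one test's captured stdout into a fragment of the runner's own TAP
stream. The function name and its parameters are invented for illustration
and are not part of autopkgtest; of the two escaping options mentioned
above, this sketch picks the "prepend #" one.

```python
def wrap_as_subtest(number, name, output, is_tap, passed):
    """Render one autopkgtest's stdout as part of the runner's TAP output.

    TAP output is indented by 4 spaces so that it becomes a sub-test;
    unstructured output is turned into TAP comments so that it cannot
    be mistaken for results.
    """
    prefix = "    " if is_tap else "# "
    lines = [prefix + line for line in output.splitlines()]
    # The overall result comes last, after the detailed output.
    status = "ok" if passed else "not ok"
    lines.append(f"{status} {number} - {name}")
    return "\n".join(lines) + "\n"
```

Because the detailed output is emitted before the summary line, the
runner can stream a test's output as it arrives and only append the
"ok"/"not ok" line once the test has finished.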
Separately but somewhat relatedly, I've proposed patches for
gnome-desktop-testing (GNOME's test-runner, as used by src:flatpak for
autopkgtests) to make it output TAP; currently it has unstructured output,
and the individual tests that it runs are usually TAP. This could give us
a large number of tests with structured output relatively quickly.