Subject: Thoughts on Debian quality, including automated testing
[ I'm subscribed to -devel, no Cc required. I apologize for the
length, but it's only a bit over 3000 words. I hope the
section titles help, if you want to skip parts. ]
For some time now I have been thinking about ways to make Debian
better from a technical point of view. Most of my actual efforts
have gone into writing piuparts[1], running it against packages
in the archive, and reporting any problems. I have also spent some
time thinking about related issues.
[1] http://packages.debian.org/unstable/devel/piuparts
This mail is primarily prompted by Ian Jackson's proposal[2] to
specify a framework for automated testing. I meant to write and send
it weeks ago, but for various reasons, I kept postponing finishing
it. Sorry about that. Part of the reason is that as I kept thinking
and writing about this, I kept expanding the scope. As a result, this
mail is quite long. Sorry about that, too. Further, I keep mentioning
piuparts in this mail. Sorry if it seems like I'm advertising it.
I'll start by saying that I fully support writing automated tests
for Debian packages. Automated tests can do a great deal of good for
the development of individual programs, and they can do the same for
Debian packages, and for Debian as a whole.
[2] http://lists.debian.org/debian-project/2005/11/msg00073.html
and other threads
Before I get down to details, I'd like to be a bit philosophical
and preachy. You may want to skip a few paragraphs.
The quality of Debian is not bad at all. Debian works quite well for
a large number of people, and we get fairly few bug reports from
them relative to the number of programs we have packaged. That's
pretty much the only objective criterion we can currently use to
determine real quality.
Quality is sometimes hard to define. I claim that "package has
few bug reports in proportion to its user base" is one important
indicator of high quality.
Still, we could do much better. Our two best known quality assurance
tools, lintian and linda, are evidently not used by many package
maintainers[4], given the number of packages that have problems.
Consider, for example, lintian's test for zero-byte files in the doc
directory[5]. A hundred packages fail that test, yet the problem
is utterly simple to fix.
[4] http://lintian.debian.org/reports/tags.html
[5] http://lintian.debian.org/reports/Tzero-byte-file-in-doc-directory.html
These zero-byte files are not a "real" problem: they use up an
inode, and make people spend a few seconds extra when looking
for information about the package, but they don't actually break
anything.
Yet the packages in question would be of higher quality if they
were non-empty or didn't exist at all. They may also indicate other
sloppiness, which may or may not be caught with automatic tools.
Sloppiness tends to result in real problems sooner or later.
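Checking one's own package for this class of problem takes only a
one-liner; here is a sketch run against a fixture tree (the directory
and file names are made up; point it at a real unpacked package, e.g.
the output of "dpkg-deb -x foo.deb dir", instead):

```shell
# Fixture tree for illustration only; a real check would target the
# doc directory of an unpacked .deb.
root=/tmp/pkg-root
mkdir -p "$root/usr/share/doc/example"
: > "$root/usr/share/doc/example/README"          # zero-byte offender
echo "real content" > "$root/usr/share/doc/example/changelog"
# find -size 0 lists files that are exactly zero bytes long
empty=$(find "$root/usr/share/doc" -type f -size 0)
echo "$empty"
```

Anything the find prints is a file lintian would flag with
zero-byte-file-in-doc-directory.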
I propose "respected automated tools find few problems" as the
second indicator of quality.
To improve the quality of Debian, we need to do several things:
A) Prevent bugs from happening in the first place
B) Find and report bugs
C) Fix bugs that have been reported
D) Prevent bugs from entering the archive
I will now discuss each of these things. After that I'll finally
get to discussing automated testing the way Ian Jackson proposed it.
A) Prevent bugs from happening in the first place
=================================================
In general, the way to prevent bugs from happening at all is by
reducing complexity. Simple things are easier to get right.
Most programmers find that using tools with higher abstraction
levels reduces complexity and the number of bugs they create for a
given task. As an example, the shell command "cat *.txt >
all.dat" is much more likely to work correctly than writing the same
program in C, where you would have to open and read and write files
yourself, checking for errors, etc.
In a Debian packaging context, this might mean using packaging
helpers to take care of the boring, repetitive chores that are the
same from one package to the next. For example, debhelper is pretty
good at reducing the debian/rules file to just a handful of simple
invocations of the individual debhelper tool programs.
Each invocation is very simple. Very simple bits of code are usually
correct. Result: fewer bugs in packages and in Debian. Debhelper
is a very good thing indeed.
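To illustrate, a debhelper-style debian/rules for a simple
architecture-independent package can be as short as the following
sketch (the exact set of dh_* calls varies per package; real packages
add dh_installman, dh_strip, and friends as needed):

```make
#!/usr/bin/make -f
# Sketch of a minimal debhelper rules file; not a drop-in template.

build:
	$(MAKE)

binary: binary-indep binary-arch
binary-indep: build
	dh_testdir
	dh_testroot
	dh_install
	dh_installdocs
	dh_installchangelogs
	dh_compress
	dh_fixperms
	dh_gencontrol
	dh_md5sums
	dh_builddeb
binary-arch:

clean:
	dh_testdir
	dh_clean
	-$(MAKE) clean

.PHONY: build binary binary-indep binary-arch clean
```

Each dh_* call is one trivially simple line, which is exactly the
complexity reduction argued for above.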
I'm not saying that using debhelper, or another packaging helper,
should be mandatory. They are merely one way of reducing the
probability of bugs, and because of that, I am happy that most
packages in Debian do use them. On the whole, our quality is better
thanks to that. If you can make bug free packages, by all means don't
use a helper (I don't; my packages are simple enough as they are).
There are other ways of combating complexity. Picking sensible
defaults and not making the package configurable via debconf is
simple. I'd add more examples, but I can't think of any right now
(and people on IRC are discussing dressing pork in yellow underwear,
which is highly distracting).
Where we can, we should avoid complexity and make things simpler. If
you have ideas for how to do this, please tell.
B) Find and report bugs
=======================
We currently have about 82 thousand bugs open (counting from the
BTS summary page on packages; this includes all severities and
tags). That's a lot of bugs. There are, however, about 11 thousand
packages with bugs, so there are only about 7.3 bugs open per
package, on average.
Our new bug numbers are in the 340 thousand range, so we've closed
258 thousand reported bugs (plus a lot of unreported ones) over
the years. That's a truly huge number of bugs.
A bug report is a wonderful thing. It means that there is no longer
any need to wonder whether there is something wrong in the package:
you know there is, and you know what it is. Someone, a nice person
using your package, has gone to the trouble of finding out what
the problem is, and has also decided to tell you. It is delightful
when people decide to be so helpful.
Sometimes it takes a long time for anyone to report a bug. It would
be more reassuring to be more proactive in finding bugs. Our two well
known tools for this are lintian and linda. They examine packages
for patterns that tend to indicate problems. On the whole, they are
very simple systems, but even so, they find a lot of problems. Most
problems probably never enter the Debian archive, because packagers
use the tools and fix anything that needs fixing before uploading.
The tool I wrote, piuparts, is similar: it should be used by the
package maintainer before uploading, so that a buggy package never
enters the archive. Piuparts is pretty new, so it's unsurprising
that most people don't use it yet.
C) Fix bugs that have been reported
===================================
Not all of the bugs the automatic tools find are fixed, however. And
there are those 82 thousand other bugs that need fixing as well. While
many of them are wish list bugs, or it is questionable if they are
bugs at all, they do all require some attention.
What can we do about that? Can we get down to no (fixable) bugs in
a stable release? (Fixable bug here means a bug that can be fixed at
all with reasonable effort by the Debian package maintainer. Having
to rewrite the X server doesn't count as reasonable effort. Wish
list bugs should probably be excluded as well.)
Having no bugs is a good state to be in. When a bug is reported,
it is usually easier to fix if there aren't a bunch of other bugs
disturbing the process.
Our Bug Squashing Parties are useful, but they mostly concentrate
on release critical bugs. Other bugs get less attention, but need
to be fixed too. I'm guilty of ignoring many of the bugs against
my own packages for extended periods of time. People like me are
part of the problem.
Several ideas have been floating around for years on how to improve
this situation, of which I'd like to mention three. While I have
here used the number of bugs as the measure of a package's quality,
the same ideas might help with other aspects, like getting new
upstream versions packaged soon after they're released.
* Team maintenance. If a package is maintained by a team,
there are more people sharing the work. When a team works
well, more people look at the package, and finding and
fixing problems is more effective. There is less work per
person, so things don't lag as much. A well-working team
is a good thing.
As an example, the Debian GNOME team seems to work really
well. Transitions to the next upstream version happen
quite smoothly these days.
Making teams mandatory for all packages seems ridiculous to me.
Lots of packages are so small that having to arrange a
team for them, even if it is only the effort to set up
and subscribe to a team mailing list, is wasteful. Not
everyone likes to work in a close team, either, and we
shouldn't exclude them.
* Less strong ownership of packages. The current state in
Debian is that the package maintainer (or maintainer
team) owns the package, and as long as they don't cause
a lot of trouble, and don't have release critical bugs,
everyone else is invited to keep their hands off.
If the maintainer, for whatever reason, can't keep the
quality of the package up, it will have to degrade a lot
before anyone else dares to touch it. If this
Non-Maintainer Upload threshold were lowered, it might
be that quality could improve. There would probably
be a number of mistakes made, but that also happens
when people take on a new package.
This is not the same as maintenance by a team. An NMU
is done by someone interested in the package for
whatever reason, but they only do the upload to fix a
specific problem or problems, not to join the maintainer
team for a long time.
This idea hasn't been tested. It could be tested if
some group of maintainers declared that some or all
of their packages were part of the experiment, that
anyone could NMU them for any reason whatsoever, as
long as they take proper care not to mess the package
up.
(I'm willing to participate in such an experiment
myself, but I haven't thought out the details yet.)
* Abolishing package ownership completely. This is a more
radical version of the previous one. I'm not going to
argue for it until the milder form has been tested first.
The main theme here is the need for speed: bugs need to be closed
faster, and things that help this would be good.
D) Prevent bugs from entering the archive
=========================================
In program development, it is usually a good idea to not commit
anything until it passes all automatic tests. Similarly, I propose
that it would be good for Debian to use some of the automatic tools
before a package is accepted into the archive. For example, if
lintian finds the init.d-script-does-not-implement-required-option
error[6], is there any reason to accept the package, since it is
certainly buggy?
[6] http://lintian.debian.org/reports/Tinit.d-script-does-not-implement-required-option.html
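For reference, that lintian tag checks for the options Debian Policy
requires every init script to implement: start, stop, restart, and
force-reload. A minimal skeleton follows (the daemon name "exampled"
and the echo actions are made up; a real script would start and stop
the daemon):

```shell
# Write a minimal init-script skeleton that implements the required
# options, then exercise it.
cat > /tmp/exampled.init <<'EOF'
#!/bin/sh
case "$1" in
  start)
    echo "Starting exampled" ;;
  stop)
    echo "Stopping exampled" ;;
  restart|force-reload)
    echo "Restarting exampled" ;;
  *)
    echo "Usage: /etc/init.d/exampled {start|stop|restart|force-reload}" >&2
    exit 1 ;;
esac
exit 0
EOF
chmod +x /tmp/exampled.init
started=$(/tmp/exampled.init start)
badargs=0
/tmp/exampled.init frobnicate 2>/dev/null || badargs=$?
```

A script missing any of the four required branches is certainly buggy,
which is what makes this check a good candidate for running before a
package is accepted.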
There are some practical issues with this, of course. Not all lintian
and linda warnings should prevent accepting a package, because
that might prevent fixing more important problems quickly. That's
fine-tuning, however; the general principle still applies: it's
better to prevent a buggy package from entering the archive than
to fix it later.
Some of the automatic checking might be too heavy, or too risky, or
otherwise impractical to run when processing incoming packages. In
these cases, it is better to accept the package into the archive
and then run tests later, and file bugs for any problems found.
Lintian has been run on all packages for many years. The results
are listed on a website[7], but many packages go for months, even
years, without their problems being fixed. When I started running piuparts
on all packages, I decided to report any problems as bugs, instead
of just publishing log files. This seems to work better: many of
the bugs have been fixed, some of them even on the same day.
[7] http://lintian.debian.org/
Automated testing of program functionality
==========================================
I'm speaking here about whether the programs in the package work,
not whether the packaging itself works. Lintian, linda, and piuparts
already test the packaging fairly well, I think. Also, I'm speaking
about active tests that require running programs in the package;
lintian and linda don't do that, and shouldn't.
In this section especially I'm partly rephrasing what Ian Jackson
and others said in the previous discussion (or what I think they
said), partly adding my own thoughts.
Having a way to automatically test that a package is at least
minimally functional is clearly a good thing. Speaking from the
point of view of someone who occasionally does NMUs to fix release
critical bugs in other people's packages, the easier it is to
check that a package still works after I've mucked about with it,
the easier things will be for everyone involved.
Automatic testing needs to happen in various contexts:
* When the package is being built. Most such tests should go
into an upstream test module. Traditionally, this would
correspond to "make check".
* When the package has been built, but before it is uploaded.
This is similar to testing with lintian, linda, and piuparts.
The difference from build-time tests is that the tests are
run when the package is installed onto a system (possibly a
chroot or a virtual system).
* Before an uploaded package is accepted into the archive. This
would prevent buggy packages from entering the archive.
* On specifically crafted test systems. This would check that
packages still work even though other packages they depend
on have changed.
* On real systems, to verify that things still work. This would
potentially be a big help to system administrators.
Some issues:
* Test data. Some tests are going to require a large amount
of test data and that is best kept out of the binary package.
It is probably best to keep it in the source package only:
the test program then needs to install (and maybe partially
build) the source package.
* Test dependencies. Many tests will require tools that are
needed neither for using nor for building the package. Thus we
probably need a "Tests-Depends" field (for the source package).
* Generic tests. Since Debian has so many packages, as much
as possible should be tested using generic tests that apply
to many packages. For example, checking that an executable
can be run at all should be a generic test. Expecting ten
thousand source packages to add tests for that is unwarranted
optimism. Generic tests should require nothing from the
package itself.
* Specific tests. Obviously it is also necessary for each
package to be able to provide tests for its particular
peculiarities, such as instances of old bugs to avoid them
re-appearing. The interface for this should allow various
tools to be used for implementing the tests, so that there
is space for evolution of good tools. Compare the situation
with a raw debian/rules file and helper packages.
* Non-burdening of buildds. Especially slow architectures
might want to skip build-time tests to save time. There
should be a way to build the package without running tests,
and of course to run the tests only.
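The simplest generic test mentioned above, checking that an executable
can be run at all, might be sketched like this (the bin directory and
tool are a fixture; a real implementation would walk each package's
file list, e.g. via "dpkg -L"):

```shell
# Fixture: a fake bin directory containing one well-behaved tool.
bindir=/tmp/smoke-bin
mkdir -p "$bindir"
printf '#!/bin/sh\nexit 0\n' > "$bindir/ok-tool"
chmod +x "$bindir/ok-tool"

# Generic smoke test: try to execute every executable file once and
# count the ones that cannot even be started.
failures=0
for exe in "$bindir"/*; do
    [ -x "$exe" ] || continue
    "$exe" </dev/null >/dev/null 2>&1 || failures=$((failures + 1))
done
echo "failed executables: $failures"
```

A test this crude still catches missing shared libraries, broken
interpreter paths, and similar packaging accidents, without requiring
anything from the package itself.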
My concrete proposals:
* Let's write a tool that can do at least simple generic tests
(we'll expand it later).
* Let's standardize on a way to invoke package specific tests:
"debian/rules test-build" for build-time tests and
"debian/rules test-install" for tests of the installed
package. Neither must require the package to be built
already. The rules targets can only be assumed to exist if
debian/control contains a "Tests-Depends". Whoever calls
these must take care of installing test-dependencies.
* Let's modify pbuilder to run test-build tests and (if
possible) also the generic tool and test-install tests.
These belong better in pbuilder than in piuparts, I think,
but it might be that piuparts should run them also.
* Let's also write a tool that a sysadmin (or tester) can use to
run test-install for particular packages, or all installed
packages.
* Ian's proposed debian/tests/control interface sounds
nifty. I'm not going to debate the exact details (at least
here), they should probably be decided by the implementer
("those who do, decide"). It should be implementable as a tool
that can be run from "debian/rules test-install", I think.
This will allow Debian packagers who like it to use it, but
those who prefer something else can use that instead. Some
people might want to use all available tools, even.
* After all this is done, let's start a campaign where every bug
fix includes a patch to add a regression test for it.
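Under the proposed interface, the package-specific hooks might look
roughly like this in debian/rules. Everything here is a sketch of the
proposal, not an existing convention: the target names come from the
proposal above, and the test script under debian/tests/ is a made-up
example of what a maintainer might write.

```make
# Hypothetical sketch of the proposed test targets. The caller is
# responsible for installing the packages listed in Tests-Depends
# (in debian/control) before invoking either target.

test-build:
	# run the upstream test suite against the built source tree
	$(MAKE) check

test-install:
	# exercise the installed package; "smoke-test" is a made-up
	# maintainer-provided script
	sh debian/tests/smoke-test

.PHONY: test-build test-install
```

Keeping the targets this thin leaves room for different test tools
behind them, in the same way debian/rules leaves room for different
packaging helpers.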
Let's take quality assurance seriously
======================================
Quality assurance is currently performed by a few people organized
around the debian-qa mailing list, and various other people
(including me). I see the need for a more aggressive, systematic
approach to quality assurance. This might be implemented by
(re-)forming the debian-qa team with a modified agenda, something
like this:
The task of the debian-qa team is to proactively find and fix
(technical) problems in Debian packages, and to temporarily
maintain orphaned packages.
Some of the things that it might make sense to organize better
include (if these already are organized well, I apologize):
* Reporting serious problems found by lintian/linda as bugs
against packages.
* Reporting problems found by piuparts. I already do this,
but it would be good to expand it to a couple more
architectures (at least), and to have more people process
logs of failed tests.
* Testing that all packages that are of optional or higher
priority actually can be installed at the same time. (This
should be partly doable by analyzing Contents files, if it
isn't already.)
* Testing that all packages can still be re-built even when
compilers, libraries, or other build dependencies have
changed.
* Checking that a system with as many packages as possible
installed can be upgraded from stable via testing to sid.
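The Contents-file analysis mentioned above can be sketched in a few
lines. Each line of a Contents file maps a shipped path to the
section/package entries providing it, so a comma in the package column
marks a path claimed by more than one package, i.e. a candidate file
conflict. The data below is a fixture; the real files are the
Contents-<arch>.gz indices in the archive.

```shell
# Fixture in the Contents format: "path  section/pkg[,section/pkg...]"
cat > /tmp/Contents-example <<'EOF'
usr/bin/foo          utils/foo,admin/foo-ng
usr/bin/bar          utils/bar
EOF
# Print every path whose package column lists more than one package.
shared=$(awk '$2 ~ /,/ { print $1 }' /tmp/Contents-example)
echo "$shared"
```

Paths found this way still need manual review, since some of them are
resolved by Conflicts or Replaces declarations rather than being bugs.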
Then, of course, there is the fixing of bugs, but I've discussed
that above.
PS. Sorry again, my foot note numbering got confused.
--
On a clear disk, you seek forever.