Thoughts on Debian quality, including automated testing

[ I'm subscribed to -devel, no Cc required. I apologize for the
  length, but it's only a bit over 3000 words. I hope the
  section titles help, if you want to skip parts. ]

For some time now I have been thinking about ways to make Debian
better from a technical point of view. Most of my actual efforts
have gone into writing piuparts[1], running it against packages
in the archive, and reporting any problems. I have also spent some
time thinking about related issues.

    [1] http://packages.debian.org/unstable/devel/piuparts

This mail is primarily prompted by Ian Jackson's proposal[2] to
specify a framework for automated testing. I meant to write and send
it weeks ago, but for various reasons, I kept postponing finishing
it. Sorry about that. Part of the reason is that as I kept thinking
and writing about this, I kept expanding the scope. As a result, this
mail is quite long. Sorry about that, too. Further, I keep mentioning
piuparts in this mail. Sorry if it seems like I'm advertising it.

I'll start by saying that I fully support writing automated tests
for Debian packages. Automated tests can do very good things for
the development of individual programs. They can do the same for
Debian packages, and for Debian as a whole.

    [2] http://lists.debian.org/debian-project/2005/11/msg00073.html
        and other threads

Before I get down to details, I'd like to be a bit philosophical
and preachy. You may want to skip a few paragraphs.

The quality of Debian is not bad at all. Debian works quite well for
a large number of people, and we get fairly few bug reports from
them relative to the number of programs we have packaged. That's
pretty much the only objective criterion we can currently use to
determine real quality.

Quality is sometimes hard to define. I claim that "package has
few bug reports in proportion to its user base" is one important
indicator of high quality.

Still, we could do much better. Our two best known quality assurance
tools, lintian and linda, are obviously not used by a lot of package
maintainers[4], given the number of packages that have problems.
Consider, for example, lintian's test for zero-byte files in the doc
directory[5]. There are a hundred packages that fail that test. Yet
the problem is utterly simple to fix.

    [4] http://lintian.debian.org/reports/tags.html 
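
The check itself is trivial. As a sketch (this is not lintian's
actual code), a shell function that lists zero-byte regular files
below a given directory, such as a package's /usr/share/doc tree:

```shell
# Sketch only, not lintian's implementation: list zero-byte regular
# files below the directory given as the first argument.
find_empty_docs() {
    find "$1" -type f -size 0
}
```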

These zero-byte files are not a "real" problem: they use up an
inode, and make people spend a few extra seconds when looking
for information about the package, but they don't actually break
anything.
Yet the packages in question would be of higher quality if they
were non-empty or didn't exist at all. They may also indicate other
sloppiness, which may or may not be caught with automatic tools.
Sloppiness tends to result in real problems sooner or later.

I propose "respected automated tools find few problems" as the
second indicator of quality.

To improve the quality of Debian, we need to do several things:

    A) Prevent bugs from happening in the first place 
    B) Find and report bugs 
    C) Fix bugs that have been reported 
    D) Prevent bugs from entering the archive

I will now discuss each of these things. After that I'll finally
get to discussing automated testing the way Ian Jackson proposed it.

    A) Prevent bugs from happening in the first place

In general, the way to prevent bugs from happening at all is by
reducing complexity. Simple things are easier to get right.

Most programmers find that using tools with higher abstraction
levels reduces complexity and the number of bugs they create for a
given task. As an example, writing the shell command "cat *.txt >
all.dat" is much more likely to work correctly than writing the same
program in C, where you would have to open, read, and write the files
yourself, checking for errors, and so on.

In a Debian packaging context, this might mean using packaging
helpers to take care of the boring, repetitive chores that are the
same from one package to the next. For example, debhelper is pretty
good at reducing the debian/rules file to just a handful of simple
invocations of the individual debhelper tool programs.

Each invocation is very simple. Very simple bits of code are usually
correct. Result: fewer bugs in packages and in Debian. Debhelper
is a very good thing indeed.
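
As an illustration only (no particular package's file, and real
rules files also need build and clean targets), a debhelper-style
binary target reduces to a short sequence of dh_* invocations:

```make
#!/usr/bin/make -f
# Sketch of a debhelper-style debian/rules binary target; each
# dh_* program handles one boring, repetitive packaging chore.
binary-arch: build
	dh_testdir
	dh_testroot
	dh_installdocs
	dh_installchangelogs
	dh_compress
	dh_fixperms
	dh_installdeb
	dh_gencontrol
	dh_md5sums
	dh_builddeb
```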

I'm not saying that using debhelper, or another packaging helper,
should be mandatory. They are merely one way of reducing the
probability of bugs, and because of that, I am happy that most
packages in Debian do use them. On the whole, our quality is better
thanks to that. If you can make bug free packages, by all means don't
use a helper (I don't, my packages are simple enough as they are).

There are other ways of combating complexity. Picking sensible
defaults and not making the package configurable via debconf is
simple. I'd add more examples, but I can't think of any right now
(and people on IRC are discussing dressing pork in yellow underwear,
which is highly distracting).

Where we can, we should avoid complexity and make things simpler. If
you have ideas for how to do this, please tell.

    B) Find and report bugs 

We currently have about 82 thousand bugs open (counting from the
BTS summary page on packages; this includes all severities and
tags). That's a lot of bugs. There are, however, about 11 thousand
packages with bugs, so there are only about 7.3 bugs open per
package, on average.

Our new bug numbers are in the 340 thousand range, so we've closed
258 thousand reported bugs (plus a lot of unreported ones) over
the years.  That's a truly huge number of bugs.

A bug report is a wonderful thing. It means that there is no longer
any need to wonder whether there is something wrong in the package:
you know there is, and you know what it is. Someone, a nice person
using your package, has gone to the trouble of finding out what
the problem is, and has decided to tell you. It is delightful
when people decide to be so helpful.

Sometimes it takes a long time for anyone to report a bug. It would
be more reassuring to be more proactive in finding bugs. Our two well
known tools for this are lintian and linda. They examine packages
for patterns that tend to indicate problems. On the whole, they are
very simple systems, but even so, they find a lot of problems. Most
problems probably never enter the Debian archive, because packagers
use the tools and fix anything that needs fixing before uploading.

The tool I wrote, piuparts, is similar: it should be used by the
package maintainer before uploading, so that a buggy package never
enters the archive. Piuparts is pretty new, so it's unsurprising
that most people don't use it yet.

    C) Fix bugs that have been reported

Not all of the bugs the automatic tools find are fixed, however. And
there are those 82 thousand other bugs that need fixing as well. While
many of them are wish list bugs, or questionable as bugs at all, they
do all require some attention.

What can we do about that? Can we get down to no (fixable) bugs in
a stable release? (Fixable bug here means a bug that can be fixed at
all with reasonable effort by the Debian package maintainer. Having
to rewrite the X server doesn't count as reasonable effort. Wish
list bugs should probably be excluded as well.)

Having no bugs is a good state to be in. When a bug is reported,
it is usually easier to fix if there aren't a bunch of other bugs
disturbing the process.

Our Bug Squashing Parties are useful, but they mostly concentrate
on release critical bugs. Other bugs get less attention, but need
to be fixed too. I'm guilty of ignoring many of the bugs against
my own packages for extended periods of time. People like me are
part of the problem.

Several ideas have been floating around for years on how to improve
this situation, of which I'd like to mention three. While I've here
used the number of bugs as the measure of a package's quality,
the same ideas might help with other aspects, like getting new
upstream versions packaged soon after they're released.

    * Team maintenance. If a package is maintained by a team, 
      there are more people sharing the work. When a team works 
      well, more people look at the package, and finding and 
      fixing problems is more effective. There is less work per 
      person, so things don't lag as much. A well-working team 
      is a good thing.

      As an example, the Debian GNOME team seems to work really 
      well. Transitions to the next upstream version happen 
      quite smoothly these days.

      Mandatory teams for packages seem ridiculous to me. 
      Lots of packages are so small that having to arrange a 
      team for them, even if it is only the effort to set up 
      and subscribe to a team mailing list, is wasteful. Not 
      everyone likes to work in a close team, either, and we 
      shouldn't exclude them.

    * Less strong ownership of packages. The current state in
      Debian is that the package maintainer (or maintainer 
      team) owns the package, and as long as they don't cause 
      a lot of trouble, and don't have release critical bugs, 
      everyone else is invited to keep their hands off.

      If the maintainer, for whatever reason, can't keep the
      quality of the package up, it will have to degrade a lot
      before anyone else dares to touch it. If this 
      Non-Maintainer Upload threshold were lowered, it might 
      be that quality could improve. There would probably 
      be a number of mistakes made, but that also happens 
      when people take on a new package.

      This is not the same as maintenance by a team. An NMU 
      is done by someone interested in the package for 
      whatever reason, but they only do the upload to fix a 
      specific problem or problems, not to join the maintainer 
      team for a long time.

      This idea hasn't been tested. It could be tested if 
      some group of maintainers declared that some or all 
      of their packages were part of the experiment, that 
      anyone could NMU them for any reason whatsoever, as 
      long as they take proper care not to mess up the package. 

      (I'm willing to participate in such an experiment 
      myself, but I haven't thought out the details yet.)

    * Abolishing package ownership completely. This is a more 
      radical version of the previous one. I'm not going to 
      argue for it until the milder form has been tested first.

The main theme here is the need for speed: bugs need to be closed
faster, and things that help this would be good.

    D) Prevent bugs from entering the archive

In program development, it is usually a good idea to not commit
anything until it passes all automatic tests. Similarly, I propose
that it would be good for Debian to use some of the automatic tools
before a package is accepted into the archive. For example, if
lintian finds the init.d-script-does-not-implement-required-option
error[6], is there any reason to accept the package, since it is
certainly buggy?


There are some practical issues with this, of course. Not all lintian
and linda warnings should prevent accepting a package, because
that might prevent fixing more important problems quickly. That's
fine-tuning, however; the general principle still applies: it's
better to prevent a buggy package from entering than to fix it later.

Some of the automatic checking might be too heavy, or too risky, or
otherwise impractical to run when processing incoming packages. In
these cases, it is better to accept the package into the archive
and then run tests later, and file bugs for any problems found.

Lintian has been run on all packages for many years. The results
are listed on a website[7], but many packages go for months, even
years without fixing the problems. When I started running piuparts
on all packages, I decided to report any problems as bugs, instead
of just publishing log files. This seems to work better: many of
the bugs have been fixed, some even on the same day.

    [7] http://lintian.debian.org/

    Automated testing of program functionality

I'm speaking here about whether the programs in the package work,
not whether the packaging itself works. Lintian, linda, and piuparts
already test the packaging fairly well, I think. Also, I'm speaking
about active tests that require running programs in the package;
lintian and linda don't do that, and shouldn't.

In this section especially I'm partly rephrasing what Ian Jackson
and others said in the previous discussion (or what I think they
said), partly adding my own thoughts.

Having a way to automatically test that a package is at least
minimally functional is clearly a good thing. Speaking from the
point of view of someone who occasionally does NMUs to fix release
critical bugs in other people's packages, the easier it is to
check that a package still works after I've mucked about with it,
the easier things will be for everyone involved.

Automatic testing needs to happen in various contexts:

    * When the package is being built. Most of such tests should go
      into an upstream test module. Traditionally, this would
      correspond to "make check".

    * When the package has been built, but before it is uploaded.
      This is similar to testing with lintian, linda, and piuparts.
      The difference from build-time tests is that the tests are
      run when the package is installed onto a system (possibly a
      chroot or a virtual system).

    * Before an uploaded package is accepted into the archive. This
      would prevent buggy packages from entering the archive.

    * On specifically crafted test systems. This would check that
      packages still work even though other packages they depend 
      on have changed.

    * On real systems, to verify that things still work. This would
      potentially be a big help to system administrators.

Some issues:

    * Test data. Some tests are going to require a large amount
      of test data and that is best kept out of the binary package. 
      It is probably best to keep it in the source package only: 
      the test program then needs to install (and maybe partially 
      build) the source package.

    * Test dependencies. Many tests will require tools that are
      needed neither for using nor for building the package. Thus 
      we probably need a "Tests-Depends" field (for the source 
      package).

    * Generic tests. Since Debian has so many packages, as much
      as possible should be tested using generic tests that apply 
      to many packages.  For example, checking that an executable 
      can be run at all should be a generic test. Expecting ten 
      thousand source packages to add tests for that is unwarranted 
      optimism. Generic tests should require nothing from the 
      package itself.

    * Specific tests. Obviously it is also necessary for each
      package to be able to provide tests for its particular
      peculiarities, such as instances of old bugs to avoid them
      re-appearing. The interface for this should allow various 
      tools to be used for implementing the tests, so that there 
      is space for evolution of good tools. Compare the situation 
      with a raw debian/rules file and helper packages.

    * Non-burdening of buildds. Especially slow architectures
      might want to skip build-time tests to save time. There 
      should be a way to build the package without running tests, 
      and of course to run the tests only.
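
The generic test mentioned above, "an executable can be run at all",
might be sketched like this (a sketch, not a real framework; in
practice the program names would come from the package's file list,
not be hand-picked):

```shell
# Sketch of a generic test: check that each named program is on the
# PATH and can at least be started. Exit statuses 126 and 127 mean
# "found but not executable" and "not found", respectively.
check_executables() {
    for prog in "$@"; do
        if ! command -v "$prog" > /dev/null 2>&1; then
            echo "FAIL: $prog not found"
            return 1
        fi
        if "$prog" < /dev/null > /dev/null 2>&1; then
            status=0
        else
            status=$?
        fi
        if [ "$status" -eq 126 ] || [ "$status" -eq 127 ]; then
            echo "FAIL: $prog cannot be executed"
            return 1
        fi
    done
    echo "PASS"
}
```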

My concrete proposals:

    * Let's write a tool that can do at least simple generic tests
      (we'll expand it later).

    * Let's standardize on a way to invoke package specific tests:
      "debian/rules test-build" for build-time tests and
      "debian/rules test-install" for tests of the installed
      package. Neither target should require that the package has 
      already been built. The targets can only be assumed to exist if
      debian/control contains a "Tests-Depends".  Whoever calls
      these must take care of installing test-dependencies.

    * Let's modify pbuilder to run test-build tests and (if
      possible) also the generic tool and test-install tests. 
      These fit better, I think, in pbuilder than in piuparts, 
      but it might be that piuparts should run them as well.

    * Let's also write a tool that a sysadmin (or tester) can use to
      run test-install for particular packages, or for all installed
      packages.

    * Ian's proposed debian/tests/control interface sounds
      nifty. I'm not going to debate the exact details (at least 
      here); they should probably be decided by the implementer 
      ("those who do, decide"). It should be implementable as a tool 
      that can be run from "debian/rules test-install", I think. 
      This will allow Debian packagers who like it to use it, but 
      those who prefer something else can use that instead.  Some 
      people might want to use all available tools, even.

    * After all this is done, let's start a campaign where every bug
      fix includes a patch to add a regression test for it.
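
The proposed rules targets might look roughly like this (a sketch
only: the target names come from this proposal, not from policy, and
debian/tests/smoke.sh is a hypothetical package-specific script):

```make
test-build:
	# Build-time tests, e.g. the upstream "make check" suite.
	$(MAKE) check

test-install:
	# Tests against the installed package. Tools listed in
	# Tests-Depends are assumed to be installed by the caller.
	sh debian/tests/smoke.sh

.PHONY: test-build test-install
```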

    Let's take quality assurance seriously

Quality assurance is currently performed by a few people organized
around the debian-qa mailing list, and various other people
(including me). I see the need for a more aggressive, systematic
approach to quality assurance.  This might be implemented by
(re-)forming the debian-qa team with a modified agenda, something
like this:

    The task of the debian-qa team is to proactively find and fix
    (technical) problems in Debian packages, and to temporarily
    maintain orphaned packages.

Some of the things that it might make sense to organize better
include (if these already are organized well, I apologize):

    * Reporting serious problems found by lintian/linda as bugs
      against packages.

    * Reporting problems found by piuparts. I already do this,
      but it would be good to expand it to a couple more 
      architectures (at least), and to have more people process 
      logs of failed tests.

    * Testing that all packages that are of optional or higher
      priority actually can be installed at the same time. (This 
      should be partly doable by analyzing Contents files, if it 
      isn't already.)

    * Testing that all packages can still be re-built even when
      compilers, libraries, or other build dependencies have 
      changed.

    * Checking that a system with as many packages as possible
      installed can be upgraded from stable via testing to sid.
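
Part of the co-installability analysis mentioned above can be
sketched from Contents data (assuming a simplified line format of
"path package[,package...]"): paths shipped by more than one package
break co-installability unless the packages declare Conflicts.

```shell
# Sketch: given Contents-style lines of "path pkg[,pkg...]", print
# each path that more than one package ships.
find_shared_paths() {
    awk '{ n = split($2, pkgs, ","); count[$1] += n }
         END { for (p in count) if (count[p] > 1) print p }' "$@"
}
```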

Then, of course, there is the fixing of bugs, but I've discussed
that above.

PS. Sorry again, my foot note numbering got confused.

On a clear disk, you seek forever.
