
Debian development and release: always releasable (essay)

This is from Russ Allbery and myself.  See http://wiki.debian.org/Debate
for context, and http://wiki.debian.org/AlwaysReleasableTesting for
the canonical version of this essay. We hope that the readers will take
their time to read this, reflect on it, and then maybe write their own
essay and add it to http://wiki.debian.org/JessieReleaseProcess .
Comments on the wiki or by e-mail are, of course, always welcome.

 - - -

The wheezy freeze has been much too long. At ten months, it's four
months longer than what we've gotten used to in several previous
releases. Had we managed to keep the freeze at six months, it would
still have been too long. I believe there is something wrong in how
we develop Debian, and how we do releases, and that by fixing them,
we can have much shorter releases, with an increase in their quality.

Freezes are long in part because we need to do so much work during
them. Most importantly, we need to fix so many release-critical bugs
(RC bugs) that a short freeze is not possible without drastically
lowering the quality of Debian.

A long freeze is highly frustrating to everyone. It's a very stressful
period for the release team, obviously, but since the freeze affects
all development, even those of our developers who do not care about
the release feel its effects in their development. Our users would
like fresh upstream versions, but that rarely happens in unstable,
and because the freeze is so long, when the release actually happens,
much software seems a bit stale. Upstreams, who would like to get
their software into the hands of users as soon as possible, including
via Debian, are also frustrated.

We should aim for a short freeze, perhaps as short as two weeks,
and certainly not longer than two months. This would remove the
frustration, and fix the other problems related to a long freeze.
However, to achieve a short freeze, we need to change how we develop
Debian.

The fundamental change is to start keeping our "testing" branch
as close to releasable as possible, at all times. For individual
projects, this corresponds to keeping the master or trunk branch
in version control ready to be released. Practitioners of agile
development models, for example, do this quite successfully by
applying continuous integration and automatic testing, and by having
a development culture in which, if there's a severe bug in master,
fixing it gets the highest priority.

We can do similar things in Debian, and if we do, I believe that we
can keep testing in a releasable state for almost all of the development
cycle between two releases. The minimum necessary changes to achieve
this, in my opinion, are:

* An attitude change: we decide that releases are important, and that
  they're the job of the entire project, not just the release team.
* Keep testing free of RC bugs.
* We should use automatic testing much more extensively, to find
  problems as early as possible.
* We should limit the number of packages we strongly care about for
  a release.

Releases are important

Releases are important to many, perhaps most, of our users. Hackers
and hardcore power users don't necessarily care about them, of course,
but most others do. A released version of Debian implies that the
operating system works: there's a working installer, for example.
It also implies that all the packages are expected to work together:
there are no transitions going on, for example, that might break
dependencies or reverse dependencies.

A release is important to many users because it means that if they
have to re-install, they will get back the same system they used to
have. Or they can install another computer that will behave the same
way as the first one. This reproducibility is also why enterprises
like them: they can confidently assume that if they install fifty
thousand machines, they'll all be the same. Without this kind of
uniformity, system administration costs, and end-user support costs,
become unmanageable.

But releases are also important for us, as a project. They're an
excellent point to stop and say, "we have achieved this, and it is
good". It's a reason for others to have a look at Debian and see that
it is good. This generates a good feeling, which gives us more
motivation to work on Debian.

It's true that we can't expect every Debian developer to care about
making a release. That's OK. We just need the minority who don't care
to not get in the way of the release.

Keep testing free of RC bugs

The RC bug count for the testing branch should be kept low all the
time. Right after a release, by definition, testing is free of RC
bugs. With the current development model, right after the release we
open the floodgates, and a large number of new packages and versions
enter testing. The bug count skyrockets, but we don't care a lot
about that until the next freeze gets closer. This means testing
is nowhere near a releasable condition during most of the
development cycle.

We should, instead, make sure testing is kept free of RC bugs as much
as possible. There are a variety of things we can do about it:

* Remove RC buggy packages sooner rather than later. An RC buggy
  package should be removed as soon as possible: when the bug
  is identified, allow a bit of time for the bug to be verified
  (was it actually an RC bug?), but after that, remove the package
  from testing, preferably automatically.  If the package has
  reverse dependencies, remove those as well. This keeps testing
  releasable. The removed package can and will re-enter testing once
  it gets fixed.

  To reduce the sting of optional packages missing the release, we
  should consider whether we're willing to introduce new packages
  in stable point releases.  Obviously, only packages that have
  no new dependencies could be introduced that way, so things that
  require newer versions of the packages already in stable would not
  be eligible.  But it means that if a package was in the previous
  stable but missed the current stable due to unresolved issues at
  the time of the release, we could still get it back in and it
  wouldn't have to wait another year or two.

  We would need some staging area to ensure that the stable build
  of the package was actually tested.  Backports could be used for
  that purpose.

* When a package is too important to be removed from testing (e.g., gcc
  or bash), if it gets an RC bug, all developers should be encouraged
  to help fix it. This can be done in various ways, from the fun (a
  BSP aimed at that one bug only, perhaps) to the dictatorial (prevent
  all uploads to unstable unless they fix an RC bug in testing).

* When the RC bug count in testing grows above a particular threshold,
  have a bug-fix-only mini-freeze: stop the migration of packages to
  testing, except for packages that fix RC bugs. Ideally, we would
  automate as much of this as possible rather than making the release
  team do it manually. When the RC bug count drops back below the
  threshold, we re-open testing. This provides a constant feedback
  cycle: if we're not managing RC bugs properly, testing stays frozen
  more and more of the time, which increases the pressure to manage
  RC bugs.
Not having RC bugs in testing is a necessary (though not sufficient)
condition for releasing. We have to keep the count as close to zero
as possible all the time in order to keep the freeze short.
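
The mini-freeze rule from the list above is simple enough to state as
code. This is only a sketch: the function name is invented, the
threshold value is something the project would have to choose, and
treating a count exactly equal to the threshold as "not frozen" is an
assumption.

```python
def may_migrate(fixes_rc_bug, rc_bug_count, threshold):
    """Return True if a package may migrate from unstable to testing.

    During a mini-freeze (RC bug count above the threshold) only uploads
    that fix an RC bug are let through; otherwise everything may migrate.
    """
    mini_freeze = rc_bug_count > threshold
    return fixes_rc_bug or not mini_freeze
```

With a hypothetical threshold of 100, an ordinary upload is held back
while testing has 120 RC bugs, but an upload that fixes one of them
still migrates.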

Reference installations

Debian is now much too big to give the same importance to every
package, as far as the release is concerned. In reality, we don't:
the release team has a much lower threshold for removing nethack than
it has for bash. We can release without nethack, but not without the
default shell.

We should codify this, and make it explicit what counts as a necessary
package to be included in the release, and what does not. I propose a set of
"reference installations" of Debian, for various purposes.  We have
the related concept of "task" already, in the installer:

* ssh server
* mail server
* LAMP server
* desktop system
* print server
* etc

We should have an explicit list of such reference installations
and declare them as crucial for the release: if they work, we can
release, and if they don't, we can't. Each reference installation
should have a clearly defined purpose, and therefore a clearly
defined list of packages that must be included.

A package that is not included in one or more of the reference
installations is a package we want to include in the release, but we
will not delay the release for its sake. We should have a low threshold
for removing such a package from testing: it could perhaps even be
removed automatically one week after an RC bug is filed against it
(assuming the bug affects the version in testing).
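
As a sketch of that automatic removal rule, assuming we know when the RC
bug was filed and whether it affects the version in testing (the
function and parameter names are invented for illustration):

```python
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=7)  # the "one week" suggested in the text


def should_auto_remove(package, reference_packages, bug_filed_on, today,
                       affects_testing=True):
    """Decide whether a package with an RC bug is due for automatic removal."""
    if package in reference_packages:
        return False  # reference installations have a higher threshold
    if not affects_testing:
        return False  # the bug is only in unstable, testing is unaffected
    return today - bug_filed_on >= GRACE_PERIOD


reference_packages = {"bash", "gcc"}  # hypothetical, for illustration only
print(should_auto_remove("nethack", reference_packages,
                         bug_filed_on=date(2013, 5, 1),
                         today=date(2013, 5, 9)))
```

A package in a reference installation is never removed by this rule,
however old the bug is; it needs a human decision instead.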

This creates two classes of citizenship for packages. This is
unavoidable, and is actually already the case.  It is not a criticism
of the packages, or their maintainers, if they're not included in a
reference installation. Nethack just isn't as important as bash.

The only difference between packages included in reference
installations and those not included is that packages in reference
installations have a higher threshold to be removed from testing.
(If a reference installation does not meet quality criteria,
the release team has the option of dropping it.)

The set of reference installations requires careful thought and
broad consensus. They are the packages we, as a project, especially
wish to support. Each reference installation should also be
possible to verify for quality: there should be an automatic
test suite of sufficient coverage and quality that it makes sense
to let it be crucial for the release.

Use automatic testing extensively

We have some automatic testing tools specifically for Debian: lintian,
piuparts, adequate, autopkgtest, and probably more. We should use
these much more extensively, and let them guide the migration of
packages into testing.

Automatic testing will catch some classes of bugs much faster, and
perhaps more reliably, than relying on bug reports. We need both.
The job of automatic testing is not to prove the absence of bugs,
but to establish a trusted lower limit for quality: it shows us that
certain things work and will notify us if we ever break them. This
gives us, the developers, more confidence that the changes we make are
not too destructive, and notifies us if they are. Most importantly,
automatic testing will find bugs faster, which then makes it easier
to fix them, and reduces their impact.

Imagine a continuous integration system for Debian: for every new
package upload to unstable, it builds and tests all the reference
installations.  If all builds succeed, and all tests pass, the package
can be moved into testing at once. When you, a developer, upload a
new package, you get notified about test results, and testing migration,
within minutes.

The number of packages in Debian, and the amount of churn in unstable,
makes this not quite possible to achieve without massive amounts of
hardware. However, we can get close: instead of testing each package
separately, we can test together all the packages that have been
uploaded to unstable since the previous run, and mostly this will be
a fairly small number of packages.
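
The batching idea can be sketched as follows. Everything here is
hypothetical: uploads are (package, upload time) pairs, and run_tests
stands in for whatever would build and test the reference installations
with the whole batch installed. What to do when a batch fails (for
example, splitting it to find the culprit) is left open.

```python
def run_batch(uploads, last_run, run_tests):
    """Test together everything uploaded to unstable since the last run.

    uploads:   iterable of (package, upload_time) pairs
    last_run:  time of the previous test run (same type as upload_time)
    run_tests: callable taking a list of package names, returning True
               if all reference installations build and pass their tests
    Returns (batch, migrated): the packages tested, and those allowed
    to migrate to testing.
    """
    batch = sorted(pkg for pkg, uploaded_at in uploads if uploaded_at > last_run)
    if not batch:
        return batch, []
    passed = run_tests(batch)
    return batch, (batch if passed else [])
```

One test run then covers all recent uploads at the cost of not knowing,
on failure, which package in the batch is to blame.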

Ideally we will run the tests for each release architecture, but it
may be enough to run them on amd64 only. We'll need to experiment with

Automatic tests do not need to have very much coverage in order to
be quite useful. Even very simplistic tests, like the ones piuparts
does, find quite a lot of problems. If we create a framework to run
the tests which makes it easy to add more tests, we will in time
accumulate a large test battery. Look at lintian: it has a staggering
number of tests now, but they've been written over a period of more
than a decade. Ideally, we can benefit from such tests that have
already been written for other distributions, and share ours with

Tests for a running reference installation might include the following:

* Basic networking setup works: System responds to ping from the outside.
* Mail server responds appropriately on the SMTP, submission, IMAPS, and
  POP3S ports.
* LAMP server responds on the HTTP and HTTPS ports.
* A desktop system that automatically logs in a test user has the right
  processes running, and can start some common applications.
* In each case, it's possible to log in remotely with ssh and run
  "sudo apt-get install hello".

These are trivial, even simplistic tests. However, if they pass, we know
that at least the basic, fundamental things in the system are not horribly
broken: networking, system administration, and the software that is meant
to start in that reference installation. Furthermore, we know that the
debian-installer works. That's a good foundation for further hacking.
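
To show how little code such tests need, here is a minimal sketch of the
port checks in Python. It only verifies that a TCP connection is
accepted, not that the service answers appropriately, so it is weaker
than the tests described above. The port numbers are the standard
assignments for these services; the table of which installation must
offer which service, and the function names, are assumptions made for
the example.

```python
import socket

# Standard port numbers; the grouping by installation is an assumption.
EXPECTED_PORTS = {
    "mail server": {"smtp": 25, "submission": 587, "imaps": 993, "pop3s": 995},
    "LAMP server": {"http": 80, "https": 443},
    "ssh server": {"ssh": 22},
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_installation(host, kind, probe=port_is_open):
    """Map each expected service of a reference installation to True/False."""
    return {name: probe(host, port)
            for name, port in EXPECTED_PORTS[kind].items()}
```

Calling check_installation with the address of a freshly installed test
machine and "LAMP server" would return a dictionary saying whether the
HTTP and HTTPS ports accepted a connection. The probe parameter exists
so that a stricter, protocol-aware check can be swapped in later.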

Holger Levsen is already doing at least some of this on
<http://jenkins.debian.net/>, and he's happy to get help to improve
that service further.


We believe, based on our experience as software developers, that
adopting these suggestions will make the jessie release cycle and
release process smoother, and increase the quality of the end result.

Lars Wirzenius
Russ Allbery

