Implementing "testing" (was: Re: Potato now stable)

Hello world,

So, on -devel-announce, I mentioned:

> 	* New "testing" distribution
> 		This is a (mostly finished) project that will allow us
> 		to test our distribution by making it "sludgey" rather
> 		than frozen: that is, a new distribution is added between
> 		stable and unstable, that is regularly and automatically
> 		updated with new packages from unstable when they've
> 		had a little testing and no new RC bugs.
> 		(Anthony Towns; debian-devel)

It's basically ready to be stuck in the archive now, as far as I can
tell, but since it's not exactly a trivial change, it's probably time
to discuss it a bit more.

The basic idea, simplified immensely, is to address this problem:

> 	* Testing updates to frozen is suboptimal: updates go into
> 	  incoming, wait there for a while, get added to frozen,
> 	  we discover they introduce as many release critical bugs
> 	  as they solve, rinse, repeat. The "wait for a while" part
> 	  is particularly suboptimal, but without it, it's not really
> 	  a freeze.

The current way we do things is basically to build a new package, hope it
works as advertised, and let people test it. If it doesn't work, we repeat
as many times as necessary, or eventually just throw the package out.

A better way to handle this, which I suspect everyone's just spontaneously
reinvented as they read the above, is to try to keep around a previous
version of the package that was usable. That way if the new package doesn't
work, we can just keep the old one rather than having to throw it out.

That, essentially, is the point of the "testing" distribution: to contain
a consistent set of the most recent "believed-to-be-reliable" packages.

Some subheadings follow.

Why call it testing?
One thing that the freeze is really bad at is fixing "normal" bugs. The
point of packages in testing is not that they should be perfect or
bug-free, just that they should be usable. There's a lot of difference
between what we'd like to release (0 bugs, many many features) and what
we'll accept for release (~0.005 RC bugs :), and this is really where
beta testing should fit in.

It also sorts nicely compared to "stable" and "unstable" :)

What does "acceptable for release" mean?
For one thing, it means the packages are all consistent: if libgtk1.2.7
is in the distribution, none of the packages should be depending on
libgtk1.2.8. For another, it means packages shouldn't have any release
critical bugs. It also means a package should be at the same version
across all architectures it's present in [0]. It also means the maintainer
of the package should be relatively happy with it.
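
To make the first and third of those conditions concrete, here's a
minimal Python sketch. It isn't taken from the actual testing scripts:
the function names and the input shapes are made up for illustration,
and versioned dependencies, alternatives and virtual packages are all
ignored.

```python
def unsatisfied_dependencies(packages):
    """Return {package: [missing deps]} for a single distribution.

    `packages` maps a package name to the list of package names it
    depends on. A dependency is "missing" if no package of that name
    is present in the distribution at all.
    """
    available = set(packages)
    broken = {}
    for name, depends in packages.items():
        missing = [dep for dep in depends if dep not in available]
        if missing:
            broken[name] = missing
    return broken


def version_skew(versions_by_arch):
    """Return the sorted names of packages whose version differs
    between the architectures they are present on.

    `versions_by_arch` maps an architecture name to a
    {package: version} dict. A package that is simply absent from an
    architecture does not count as skewed.
    """
    seen = {}
    for versions in versions_by_arch.values():
        for name, version in versions.items():
            seen.setdefault(name, set()).add(version)
    return sorted(name for name, found in seen.items() if len(found) > 1)
```

With libgtk1.2.7 in the distribution and some package depending on
libgtk1.2.8, the first function reports that package as broken; the
second reports any package that is at different versions on, say, i386
and m68k.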

It means the package shouldn't have any release critical bugs: that is,
no security holes [1] (critical or grave), the package shouldn't crash
your system (critical), it should be usable for someone on the planet at
least (grave), and it shouldn't violate policy too severely [2], e.g. by
having incorrect dependencies or no copyright (important).

Note that what I'm writing here is what I think's best, and what's
implemented. If there's an objectively better way of doing things, well,
that's why I'm posting. [3]

Okay. So the next question you're probably asking yourselves is "how does
it work". Well, you don't have to ask yourself, you can ask me. Here's a

Archive Layout
As package pools aren't close to being rolled out, I'm opting for as
minor a change as possible (which isn't really very minor). So instead
of two distributions, stable and unstable, we have three distributions,
stable, testing and unstable. As usual, packages get uploaded via dinstall
to unstable, however broken and buggy they might be. Eventually, by some
automated process yet to be described, they get added to the
testing distribution. After some amount of time, testing gets frozen,
fixed, and released (the theory being that this will be easier than
freezing unstable, fixing it, and releasing).

So basically we'd have:

	unstable	-- bleeding edge, broken, etc
	testing		-- leading edge, maybe buggy, but working
	stable		-- static, usable, going out of date

Automated Process?
So pretty much all the policy is encoded in some "automated process"
which updates testing. At the moment it works basically as follows:

	1. First, it loads up all the Sources and Packages files in
	   testing and unstable.
	2. It compares and contrasts them, working out what source
	   packages are new in unstable.
	3. For each of these new source packages it checks:
		a. That the package has had two weeks of testing, or,
		   for a medium or high urgency package, one week or
		   three days of testing respectively.
		b. That each binary has been recompiled for each arch
		   it's on.
		c. That each binary has 0 RC bugs, or fewer than the
		   testing version does [4].
	4. It then collects the source packages that pass 3, and
	   tries installing them in various combinations to see if the
	   number of uninstallable packages in "testing" either drops
	   or remains the same. If so, they're in. If not, they're not.
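
Steps 3 and 4 can be sketched roughly as below. This is an illustration,
not the code in doit.sh: the Candidate fields, the function names and the
"low" default urgency label are assumptions, RC bugs are counted per
source package rather than per binary, and "installable" is reduced to
"every direct dependency is present". Step 4 is also simplified to trying
candidates one at a time in order, rather than in various combinations.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set

# Minimum days in unstable by urgency (step 3a): two weeks normally,
# one week for medium urgency, three days for high urgency.
MIN_DAYS = {"low": 14, "medium": 7, "high": 3}


@dataclass
class Candidate:
    source: str
    urgency: str
    days_in_unstable: int
    arches_needed: FrozenSet[str]    # arches the package is present on
    arches_built: FrozenSet[str]     # arches the new version is built for
    rc_bugs_unstable: int
    rc_bugs_testing: Optional[int]   # None if the package isn't in testing
    # binary package name -> names of the packages it depends on
    binaries: Dict[str, Set[str]] = field(default_factory=dict)


def passes_step_3(c):
    """Apply checks 3a, 3b and 3c to one candidate."""
    old_enough = c.days_in_unstable >= MIN_DAYS.get(c.urgency, 14)
    built_everywhere = c.arches_needed <= c.arches_built
    if c.rc_bugs_unstable == 0:
        bugs_ok = True
    else:
        bugs_ok = (c.rc_bugs_testing is not None
                   and c.rc_bugs_unstable < c.rc_bugs_testing)
    return old_enough and built_everywhere and bugs_ok


def count_uninstallable(dist):
    """Count binaries with at least one direct dependency missing."""
    available = set(dist)
    return sum(1 for deps in dist.values() if not deps <= available)


def migrate(testing, candidates):
    """Greedy stand-in for step 4: accept a candidate only if the number
    of uninstallable packages in testing does not go up."""
    current = dict(testing)
    accepted = []
    for c in candidates:
        if not passes_step_3(c):
            continue
        trial = dict(current)
        trial.update(c.binaries)
        if count_uninstallable(trial) <= count_uninstallable(current):
            current = trial
            accepted.append(c.source)
    return current, accepted
```

Trying candidates one at a time will miss groups of packages that only
work when they go in together, which is presumably why the real process
tries combinations.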

There are a bunch of helper scripts that ensure that dists/testing
is fully populated either by symlinks to unstable, or by the files
themselves, and that ensure that if the file in unstable is deleted
by dinstall, that the symlink is changed to a hardlink to the old file
rather than being left dangling.

This is being prototyped on auric, so you can see some stuff
about it at http://auric.debian.org/~ajt/, and you can point apt at it
too. Pointing apt at it probably isn't really too clever: it doesn't
really have the bandwidth for users doing upgrades, or random people
doing mirrors, and I keep changing things around fairly frequently to
see how the scripts hold up. But you can do it.

The actual scripts to do this are all in my home directory on auric,
and so are probably only accessible to developers. auric:~ajt/doit.sh
is the place to start if you want to have a look.

Okay, so what next?

Effects on the Release Cycle
So the main point of this is to create a distribution that, essentially,
doesn't have any release critical bugs [5] and can be kept that way
with much less effort on the part of the release manager. That should
have a pretty profound effect with regard to speeding up the freeze,
since it removes one of the two main bottlenecks [6].

So, here's a rough guide as to how releases might work with a testing
distribution with a focus on minimising time in the freeze:

	* Development time: packages are worked on, new upstream versions
	  are installed. testing is kept fairly bug-free. Users can point
	  apt at testing, and give feedback to the developers before the
	  freeze, without having to worry about bash not working.

	* Freeze preparation: boot-floppies, CD scripts, release notes are
	  updated to work with the new and updated packages.

	* Freeze: any remaining problems in testing are dealt with, either
	  by adding them to the release errata, downgrading them, fixing
	  them, or removing the package entirely.

Since the remaining problems should be small, it should be possible to
keep the freeze very short.

In addition, development and freeze preparation are entirely
parallelizable.  It's plausible and even desirable to simply continue
to maintain boot-floppies, CD scripts, and release notes throughout the
development phase. In an ideal world, testers should be able to obtain
bootable CDs for testing as well as stable, in general.

Even if the latter doesn't happen, though, eliminating the few bugs
remaining in testing, plus any new bugs uncovered by boot-floppies or CD
generation, should be a lot easier than fixing all the bugs in unstable
plus whatever new bugs boot-floppies and CD generation uncover.

There's a bit more to it than that actually, but this mail's probably
already getting long enough.

So that leaves...

So, here's how I see us ending up when we've *finished* the transition:

	potato/		woody/		sid/
	stable -> potato
	testing -> woody
	unstable -> sid

That is, all the unreleased architectures, and all the new and broken or
untested packages are in sid; potato's still stable; and the packages in
woody are getting less and less buggy.

To effect this, we would:

	* desymlink potato/binary-powerpc and potato/binary-arm (which point
	  to sid presently)
	* remove sid/binary-powerpc and sid/binary-arm
	* create symlink trees in sid for each of the released architectures
	  pointing at woody.
	* remove the symlinks for unreleased architectures from woody
	* point unstable at sid, and update dinstall so uploads to unstable
	  go to sid
	* point the testing scripts at woody

The testing scripts need to cope with a few things here:

	* Some .deb's in sid will symlink to woody. The .deb's in woody
	  shouldn't be deleted while they're needed by sid.

	* When a .deb in woody is updated, the .deb will already be in sid
	  (and will have been for two weeks). As such, there should
	  simply be a symlink from woody to the actual .deb in sid to
	  conserve mirror space.

	* When a .deb in sid is updated, there may be a symlink from woody
	  to it. This symlink needs to be replaced with a copy of the real
	  .deb since there's nothing to link to anymore.

They do cope with this at the moment, and they're being prototyped on auric
in /org/scratch/ajt/froody (like woody, but a little weird :).

The way they cope with this is to keep a separate copy of the testing
tree in /org/scratch/ajt/hidden, which rather than having any symlinks
is all hard links to the actual .debs. When any of the .debs is removed,
the hard links still remain, and can be copied (well, hard linked again
actually) into the visible tree.
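
Here's a small Python sketch of that trick, for anyone who wants to see
the mechanics. It is not the prototype's code: the two function names are
invented, and it assumes the visible tree, the hidden tree and the real
.debs all live on one filesystem (hard links can't cross filesystems).

```python
import os


def publish(real_deb, visible_path, hidden_path):
    """Expose a .deb in the visible tree as a symlink to the real file,
    and pin the same file in the hidden tree with a hard link."""
    os.makedirs(os.path.dirname(visible_path), exist_ok=True)
    os.makedirs(os.path.dirname(hidden_path), exist_ok=True)
    os.symlink(os.path.abspath(real_deb), visible_path)
    os.link(real_deb, hidden_path)


def repair(visible_path, hidden_path):
    """If the visible symlink dangles because the real .deb was removed,
    replace it with a hard link to the copy kept in the hidden tree.

    Returns True if a repair was made, False otherwise.
    """
    # os.path.exists() follows symlinks, so it is False for a dangling one.
    if os.path.islink(visible_path) and not os.path.exists(visible_path):
        os.remove(visible_path)
        os.link(hidden_path, visible_path)
        return True
    return False
```

Because the hidden tree holds its own hard link, removing the file from
sid only drops one name for the data; the contents stay on disk until
the last link goes away, so there is always something to relink from.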

So there you have it.

It's coded. It works. It serves a useful purpose. I think we should
use it.


[0] As opposed to a package being present in all architectures. That is,
    I think it's only appropriate to consider "foo doesn't build on the
    bar architecture" a release critical bug if it's already been built
    there before. And I also think it's appropriate for that bug to be
    downgradable if foo is simply removed from binary-bar.

[1] That compromise root, or user's data. What about denial-of-service
    bugs?  They don't actually fit into the existing severity levels,
    that I can see.

[2] An explicit enumeration of what "too severely" means should appear in
    the next policy update, hopefully.

[3] Here's hoping it won't degenerate like the "Intent to split" mail did.

[4] The number of RC bugs against the testing version is assumed to be the
    number of RC bugs against the package when that version was the latest
    in unstable. If it's wrong, it's probably an underestimate so requiring
    fewer RC bugs in the new package isn't likely to introduce too many new

[5] What release critical bugs will it have? Obviously, it'll still have
    any security problems that get discovered, but presumably they'll be
    fixed within a day or two. There'll be bugs that have existed for a
    long time, but that no one's noticed until recently: things like the
    strange copyright and Depends: of dvidvi.

    One source of significant numbers of RC bugs in testing might be
    -policy changes: requiring Build-Depends:, or moving /usr/doc to
    /usr/share/doc, or requiring packages to be built with libc6 can 
    declare huge numbers of packages buggy, and take a while to fix.

    The other source of bugs that could be problematic is problems like
    the bugs against net-tools and nscd: ones that are obviously critical,
    but aren't reproducible or diagnosed well enough to be fixed.

[6] The other being getting boot-floppies to a point where they can be

Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. GPG signed mail preferred.

  ``We reject: kings, presidents, and voting.
                 We believe in: rough consensus and working code.''
                                      -- Dave Clark
