
The sarge release disaster - some thoughts



Hi,

I'm a former Debian developer, and this mail contains some subjective 
observations of mine regarding what lessons Debian might learn from 
mistakes made during the sarge release cycle.

Contents:
- Introduction
- Have a second plan - Discover problems early and react
- RC bugs - only a metric
- Dump testing?


Introduction
------------

These are just my personal thoughts.
If you think 90% of this email is bullshit, I'm glad to hear that you 
think the remaining 10% contains valid points.

I'm talking about issues with the release management.
This isn't meant as a personal offence against former release manager
Anthony Towns and the current release managers Steve Langasek and
Colin Watson and their assistants. Everyone makes mistakes and I might 
have made more mistakes in my life than all these people together.
I do believe that the Debian release managers tried their best.
But things went wrong. It wasn't bad luck - mistakes were made.
And the mistakes should be evaluated to avoid them in the future.

The situation today is that several announced release dates for sarge 
have passed, with the first one being December 1st 2003 [1].

Debian stable is horribly outdated - e.g. it's nowadays non-trivial to 
find new hardware that is completely supported by Debian 3.0.

Why don't I send this mail after the release of sarge?
Well, I thought exactly that several months ago.
But much time has passed since then, and sarge has still not been 
released.


Have a second plan - Discover problems early and react
------------------------------------------------------

One pretty simple rule might have avoided several delays in the sarge 
release cycle:
If there are risks that might cause great delays, have a second plan in 
case things don't work out as intended.


Two examples:


The Debian installer

The timeline for the first officially announced release date was:
- August 19th 2003: announcement 
- October 1st 2003: installer has to be in a state where it only 
                    requires "last minute fixes"
- December 1st 2003: announced release date
- December 1st 2003: announcement that sarge isn't being released

I don't know which of the people responsible for the installer had 
promised Anthony that the installer would be ready that fast. Anthony 
said in his announcement that the timeline he set was an "aggressive 
goal". With this in mind, extra care would have been required to have a 
second plan in case any part of his release plan failed.

No later than October 1st 2003, it should have been clear that the 
progress of the installer wasn't as good as expected. This was 2 months 
before the announced release date.

What would have been a second plan?
Nobody likes boot-floppies.
But considering the choice between releasing Debian 3.1 with the new 
installer in 2005 or releasing Debian 3.1 with boot-floppies in 2003, it 
might have been possible to find some Debian developers willing to hack 
on boot-floppies and use them for Debian 3.1.

The new installer would have been ready in time for Debian 3.2.

Would this have been an ideal solution?
No.
But it's quite possible that it would have worked - and that it would 
have benefited the users of Debian.


The timeline for another failed release date:
- August 2nd 2004: announcement
- August 8th 2004: "Official security support for sarge begins"
- September 15th 2004: announced release date

The milestone that included the start of the official security support 
for sarge was only 6 days after the announcement, but it was missed by 
more than 6 months.

Whatever the reason for expecting testing-security for sarge that 
quickly, it should have been obvious 6 days later that it wasn't 
possible on that schedule.

What would have been a second plan?
Use testing-proposed-updates.

By using testing-proposed-updates for security fixes, users might have 
gotten security updates one or two days after the DSA on some 
architectures, as the sketch below illustrates.

Would this have been an ideal solution? 
No.
But it would have worked without a great impact on the release date.
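
As a sketch of how this could have looked (package name, version, and 
changelog details are invented for illustration): an upload is routed 
to testing-proposed-updates simply by naming that distribution in the 
debian/changelog entry:

  foo (1.0-2sarge1) testing-proposed-updates; urgency=high

    * Apply upstream patch for the buffer overflow described in the DSA.

   -- Some Developer <devel@example.org>  Mon, 16 Aug 2004 12:00:00 +0200

Once built on all architectures, such an upload could have been pushed 
into testing without waiting for the full testing-security 
infrastructure.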



RC bugs - only a metric
-----------------------

Nowadays, it seems the main metric to measure the quality of a release 
inside Debian is the RC bug count.

As with any metric, work on improving the metric might make the number 
look much better, but that doesn't imply that the overall quality 
improved to the same degree.


An example:

A major problem in Debian is MIA (missing in action) developers.

Consider a MUA (mail user agent) maintained by a MIA developer with the 
following bugs:
- #1 missing build dependency (RC)
- #2 MUA segfaults twice a day (not RC)

Consider the two possible solutions:
1. a NMU fixing #1
2. - confirm that the maintainer is indeed MIA
   - orphan all packages of the MIA maintainer
   - new maintainer adopts MUA
   - new maintainer fixes both bugs

The first solution has a quick positive effect on the "RC bug count" 
metric.
The second solution looks worse in this metric, but it's actually better 
for the users of Debian.
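
To make the first solution concrete, here is a minimal sketch of such 
an NMU (package name, versions, and the library are invented; the bug 
numbers refer to the hypothetical bugs above). The missing build 
dependency is added in debian/control:

  Build-Depends: debhelper (>= 4), libfoo-dev

and the upload gets the usual NMU version suffix in debian/changelog:

  foomail (1.4-2.1) unstable; urgency=low

    * Non-maintainer upload.
    * Add missing build dependency on libfoo-dev (closes: #1).

   -- Some Developer <devel@example.org>  Sun, 13 Mar 2005 12:00:00 +0100

This closes RC bug #1 quickly - while the segfault bug #2 stays open 
for as long as the maintainer remains MIA.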


Dump testing?
-------------

It seems no one asks the following question:
Testing - is it worth it?

Several people have stated that with the size of Debian today, it 
wouldn't be possible to manage a release with a "traditional" freeze 
(unstable frozen at a date announced several months in advance), and 
that only testing makes releasing possible.

I have yet to see any objective evaluation of this claim.

The number of packages has increased since potato times, but at the same 
time, the number of release managers and assistants has increased from 
one to five. If the release team has grown that much despite testing, 
perhaps the same number of people might be able to handle a traditional 
freeze?

I remember that when testing was introduced, it was said that testing 
would always be in a releasable state. History has shown that testing 
was sometimes in better shape than unstable, but also sometimes in worse 
shape. Testing has some advantages over unstable (always fulfillable 
dependencies, some kinds of brown paper bag bugs are very unlikely), but 
serious data loss bugs like #220983 are always possible.

Testing was expected to make shorter freezes possible.
Neither the woody nor the sarge freeze supports this claim.
This might not be solely the fault of testing, but the positive effects 
of testing (if any) aren't visible.

Regarding the release process, testing offers some advantages like 
having a relatively low number of RC bugs and packages built on all 
architectures.

But the question is whether the same couldn't be achieved with a 
traditional freeze of unstable, sorting these things out during the 
first one or two weeks after the freeze.

Might this make a freeze a bit longer?
Perhaps.
But consider the disadvantages of testing:
- Testing causes additional work for both the release team and all 
  Debian developers.
  As an example, library transitions are always a pain due to testing.
  And RC bugs already fixed in unstable but not in testing need to be
  tracked.
- There's some collateral damage from removing packages from testing:
  a package may be removed because it depends on another package that
  gets removed due to some RC bug, or that gets removed during some
  library transition and isn't able to come back for quite unrelated
  reasons (e.g. the blocking of a more recent KDE from testing made it
  impossible for new vim packages to enter testing).
  Yes, there is a logic behind how testing works, but the result is 
  often chaotic and surprising.
- Architectures have to be in sync due to testing.
  It should be noted that all problems with architectures not being in 
  sync are only caused by testing.
  An architecture without an autobuilder is dead.
  But if an architecture didn't have a working autobuilder for two
  weeks, this wouldn't cause any problems if testing didn't exist.
- Bugs fixed long ago in unstable might still be present in testing.
  RC bugs might be tracked, but e.g. blocking a more recent KDE from
  testing might prevent a new vim package that fixes some nasty non-RC
  bugs from entering testing.

Yes, some of these problems can be mitigated while keeping testing 
(e.g. via version tracking in the BTS), as sketched below.
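
As an illustration (bug number and versions are invented): with version 
tracking, a mail to the BTS control interface records the versions in 
which a bug was found and fixed, so a bug fixed in unstable but still 
present in testing shows up as such:

  To: control@bugs.debian.org

  found 123456 foomail/1.4-1
  fixed 123456 foomail/1.4-2.1
  thanks

The BTS can then consider the bug resolved for unstable while keeping 
it open for the older version still in testing, instead of someone 
having to track this by hand.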

But this still leaves the question whether introducing testing actually 
was an improvement compared to the previous release process or not.



Thanks for reading this email
Adrian

[1] http://lists.debian.org/debian-devel-announce/2003/08/msg00010.html

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



