
Re: Overall bitrot, package reviews and fast(er) unmaintained package removals



On Wed, 6 Apr 2016 15:27:48 +0000 (UTC)
Felipe Sateler <fsateler@debian.org> wrote:

> On Wed, 06 Apr 2016 00:18:10 +0200, Ondřej Surý wrote:
> 
> > Hey,
> > 
> > while doing some work on PHP transitions, saving courier-imap,
> > finally packaging seafile since they finally stopped violating GPL,
> > I found quite a lot of bitrot in some (mostly leaf) packages.
> > Packages untouched for years after initial upload, packages with
> > unreachable maintainers, etc[1].

'Unreachable maintainer' is not the same as 'invalid maintainer email
address' (the latter is RC). "Unreachable" is much harder to identify;
that is why we have an MIA process in the first place. So I think that
needs to be dropped from this metric.

> > I totally understand that our QA team can't solve all of this, but I
> > have a couple of automated ideas that might help:  
> 
> This is something we really need to start thinking about. Tasks that
> involve more than a few packages usually require a large number of
> NMUs, which is quite sad.

Removal from testing would be the way to work down the dependency
chain, so the metrics should be based on the version of the package in
testing. Yet there would also need to be a way of stopping the package
from immediately migrating back into testing, as none of the current
migration excuses would apply.

If the idea of marking packages as "not for testing" gets implemented,
then all packages which have been removed from testing, which still
exist in unstable, which cannot migrate back into testing and which
are not marked as "not for testing" could be removed from the
archive (unstable) after a specified period of time. That may actually
be sufficient - it starts with leaf packages; then, next time around,
there is a whole new bunch of leaf packages that were formerly used by
the first bunch to be removed.

(Before you know it, Debian fits on a single CD again - all of Debian.)
;-)
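Purely as illustration, that cascade can be sketched as repeatedly
removing packages with no reverse dependencies. The package names and
the dependency map below are invented:

```python
def removal_rounds(deps):
    """Yield successive sets of "leaf" packages (nothing depends on
    them), removing each set before computing the next round."""
    remaining = dict(deps)
    while remaining:
        used = {d for ds in remaining.values() for d in ds}
        leaves = {p for p in remaining if p not in used}
        if not leaves:
            break  # only dependency cycles remain
        yield leaves
        for p in leaves:
            del remaining[p]

# Hypothetical archive: foo-utils depends on libfoo, bar on nothing.
deps = {"libfoo": [], "foo-utils": ["libfoo"], "bar": []}
for round_no, leaves in enumerate(removal_rounds(deps), 1):
    print(round_no, sorted(leaves))
# Round 1 removes foo-utils and bar; libfoo only becomes a removal
# candidate in round 2, once its last reverse dependency is gone.
```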

> > * Some automated check that would mark the package as outdated.
> > Outdated packages won't make it into stable and would be removed
> > from unstable. Some indicators that package might be outdated:
> >  - big difference (in time, in version numbers?) between upstream
> >  version and Debian version

Only matters if someone cares enough to file a "please upgrade" bug.
Upstream could also change its versioning scheme and completely throw
the metric off: e.g. 0.1 uploaded to Debian, 2016.4 released upstream.

> >  - no upload in a long time  
> 
> s/upload/maintainer upload/

One key part of the metric would be >2 NMUs without maintainer upload.

No maintainer upload alone is insufficient - uploading every package
once a year "just because" does not help anyone. It's another reason
why simply having an outdated Standards-Version is also insufficient.
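A rough sketch of counting unacknowledged NMUs from a package's
changelog versions (newest first). The regexes are a simplification of
the usual Debian conventions - "+nmuN" for native packages, a dotted
revision like "-1.1" for non-native ones - and are illustrative only:

```python
import re

def is_nmu_version(version):
    """Heuristic: treat '+nmuN' endings and dotted Debian revisions
    (e.g. '1.2-3.1') as NMUs. Not the full devscripts logic."""
    return bool(re.search(r"\+nmu\d+$", version) or
                re.search(r"-\d+\.\d+$", version))

def trailing_nmus(changelog_versions):
    """Count consecutive NMUs since the last maintainer upload."""
    n = 0
    for v in changelog_versions:
        if not is_nmu_version(v):
            break
        n += 1
    return n

# Two NMUs since the last maintainer upload (1.2-3):
print(trailing_nmus(["1.2-3.2", "1.2-3.1", "1.2-3"]))  # 2
```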

> >  - some really outdated standards version
> >  - some really outdated dh compat level
> >  - using outdated packaging tools (and please don't go into the 1.0
> > vs 3.0 fight again here :-)
> >  - something with being a leaf library and not used by anybody else
> > for a long time (combine that with popcon, f.e.?)

Auto-removal from testing covers that issue - work down the dependency
chain. Don't rely on popcon: it is indicative only and cannot be used
in any automated metric. Reverse dependencies are what matter here for
determining "usage", more accurately read as "necessary".

> >  - other indicators  
> 
> - Is maintained by the QA group (for longer than X time?)
> - Is orphaned (for longer than X time?)
> - Is RFA (for longer than X time? Or maybe it should auto-move to
>   orphaned)
> 
> Essentially, if nobody steps up to maintain the packages, then they 
> should go.
> 
> - Maintainer does not respond to bug reports in a timely manner (eg,
> 1.5 months, calculated per package).
> 
> I think that maintainer responsiveness should be the key metric, not
> up- to-dateness (ie, the maintainer may be holding back for good
> reasons, but those reasons should be explained).

That could lead to a lot of ping messages in bug reports which might
not be that useful. It could also lead to maintainers closing bugs
which may have previously been left open as wontfix or wishlist. The
severity of the bug may need to be considered.

How do we assess responsiveness on those packages which have 0 bugs?

This does need to be about the package quality, not the maintainer. If
there is a stack of bugs with no response, it is very different to a
package with a couple of wishlist issues. So more than just
responsiveness, it needs to take account of the number and severity of
the bugs to which there has not been a response. There may also need to
be some protection from the implications of severity-ping-pong.
Overall, I think this is an unreliable metric and should not be used.

> This should also help detecting teams that have effectively become
> empty.

That is not the same as low quality packages.

Packages with NMUs not acknowledged by the maintainer are a much better
metric. The bugs are closed, so responsiveness would not be counted,
but the package is still low quality.
 
> > 
> > * Package marked as "outdated" would:
> >  a) not be able to enter "stable"
> >  b) not be able to enter "testing"
> >  c) would be removed from "unstable"  
> 
> Adding to the testing autoremoval queue would be a great start.

That also ensures that dependencies are considered.

The full list of identified packages will need some form of marker, so
that the tracker could indicate it in the same way as it does for
orphaned packages with "your package depends on a package which needs
a new maintainer". (Maybe the first step of this process *is* to
forcibly orphan the package?)

The individual metrics need to be aggregated into a score, but
fine-tuning that scoring algorithm is more work than most people want
to do on packages which are already uninteresting.
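To make the fine-tuning problem concrete, here is one way such an
aggregation could look. Every weight, threshold and field name below is
invented; agreeing on real values is exactly the work nobody wants to do:

```python
def staleness_score(pkg):
    """Combine the metrics discussed in this thread into one number.
    All weights and thresholds are hypothetical."""
    score = 0
    score += 10 * min(pkg.get("trailing_nmus", 0), 3)       # unacknowledged NMUs
    score += 5 if pkg.get("orphaned_days", 0) > 365 else 0  # long-orphaned
    score += 5 if pkg.get("reverse_deps", 1) == 0 else 0    # leaf package
    score += 3 if pkg.get("days_since_maint_upload", 0) > 1095 else 0
    return score

pkg = {"trailing_nmus": 2, "orphaned_days": 400, "reverse_deps": 0}
print(staleness_score(pkg))  # 30 under these invented weights
```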

What has happened in the past is that a BSP close to a release has had
a reason to look at a particular set of packages and has removed the
whole lot in one operation. It's a scatter-gun approach, but getting
agreement on an algorithm could take forever.

There needs to be something which makes these uninteresting packages
relevant to something important - beyond them simply being low quality.

-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/
