Re: gcc-3.4 emits large amounts of test failures

On Mon, Jan 03, 2005 at 05:12:57PM -0500, Daniel Jacobowitz wrote:
> On Tue, Jan 04, 2005 at 08:47:45AM +1100, Matthew Palmer wrote:
> > On Mon, Jan 03, 2005 at 02:07:59PM +0000, James Troup wrote:
> > > Matthew Palmer <mpalmer@debian.org> writes:
> > > >> would pretty much ensure that the package never, ever builds. And
> > > >
> > > > Well, if it's always broken, we don't really want it, do we?
> > > 
> > > If 'failing tests == broken' then we wouldn't have a working compiler
> > > for any architecture and/or for any release.  I think there's a small
> > > flaw in your logic.
> > 
> > So what are the tests useful for, then?  They're obviously useless as a
> > gauge of quality, because failing tests apparently don't indicate a flaw in
> > the software.
> A little common sense, please?  The test results have to be interpreted
> by a human being.  There are about twenty thousand tests and most
> architectures fail maybe a few dozen.

Common sense would suggest that tests that have to be analysed by a human
being after every test run aren't particularly useful.

* The person needs to be sufficiently clued to work out which tests are
actually important and which ones are fluff.  I couldn't make that call;
if buildd admins aren't compiler experts, I wouldn't expect them to be able
to make it either, and I wouldn't expect them to need to.

* If your usually clued test-analyst is having a bad day, they might dismiss
a critical failed test as being harmless, resulting in problems passing
through QA that *were* picked up by the test suite but were ignored.

* Are all of the acceptable test failures well documented?  If not, then
when your clued test-analyst takes a bus between the eyes, you're down to
guessing at which test failures are real.

* If your acceptable test failures are documented, then you can turn that
list into a machine-readable list of tests to ignore on particular
architectures.  That way, if new tests fail, they stand out like dogs'
bollocks, and your programmers will know that something's gone wrong.  It
may eventually come to pass that the new problem is added to the "ignore"
list too, but until then it's a big red flag waving in the air and saying
"there's a problem here, look at me".  Without that filtering, one real red
flag in a field of ignorable ones is very hard to pick out of the crowd.
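To make the idea concrete, here's a minimal sketch of that per-architecture
filter.  The file contents, test names, and function name are all
hypothetical; this isn't how the GCC/DejaGnu test harness actually works,
just an illustration of comparing a run's failures against a known-failures
list:

```python
# Hypothetical sketch: flag only test failures that aren't already on the
# architecture's documented ignore list.

def unexpected_failures(failed_tests, known_failures):
    """Return failures not covered by the architecture's ignore list."""
    return sorted(set(failed_tests) - set(known_failures))

# Documented, known-harmless failures for some architecture (made-up names):
known = ["gcc.dg/pr1234.c", "g++.old-deja/virt3.C"]

# Failures from a fresh test run:
failed = ["gcc.dg/pr1234.c", "gcc.dg/new-regression.c"]

# Only the new, unexplained failure is reported:
print(unexpected_failures(failed, known))
```

A buildd could then treat any non-empty output as "a human needs to look at
this", instead of asking a human to eyeball every run.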

* Tests are supposed to be a "here be monsters" type of thing.  If your
programmers get into the habit of ignoring test failures because "they're
harmless", they *will* ignore real problems as well.

- Matt
