[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

[RFC] Fail builds when regression check shows new errors.



debian-glibc,

I know that many of us have been talking about this issue, and it
surfaced for me last week and bit me softly. 

We currently run the glibc testsuite with 'make -k check', ingoring the
testsuite failures. This isn't the way we should be doing business.
Instead I propose we maintain a list of allowed failures per-arch,
including the make error number during the failure. This list is
compared against the builds 'make -k check' results and if there is a
discrepancy the build is failed.

In my local glibc tree I had some floating point bugs go from Error "1"
to Error "139", meaning that innocious tests are now segfaulting. I
wasn't alerted because I didn't notice the difference.

I've implemented this framework and attached the patch below. The logic is
as follows:

In debian/rules we set TESTING_CHECKING and TESTING_CURRENT. The former
is the location of the file containing the expected failures. While the
latter is the location we put our current results. If logging is
disabled we disable regression testing, since we need the log to do
the post-processing of the 'make -k check' result.

In debian/rules.d/build.mk I add all the logic required to test for
differences in the list of failures. If the file containing the expected
failures is *not* present, a warning is output, but the build continues
normally. If the file is present, and differences are noted, the build
is aborted. This should alert the team to possibly serious regressions
that could have crept into unstable, rendering systems unusable.

I added the directory testsuite-checking for all the extra files,
including a script to convert the output of our normal log-test-*-libc
into an $arch-test-results file.

I've tested the various permutations of extra failures, less failures
than normal, no results file, and NO_LOG enabled.

Under Jeff's recommendation I switched to using 'comm' to tell the
difference between the files. I didn't want to add 'diff' to the
build-depends list.

Please comment on the patches and the general logic of the framework. 
I think it would definately be a good addition to our builds :)

Thanks for listening,
Cheers,
Carlos.

---
 debian/rules                                      |   14 ++++++-
 debian/rules.d/build.mk                           |   19 +++++++++
 debian/testsuite-checking/README                  |   20 ++++++++++
 debian/testsuite-checking/convertlog.sh           |   15 +++++++
 debian/testsuite-checking/hppa-linux-test-results |   17 ++++++++
 5 files changed, 84 insertions(+), 1 deletion(-)
---

2004-02-19  Carlos O'Donell  <carlos@baldric.uwo.ca>

	* debian/rules: Define TESTING_CHECKING and TESTING_CURRENT,
	disable TESTING_CHECKING if NO_LOG is true.
	* debian/rules.d/build.mk: Add TESTING_CHECKING logic after the
	testsuite build step.
	* debian/testsuite-checking: New Directory.
	* debian/testsuite-checking/README: New file.
	* debian/testsuite-checking/convertlog.sh: New file.
	* debian/testsuite-checking/hppa-linux-test-results: New file.

diff -urN debian-orig/rules debian/rules
--- debian-orig/rules	2004-02-19 14:59:01.957929240 -0500
+++ debian/rules	2004-02-19 15:55:02.389066056 -0500
@@ -19,7 +19,7 @@
 #  Create build directory
 #  Configure
 #  Build
-#  Test as desired
+#  Test as desired, including regression comparisons.
 #  Install to package directories
 
 # Run debian magic to build packages.
@@ -70,6 +70,16 @@
 # Default setup
 GLIBC_PASSES ?= libc
 
+# Run testsuite regression testing?
+# First we check to see if we have an expected failure set to
+# compare against.
+# 
+# Notes:
+# - Remember testing is skipped in a cross-compile environment.
+# - Checking the testsuite is skipped if you disable logs.
+TESTSUITE_CHECKING = $(CURDIR)/debian/testsuite-checking/$(call xx,configure_target)-test-results
+TESTSUITE_CURRENT = $(CURDIR)/debian/testsuite-checking/$(call xx,configure_target)-current
+
 prefix=/usr
 bindir=$(prefix)/bin
 datadir=$(prefix)/share
@@ -107,6 +117,8 @@
 else
 log_build   := /dev/tty
 log_test    := /dev/tty
+# Disable the testsuite check without logs...
+TESTSUITE_CHECKING = nocheck
 endif
 
 # Which build pass are we on?
diff -urN debian-orig/rules.d/build.mk debian/rules.d/build.mk
--- debian-orig/rules.d/build.mk	2004-02-19 14:59:04.201588152 -0500
+++ debian/rules.d/build.mk	2004-02-19 15:55:12.313557304 -0500
@@ -70,6 +70,25 @@
 	else \
 	  echo Testing $(curpass); \
 	  $(MAKE) -C $(DEB_BUILDDIR) -j $(NJOBS) -k check 2>&1 | tee -a $(log_test); \
+	  if [ -e "$(TESTSUITE_CHECKING)" ]; then \
+	    echo "Checking testsuite results against $(TESTSUITE_CHECKING)"; \
+	    grep '^make\[3\].*Error' $(log_test) \
+	    | sed 's,^.*/,,g' | sed 's/\]/,/g' >& $(TESTSUITE_CURRENT); \
+	    comm -3 $(TESTSUITE_CHECKING) $(TESTSUITE_CURRENT) 2>&1 | grep -v '#' | tee -a $(log_test); \
+	    differences=`comm -3 $(TESTSUITE_CHECKING) $(TESTSUITE_CURRENT) 2>&1 | grep -v '#' | wc -l`;\
+	    if [ $$differences -ne '0' ]; then \
+	      echo "Encountered regressions that don't match expected failures. Aborting."; \
+	      false; \
+	    else \
+	      echo "Passed regression testing. No new failues, no changed error values."; \
+	    fi \
+	  elif [ -n "$(findstring nocheck,$(TESTSUITE_CHECKING))" ]; then \
+	    echo "TESTSUITE_CHECKING is disabled, will not check testsuite results."; \
+	  else \
+	    echo "***WARNING*** Please generate testsuite results for this arch!"; \
+	    echo "TESTSUITE_CHECKING is enabled, but arch has no testsuite results!"; \
+	    echo "***WARNING***"; \
+	  fi \
 	fi
 	touch $@
 
diff -urN debian-orig/testsuite-checking/README debian/testsuite-checking/README
--- debian-orig/testsuite-checking/README	1969-12-31 19:00:00.000000000 -0500
+++ debian/testsuite-checking/README	2004-02-19 15:00:34.768819840 -0500
@@ -0,0 +1,20 @@
+Testsuite Regression Testing
+============================
+
+This directory contains the allowed failures during a testsuite run for
+a given architecture. The files are listed as "<arch>-test-results". 
+Where "<arch>" is extracted by rules as "$(call xx,configure_target)".
+
+A script, "convertlog.sh", can be used to process the normal log-test-*-libc
+file into a *-test-results file.
+
+Since we cannot run the testsuite without "-k", we run the enitre testsuite
+ignoring errors. Then we compare the error set with the expected errors, 
+differences in signal numbers or errors fails the build. If the architecture
+does not have a *-test-results file a warning is given and no comparison is
+made.
+
+For reference please see the TESTSUITE_CHECKING variable in:
+debian/rules
+debian/rules.d/build.mk
+
diff -urN debian-orig/testsuite-checking/convertlog.sh debian/testsuite-checking/convertlog.sh
--- debian-orig/testsuite-checking/convertlog.sh	1969-12-31 19:00:00.000000000 -0500
+++ debian/testsuite-checking/convertlog.sh	2004-02-19 15:00:34.768819840 -0500
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+if [ $# -ne '1' ]; then
+  echo -e "\nUsage: Converts a log-*-libc file into a *-test-results file."
+  echo -e "$0 : < Input testsuite log file >\n";
+  exit 1
+fi;
+
+echo -e '#\n# Allowed architecture testsuite failures.'
+echo '# Someone should be working towards fixing these!'
+echo '# Format: <Failed test>, Error <Make error code> [(ignored)]'
+echo '# They are listed here for the purpose of regression'
+echo -e '# testing during builds\n#'
+grep '^make\[3\].*Error' $1 | sed 's,^.*/,,g' | sed 's/\]/,/g'
+
diff -urN debian-orig/testsuite-checking/hppa-linux-test-results debian/testsuite-checking/hppa-linux-test-results
--- debian-orig/testsuite-checking/hppa-linux-test-results	1969-12-31 19:00:00.000000000 -0500
+++ debian/testsuite-checking/hppa-linux-test-results	2004-02-19 13:23:32.531934384 -0500
@@ -0,0 +1,17 @@
+#
+# Allowed failures for HPPA and the returned make failure
+# number, indicating the signal the process died with.
+#
+test-float.out, Error 1
+test-double.out, Error 1
+test-ifloat.out, Error 1
+tst-strtod.out, Error 1
+bug-strtod.out, Error 1
+tststatic.out, Error 139
+annexc.out, Error 1 (ignored)
+neededtest.out, Error 14
+neededtest2.out, Error 14
+neededtest3.out, Error 17
+neededtest4.out, Error 2
+circleload1.out, Error 9
+tst-tls13.out, Error 1



Reply to: