
Random corruption in atlas computations



Greetings!  The atlas testers fail at random intervals on m68k,
preventing the package build from completing.  I suspect some cache
or register flushing issue.  When I try to stop in gdb at the point
where the corrupted value is supposedly written, the problem never
appears.

Here are the symptoms:

=============================================================================
gdb xduumtst

(gdb) r
Starting program: /home/camm/atlas3-3.6.0/bin/Linux_base_shared/xduumtst -n 10
NREPS  ORD   UPLO      N    lda          TIME        MFLOPS         RESID
=====  ===  =====  =====  =====  ============  ============  ============

A = 
4.825387  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.203878  4.099503  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.327901  -0.083287  5.514335  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
0.220534  -0.464147  0.251006  3.758231  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.459016  -0.180799  0.470268  0.419886  3.756300  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.183674  -0.443240  -0.111670  -0.006660  -0.058468  4.091890  0.000000  0.000000  0.000000  0.000000  
-0.407398  0.444840  -0.260024  0.490898  0.323276  -0.325004  5.083739  0.000000  0.000000  0.000000  
0.125145  -0.094088  0.439279  -0.195726  -0.408332  0.484317  0.097912  3.649379  0.000000  0.000000  
0.497551  0.354104  0.063572  -0.122261  0.270623  -0.048456  0.059375  0.163214  5.326067  0.000000  
0.294738  0.206778  0.300316  -0.133567  0.238167  -0.396687  0.425517  -0.499697  -0.333246  5.303936  

Ag = 
4.825387  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.203878  4.099503  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.327901  -0.083287  5.514335  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
0.220534  -0.464147  0.251006  3.758231  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.459016  -0.180799  0.470268  0.419886  3.756300  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.183674  -0.443240  -0.111670  -0.006660  -0.058468  4.091890  0.000000  0.000000  0.000000  0.000000  
-0.407398  0.444840  -0.260024  0.490898  0.323276  -0.325004  5.083739  0.000000  0.000000  0.000000  
0.125145  -0.094088  0.439279  -0.195726  -0.408332  0.484317  0.097912  3.649379  0.000000  0.000000  
0.497551  0.354104  0.063572  -0.122261  0.270623  -0.048456  0.059375  0.163214  5.326067  0.000000  
0.294738  0.206778  0.300316  -0.133567  0.238167  -0.396687  0.425517  -0.499697  -0.333246  5.303936  

A = 
24.242575  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.702320  17.632324  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-1.667100  -0.683692  31.059330  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
0.312622  -1.651459  0.880033  14.612729  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-1.691422  -0.325920  1.598296  1.751327  14.514414  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.699587  -2.103017  -0.281893  -0.222681  -0.649662  17.243461  0.000000  0.000000  0.000000  0.000000  
-1.903895  2.361249  -1.147319  2.412340  1.720881  -1.776488  26.038579  0.000000  0.000000  0.000000  
0.390628  -0.388894  1.463406  -0.667490  -1.565002  1.957770  0.154378  13.594301  0.000000  0.000000  
2.551772  1.817071  0.238511  -0.606659  1.361987  -0.125884  0.174435  1.035809  28.478046  0.000000  
1.563271  1.096735  1.592858  -0.708432  1.263223  -2.104005  2.256917  -2.650360  -1.767517  28.131740  

Ag = 
24.242575  0.112335  -1.702597  0.312622  -1.691422  -0.699587  -1.903895  0.390628  2.551772  1.563271  
-0.702320  17.632324  0.030065  -1.651459  -0.325920  -2.103017  2.361249  -0.388894  1.817071  1.096735  
-1.667100  -0.683692  31.059330  0.880033  1.598296  -0.281893  -1.147319  1.463406  0.238511  1.592858  
0.312622  -1.651459  0.880033  14.612729  1.751327  -0.222681  2.412340  -0.667490  -0.606659  -0.708432  
-1.691422  -0.325920  1.598296  1.751327  14.514414  -0.649662  1.720881  -1.565002  1.361987  1.263223  
-0.699587  -2.103017  -0.281893  -0.222681  -0.649662  17.243461  -1.776488  1.957770  -0.125884  -2.104005  
-1.903895  2.361249  -1.147319  2.412340  1.720881  -1.776488  26.038579  0.154378  0.174435  2.256917  
0.390628  -0.388894  1.463406  -0.667490  -1.565002  1.957770  0.154378  13.594301  1.035809  -2.650360  
2.551772  1.817071  0.238511  -0.606659  1.361987  -0.125884  0.174435  1.035809  28.478046  -1.767517  
1.563271  1.096735  1.592858  -0.708432  1.263223  -2.104005  2.256917  -2.650360  -1.767517  28.131740  

A-L*Lt = 
0.000000  0.112335  -1.702597  0.312622  -1.691422  -0.699587  -1.903895  0.390628  2.551772  1.563271  
0.000000  0.000000  0.030065  -1.651459  -0.325920  -2.103017  2.361249  -0.388894  1.817071  1.096735  
0.000000  0.000000  0.000000  0.880033  1.598296  -0.281893  -1.147319  1.463406  0.238511  1.592858  
0.000000  0.000000  0.000000  0.000000  1.751327  -0.222681  2.412340  -0.667490  -0.606659  -0.708432  
0.000000  0.000000  0.000000  0.000000  0.000000  -0.649662  1.720881  -1.565002  1.361987  1.263223  
0.000000  0.000000  0.000000  0.000000  0.000000  -0.000000  -1.776488  1.957770  -0.125884  -2.104005  
0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.154378  0.174435  2.256917  
0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  1.035809  -2.650360  
0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  -1.767517  
0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
    1  Col  Lower     10     10       0.01000         0.025  1.967378e-01

1 cases: 1 passed, 0 skipped, 0 failed

Program exited normally.


(gdb) r
Starting program: /home/camm/atlas3-3.6.0/bin/Linux_base_shared/xduumtst -n 10
NREPS  ORD   UPLO      N    lda          TIME        MFLOPS         RESID
=====  ===  =====  =====  =====  ============  ============  ============

A = 
4.825387  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.203878  4.099503  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.327901  -0.083287  5.514335  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
0.220534  -0.464147  0.251006  3.758231  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.459016  -0.180799  0.470268  0.419886  3.756300  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.183674  -0.443240  -0.111670  -0.006660  -0.058468  4.091890  0.000000  0.000000  0.000000  0.000000  
-0.407398  0.444840  -0.260024  0.490898  0.323276  -0.325004  5.083739  0.000000  0.000000  0.000000  
0.125145  -0.094088  0.439279  -0.195726  -0.408332  0.484317  0.097912  3.649379  0.000000  0.000000  
0.497551  0.354104  0.063572  -0.122261  0.270623  -0.048456  0.059375  0.163214  5.326067  0.000000  
0.294738  0.206778  0.300316  -0.133567  0.238167  -0.396687  0.425517  -0.499697  -0.333246  5.303936  

Ag = 
4.825387  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.203878  4.099503  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.327901  -0.083287  5.514335  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
0.220534  -0.464147  0.251006  3.758231  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.459016  -0.180799  0.470268  0.419886  3.756300  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.183674  -0.443240  -0.111670  -0.006660  -0.058468  4.091890  0.000000  0.000000  0.000000  0.000000  
-0.407398  0.444840  -0.260024  0.490898  0.323276  -0.325004  5.083739  0.000000  0.000000  0.000000  
0.125145  -0.094088  0.439279  -0.195726  -0.408332  0.484317  0.097912  3.649379  0.000000  0.000000  
0.497551  0.354104  0.063572  -0.122261  0.270623  -0.048456  0.059375  0.163214  5.326067  0.000000  
0.294738  0.206778  0.300316  -0.133567  0.238167  -0.396687  0.425517  -0.499697  -0.333246  5.303936  

A = 
24.242575  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.702320  17.632324  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-1.667100  -0.683692  31.059330  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
0.312622  -1.651459  0.880033  14.612729  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
-1.691422  -0.325920  1.598296  1.751327  14.514414  0.000000  0.000000  0.000000  0.000000  0.000000  
-0.699587  -2.103017  -0.281893  -0.222681  -0.649662  17.171570  0.000000  0.000000  0.000000  0.000000  
-1.903895  2.361249  -1.147319  2.412340  1.720881  -1.776488  26.038579  0.000000  0.000000  0.000000  
0.390628  -0.388894  1.463406  -0.667490  -1.565002  1.957770  0.154378  13.594301  0.000000  0.000000  
2.551772  1.817071  0.238511  -0.606659  1.361987  -0.125884  0.174435  1.035809  28.478046  0.000000  
1.563271  1.096735  1.592858  -0.708432  1.263223  -2.104005  2.256917  -2.650360  -1.767517  28.131740  

Ag = 
24.242575  0.112335  -1.702597  0.312622  -1.691422  -0.699587  -1.903895  0.390628  2.551772  1.563271  
-0.702320  17.632324  0.030065  -1.651459  -0.325920  -2.103017  2.361249  -0.388894  1.817071  1.096735  
-1.667100  -0.683692  31.059330  0.880033  1.598296  -0.281893  -1.147319  1.463406  0.238511  1.592858  
0.312622  -1.651459  0.880033  14.612729  1.751327  -0.222681  2.412340  -0.667490  -0.606659  -0.708432  
-1.691422  -0.325920  1.598296  1.751327  14.514414  -0.649662  1.720881  -1.565002  1.361987  1.263223  
-0.699587  -2.103017  -0.281893  -0.222681  -0.649662  17.243461  -1.776488  1.957770  -0.125884  -2.104005  
-1.903895  2.361249  -1.147319  2.412340  1.720881  -1.776488  26.038579  0.154378  0.174435  2.256917  
0.390628  -0.388894  1.463406  -0.667490  -1.565002  1.957770  0.154378  13.594301  1.035809  -2.650360  
2.551772  1.817071  0.238511  -0.606659  1.361987  -0.125884  0.174435  1.035809  28.478046  -1.767517  
1.563271  1.096735  1.592858  -0.708432  1.263223  -2.104005  2.256917  -2.650360  -1.767517  28.131740  

A-L*Lt = 
0.000000  0.112335  -1.702597  0.312622  -1.691422  -0.699587  -1.903895  0.390628  2.551772  1.563271  
0.000000  0.000000  0.030065  -1.651459  -0.325920  -2.103017  2.361249  -0.388894  1.817071  1.096735  
0.000000  0.000000  0.000000  0.880033  1.598296  -0.281893  -1.147319  1.463406  0.238511  1.592858  
0.000000  0.000000  0.000000  0.000000  1.751327  -0.222681  2.412340  -0.667490  -0.606659  -0.708432  
0.000000  0.000000  0.000000  0.000000  0.000000  -0.649662  1.720881  -1.565002  1.361987  1.263223  
0.000000  0.000000  0.000000  0.000000  0.000000  0.071891  -1.776488  1.957770  -0.125884  -2.104005  
0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.154378  0.174435  2.256917  
0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  1.035809  -2.650360  
0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  -1.767517  
0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  

Breakpoint 1, uumtest (Order=CblasColMajor, Uplo=CblasLower, 
    CacheSize=524288, N=10, lda=10, tim=0xeffffc04) at ../uumtst.c:370
370	      fprintf(stderr, "normA=%e, eps=%e, num=%e\n", normA, eps, resid);
(gdb) c
Continuing.
normA=8.132650e+00, eps=2.220446e-16, num=3.981099e+12
    1  Col  Lower     10     10      -0.00000         0.000  3.981099e+12

1 cases: 0 passed, 0 skipped, 1 failed

Program exited normally.
=============================================================================

The two runs differ here and only here, and the difference shows up
on random invocations of the program:

--- /tmp/g1	2004-10-15 18:53:51.000000000 +0000
+++ /tmp/g2	2004-10-15 18:54:06.000000000 +0000
@@ -28,7 +28,7 @@
 -1.667100  -0.683692  31.059330  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
 0.312622  -1.651459  0.880033  14.612729  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  
 -1.691422  -0.325920  1.598296  1.751327  14.514414  0.000000  0.000000  0.000000  0.000000  0.000000  
--0.699587  -2.103017  -0.281893  -0.222681  -0.649662  17.243461  0.000000  0.000000  0.000000  0.000000  
+-0.699587  -2.103017  -0.281893  -0.222681  -0.649662  17.171570  0.000000  0.000000  0.000000  0.000000  
 -1.903895  2.361249  -1.147319  2.412340  1.720881  -1.776488  26.038579  0.000000  0.000000  0.000000  
 0.390628  -0.388894  1.463406  -0.667490  -1.565002  1.957770  0.154378  13.594301  0.000000  0.000000  
 2.551772  1.817071  0.238511  -0.606659  1.361987  -0.125884  0.174435  1.035809  28.478046  0.000000  
@@ -52,7 +52,7 @@
 0.000000  0.000000  0.000000  0.880033  1.598296  -0.281893  -1.147319  1.463406  0.238511  1.592858  
 0.000000  0.000000  0.000000  0.000000  1.751327  -0.222681  2.412340  -0.667490  -0.606659  -0.708432  
 0.000000  0.000000  0.000000  0.000000  0.000000  -0.649662  1.720881  -1.565002  1.361987  1.263223  
-0.000000  0.000000  0.000000  0.000000  0.000000  -0.000000  -1.776488  1.957770  -0.125884  -2.104005  
+0.000000  0.000000  0.000000  0.000000  0.000000  0.071891  -1.776488  1.957770  -0.125884  -2.104005  
 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.154378  0.174435  2.256917  
 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  1.035809  -2.650360  
 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  -1.767517  


Ideas?

Take care,
-- 
Camm Maguire			     			camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
