Random corruption in atlas computations
Greetings! The ATLAS testers fail at random intervals on m68k,
preventing the package build from completing. I suspect a cache
or register flushing issue. When I try to stop in gdb at the point
where the corrupted value is supposedly written, the problem never
appears. Here are the symptoms:
=============================================================================
gdb xduumtst
(gdb) r
Starting program: /home/camm/atlas3-3.6.0/bin/Linux_base_shared/xduumtst -n 10
NREPS ORD UPLO N lda TIME MFLOPS RESID
===== === ===== ===== ===== ============ ============ ============
A =
4.825387 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.203878 4.099503 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.327901 -0.083287 5.514335 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.220534 -0.464147 0.251006 3.758231 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.459016 -0.180799 0.470268 0.419886 3.756300 0.000000 0.000000 0.000000 0.000000 0.000000
-0.183674 -0.443240 -0.111670 -0.006660 -0.058468 4.091890 0.000000 0.000000 0.000000 0.000000
-0.407398 0.444840 -0.260024 0.490898 0.323276 -0.325004 5.083739 0.000000 0.000000 0.000000
0.125145 -0.094088 0.439279 -0.195726 -0.408332 0.484317 0.097912 3.649379 0.000000 0.000000
0.497551 0.354104 0.063572 -0.122261 0.270623 -0.048456 0.059375 0.163214 5.326067 0.000000
0.294738 0.206778 0.300316 -0.133567 0.238167 -0.396687 0.425517 -0.499697 -0.333246 5.303936
Ag =
4.825387 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.203878 4.099503 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.327901 -0.083287 5.514335 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.220534 -0.464147 0.251006 3.758231 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.459016 -0.180799 0.470268 0.419886 3.756300 0.000000 0.000000 0.000000 0.000000 0.000000
-0.183674 -0.443240 -0.111670 -0.006660 -0.058468 4.091890 0.000000 0.000000 0.000000 0.000000
-0.407398 0.444840 -0.260024 0.490898 0.323276 -0.325004 5.083739 0.000000 0.000000 0.000000
0.125145 -0.094088 0.439279 -0.195726 -0.408332 0.484317 0.097912 3.649379 0.000000 0.000000
0.497551 0.354104 0.063572 -0.122261 0.270623 -0.048456 0.059375 0.163214 5.326067 0.000000
0.294738 0.206778 0.300316 -0.133567 0.238167 -0.396687 0.425517 -0.499697 -0.333246 5.303936
A =
24.242575 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.702320 17.632324 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-1.667100 -0.683692 31.059330 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.312622 -1.651459 0.880033 14.612729 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-1.691422 -0.325920 1.598296 1.751327 14.514414 0.000000 0.000000 0.000000 0.000000 0.000000
-0.699587 -2.103017 -0.281893 -0.222681 -0.649662 17.243461 0.000000 0.000000 0.000000 0.000000
-1.903895 2.361249 -1.147319 2.412340 1.720881 -1.776488 26.038579 0.000000 0.000000 0.000000
0.390628 -0.388894 1.463406 -0.667490 -1.565002 1.957770 0.154378 13.594301 0.000000 0.000000
2.551772 1.817071 0.238511 -0.606659 1.361987 -0.125884 0.174435 1.035809 28.478046 0.000000
1.563271 1.096735 1.592858 -0.708432 1.263223 -2.104005 2.256917 -2.650360 -1.767517 28.131740
Ag =
24.242575 0.112335 -1.702597 0.312622 -1.691422 -0.699587 -1.903895 0.390628 2.551772 1.563271
-0.702320 17.632324 0.030065 -1.651459 -0.325920 -2.103017 2.361249 -0.388894 1.817071 1.096735
-1.667100 -0.683692 31.059330 0.880033 1.598296 -0.281893 -1.147319 1.463406 0.238511 1.592858
0.312622 -1.651459 0.880033 14.612729 1.751327 -0.222681 2.412340 -0.667490 -0.606659 -0.708432
-1.691422 -0.325920 1.598296 1.751327 14.514414 -0.649662 1.720881 -1.565002 1.361987 1.263223
-0.699587 -2.103017 -0.281893 -0.222681 -0.649662 17.243461 -1.776488 1.957770 -0.125884 -2.104005
-1.903895 2.361249 -1.147319 2.412340 1.720881 -1.776488 26.038579 0.154378 0.174435 2.256917
0.390628 -0.388894 1.463406 -0.667490 -1.565002 1.957770 0.154378 13.594301 1.035809 -2.650360
2.551772 1.817071 0.238511 -0.606659 1.361987 -0.125884 0.174435 1.035809 28.478046 -1.767517
1.563271 1.096735 1.592858 -0.708432 1.263223 -2.104005 2.256917 -2.650360 -1.767517 28.131740
A-L*Lt =
0.000000 0.112335 -1.702597 0.312622 -1.691422 -0.699587 -1.903895 0.390628 2.551772 1.563271
0.000000 0.000000 0.030065 -1.651459 -0.325920 -2.103017 2.361249 -0.388894 1.817071 1.096735
0.000000 0.000000 0.000000 0.880033 1.598296 -0.281893 -1.147319 1.463406 0.238511 1.592858
0.000000 0.000000 0.000000 0.000000 1.751327 -0.222681 2.412340 -0.667490 -0.606659 -0.708432
0.000000 0.000000 0.000000 0.000000 0.000000 -0.649662 1.720881 -1.565002 1.361987 1.263223
0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -1.776488 1.957770 -0.125884 -2.104005
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.154378 0.174435 2.256917
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.035809 -2.650360
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -1.767517
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
1 Col Lower 10 10 0.01000 0.025 1.967378e-01
1 cases: 1 passed, 0 skipped, 0 failed
Program exited normally.
(gdb) r
Starting program: /home/camm/atlas3-3.6.0/bin/Linux_base_shared/xduumtst -n 10
NREPS ORD UPLO N lda TIME MFLOPS RESID
===== === ===== ===== ===== ============ ============ ============
A =
4.825387 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.203878 4.099503 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.327901 -0.083287 5.514335 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.220534 -0.464147 0.251006 3.758231 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.459016 -0.180799 0.470268 0.419886 3.756300 0.000000 0.000000 0.000000 0.000000 0.000000
-0.183674 -0.443240 -0.111670 -0.006660 -0.058468 4.091890 0.000000 0.000000 0.000000 0.000000
-0.407398 0.444840 -0.260024 0.490898 0.323276 -0.325004 5.083739 0.000000 0.000000 0.000000
0.125145 -0.094088 0.439279 -0.195726 -0.408332 0.484317 0.097912 3.649379 0.000000 0.000000
0.497551 0.354104 0.063572 -0.122261 0.270623 -0.048456 0.059375 0.163214 5.326067 0.000000
0.294738 0.206778 0.300316 -0.133567 0.238167 -0.396687 0.425517 -0.499697 -0.333246 5.303936
Ag =
4.825387 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.203878 4.099503 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.327901 -0.083287 5.514335 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.220534 -0.464147 0.251006 3.758231 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.459016 -0.180799 0.470268 0.419886 3.756300 0.000000 0.000000 0.000000 0.000000 0.000000
-0.183674 -0.443240 -0.111670 -0.006660 -0.058468 4.091890 0.000000 0.000000 0.000000 0.000000
-0.407398 0.444840 -0.260024 0.490898 0.323276 -0.325004 5.083739 0.000000 0.000000 0.000000
0.125145 -0.094088 0.439279 -0.195726 -0.408332 0.484317 0.097912 3.649379 0.000000 0.000000
0.497551 0.354104 0.063572 -0.122261 0.270623 -0.048456 0.059375 0.163214 5.326067 0.000000
0.294738 0.206778 0.300316 -0.133567 0.238167 -0.396687 0.425517 -0.499697 -0.333246 5.303936
A =
24.242575 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.702320 17.632324 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-1.667100 -0.683692 31.059330 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.312622 -1.651459 0.880033 14.612729 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-1.691422 -0.325920 1.598296 1.751327 14.514414 0.000000 0.000000 0.000000 0.000000 0.000000
-0.699587 -2.103017 -0.281893 -0.222681 -0.649662 17.171570 0.000000 0.000000 0.000000 0.000000
-1.903895 2.361249 -1.147319 2.412340 1.720881 -1.776488 26.038579 0.000000 0.000000 0.000000
0.390628 -0.388894 1.463406 -0.667490 -1.565002 1.957770 0.154378 13.594301 0.000000 0.000000
2.551772 1.817071 0.238511 -0.606659 1.361987 -0.125884 0.174435 1.035809 28.478046 0.000000
1.563271 1.096735 1.592858 -0.708432 1.263223 -2.104005 2.256917 -2.650360 -1.767517 28.131740
Ag =
24.242575 0.112335 -1.702597 0.312622 -1.691422 -0.699587 -1.903895 0.390628 2.551772 1.563271
-0.702320 17.632324 0.030065 -1.651459 -0.325920 -2.103017 2.361249 -0.388894 1.817071 1.096735
-1.667100 -0.683692 31.059330 0.880033 1.598296 -0.281893 -1.147319 1.463406 0.238511 1.592858
0.312622 -1.651459 0.880033 14.612729 1.751327 -0.222681 2.412340 -0.667490 -0.606659 -0.708432
-1.691422 -0.325920 1.598296 1.751327 14.514414 -0.649662 1.720881 -1.565002 1.361987 1.263223
-0.699587 -2.103017 -0.281893 -0.222681 -0.649662 17.243461 -1.776488 1.957770 -0.125884 -2.104005
-1.903895 2.361249 -1.147319 2.412340 1.720881 -1.776488 26.038579 0.154378 0.174435 2.256917
0.390628 -0.388894 1.463406 -0.667490 -1.565002 1.957770 0.154378 13.594301 1.035809 -2.650360
2.551772 1.817071 0.238511 -0.606659 1.361987 -0.125884 0.174435 1.035809 28.478046 -1.767517
1.563271 1.096735 1.592858 -0.708432 1.263223 -2.104005 2.256917 -2.650360 -1.767517 28.131740
A-L*Lt =
0.000000 0.112335 -1.702597 0.312622 -1.691422 -0.699587 -1.903895 0.390628 2.551772 1.563271
0.000000 0.000000 0.030065 -1.651459 -0.325920 -2.103017 2.361249 -0.388894 1.817071 1.096735
0.000000 0.000000 0.000000 0.880033 1.598296 -0.281893 -1.147319 1.463406 0.238511 1.592858
0.000000 0.000000 0.000000 0.000000 1.751327 -0.222681 2.412340 -0.667490 -0.606659 -0.708432
0.000000 0.000000 0.000000 0.000000 0.000000 -0.649662 1.720881 -1.565002 1.361987 1.263223
0.000000 0.000000 0.000000 0.000000 0.000000 0.071891 -1.776488 1.957770 -0.125884 -2.104005
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.154378 0.174435 2.256917
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.035809 -2.650360
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -1.767517
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Breakpoint 1, uumtest (Order=CblasColMajor, Uplo=CblasLower,
CacheSize=524288, N=10, lda=10, tim=0xeffffc04) at ../uumtst.c:370
370 fprintf(stderr, "normA=%e, eps=%e, num=%e\n", normA, eps, resid);
(gdb) c
Continuing.
normA=8.132650e+00, eps=2.220446e-16, num=3.981099e+12
1 Col Lower 10 10 -0.00000 0.000 3.981099e+12
1 cases: 0 passed, 0 skipped, 1 failed
Program exited normally.
=============================================================================
The only difference between the output of a passing run and a failing
run is the following, and it shows up at random invocations of the
program:
--- /tmp/g1 2004-10-15 18:53:51.000000000 +0000
+++ /tmp/g2 2004-10-15 18:54:06.000000000 +0000
@@ -28,7 +28,7 @@
-1.667100 -0.683692 31.059330 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.312622 -1.651459 0.880033 14.612729 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-1.691422 -0.325920 1.598296 1.751327 14.514414 0.000000 0.000000 0.000000 0.000000 0.000000
--0.699587 -2.103017 -0.281893 -0.222681 -0.649662 17.243461 0.000000 0.000000 0.000000 0.000000
+-0.699587 -2.103017 -0.281893 -0.222681 -0.649662 17.171570 0.000000 0.000000 0.000000 0.000000
-1.903895 2.361249 -1.147319 2.412340 1.720881 -1.776488 26.038579 0.000000 0.000000 0.000000
0.390628 -0.388894 1.463406 -0.667490 -1.565002 1.957770 0.154378 13.594301 0.000000 0.000000
2.551772 1.817071 0.238511 -0.606659 1.361987 -0.125884 0.174435 1.035809 28.478046 0.000000
@@ -52,7 +52,7 @@
0.000000 0.000000 0.000000 0.880033 1.598296 -0.281893 -1.147319 1.463406 0.238511 1.592858
0.000000 0.000000 0.000000 0.000000 1.751327 -0.222681 2.412340 -0.667490 -0.606659 -0.708432
0.000000 0.000000 0.000000 0.000000 0.000000 -0.649662 1.720881 -1.565002 1.361987 1.263223
-0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -1.776488 1.957770 -0.125884 -2.104005
+0.000000 0.000000 0.000000 0.000000 0.000000 0.071891 -1.776488 1.957770 -0.125884 -2.104005
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.154378 0.174435 2.256917
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.035809 -2.650360
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -1.767517
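
For anyone wanting to check the two hunks against each other, here is a minimal sketch that compares two captured outputs field by field and prints the numeric delta. The names `differing_fields`, `good`, `bad` and `tol` are made up for illustration and are not part of ATLAS or its testers; the two sample rows are copied from the first hunk of the diff above.

```python
def differing_fields(run_a, run_b, tol=1e-6):
    """Return (line, column, a, b, a - b) for numeric fields that differ.

    run_a and run_b are the captured outputs of two runs as strings.
    Lines and columns are 1-based. Non-numeric fields are skipped.
    """
    diffs = []
    pairs = zip(run_a.splitlines(), run_b.splitlines())
    for line_no, (line_a, line_b) in enumerate(pairs, 1):
        fields_a, fields_b = line_a.split(), line_b.split()
        if len(fields_a) != len(fields_b):
            continue  # layout differs, so fields cannot be paired up
        for col, (xa, xb) in enumerate(zip(fields_a, fields_b), 1):
            try:
                va, vb = float(xa), float(xb)
            except ValueError:
                continue  # header text or other non-numeric field
            if abs(va - vb) > tol:
                diffs.append((line_no, col, va, vb, va - vb))
    return diffs


# Row 6 of the second "A =" matrix, from the passing and the failing run.
good = ("-0.699587 -2.103017 -0.281893 -0.222681 -0.649662 "
        "17.243461 0.000000 0.000000 0.000000 0.000000")
bad = ("-0.699587 -2.103017 -0.281893 -0.222681 -0.649662 "
       "17.171570 0.000000 0.000000 0.000000 0.000000")

for line_no, col, va, vb, delta in differing_fields(good, bad):
    print(f"line {line_no}, column {col}: {va:.6f} vs {vb:.6f}, delta {delta:.6f}")
# prints: line 1, column 6: 17.243461 vs 17.171570, delta 0.071891
```

The delta of 0.071891 is the same value that appears on the diagonal of the residual matrix in the second hunk, so the two hunks look like one corrupted element rather than two independent errors.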
Ideas?
Take care,
--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah