Bug#921207: Octave GEMM error on large matrix due to openmp thread race condition
Package: octave
Version: 4.4.1-4
Severity: grave
X-Debbugs-CC: debian-science@lists.debian.org, idohlp@neto.net.il
Dear octave maintainer,
I received an astonishing bug report[1] saying that MKL returns wrong
result for matrix multiplication. However, my further investigation
suggests that the problem is very likely a threading bug of octave.
OpenMP is highly suspected according to my following experimental
results.
=======================================================================
Please reproduce the problem with the following octave script:
921193.m
```
x=1:100000; y=[x;x]*[x;x]';
disp(reshape(y, 1, 4))
```
The correct result is
   333338333350000   333338333350000   333338333350000   333338333350000
However sometimes octave yields random results, which is possibly
a symptom of race condition between threads or alike.
------------------------ MKL ------------------------------------------
libblas.so, libblas.so.3 = libmkl_rt.so
$ octave 921193.m
   1.1033e+15   1.1033e+15   1.1038e+15   1.1038e+15
$ MKL_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000
$ MKL_NUM_THREADS=2 octave 921193.m 
   642428448891136   642428448891136   642428448891136   642428448891136
$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=intel octave 921193.m 
   489859913436624   489859913436624   488279025495504   488279025495504
$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=gnu octave 921193.m 
   333338333350000   333338333350000   333338333350000   333338333350000
$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=tbb octave 921193.m 
   333338333350000   333338333350000   333338333350000   333338333350000
$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=sequential octave 921193.m 
   333338333350000   333338333350000   333338333350000   333338333350000
The default threading model used by libmkl_rt is the "intel" threading
aka. libiomp5 . It seems that Octave isn't happy with libiomp5 .
In contrast, gomp (gnu), tbb, serial/sequential threading users
are not affected.
----------------------- Netlib ------------------------------------------
libblas.so, libblas.so.3 = netlib blas
$ octave 921193.m
   824104476280848   824104476280848   828286951663952   828286951663952
$ octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000
$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000
Netlib blas has no multi-thread implementation. This result indicates
that octave's multi-threading functionality is questionable.
---------------------- BLIS (openmp) ------------------------------------
libblas.so, libblas.so.3 = blis (openmp). Please note that BLIS use
single thread by default even if it's compiled with openmp/pthread
threading model.
$ octave 921193.m
   757875851200128   757875851200128   796692912410048   796692912410048
$ BLIS_NUM_THREADS=1 octave 921193.m
   531523819460688   531523819460688   543945552290544   543945552290544
$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000
Again, octave's multi-threading functionality is questionable.
------------------- OpenBLAS ---------------------------------------
libblas.so, libblas.so.3 = openblas
$ octave 921193.m
   907773323384928   907773323384928   925793789579664   925793789579664
$ octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000
$ OPENBLAS_NUM_THREADS=1 octave 921193.m
   737565604371072   737565604371072   741382086552384   741382086552384
$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000
=========================================================================
According to the above experimental results, I think BLAS libraries
are innocent.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921193
Reply to: