Bug#921207: Octave GEMM error on large matrix due to openmp thread race condition
Package: octave
Version: 4.4.1-4
Severity: grave
X-Debbugs-CC: debian-science@lists.debian.org, idohlp@neto.net.il
Dear octave maintainer,
I received an astonishing bug report[1] saying that MKL returns wrong
result for matrix multiplication. However, my further investigation
suggests that the problem is very likely a threading bug of octave.
OpenMP is highly suspected according to my following experimental
results.
=======================================================================
Please reproduce the problem with the following octave script:
921193.m
```
x=1:100000; y=[x;x]*[x;x]';
disp(reshape(y, 1, 4))
```
The correct result is
333338333350000 333338333350000 333338333350000 333338333350000
However sometimes octave yields random results, which is possibly
a symptom of race condition between threads or alike.
------------------------ MKL ------------------------------------------
libblas.so, libblas.so.3 = libmkl_rt.so
$ octave 921193.m
1.1033e+15 1.1033e+15 1.1038e+15 1.1038e+15
$ MKL_NUM_THREADS=1 octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
$ MKL_NUM_THREADS=2 octave 921193.m
642428448891136 642428448891136 642428448891136 642428448891136
$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=intel octave 921193.m
489859913436624 489859913436624 488279025495504 488279025495504
$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=gnu octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=tbb octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=sequential octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
The default threading model used by libmkl_rt is the "intel" threading
aka. libiomp5 . It seems that Octave isn't happy with libiomp5 .
In contrast, gomp (gnu), tbb, serial/sequential threading users
are not affected.
----------------------- Netlib ------------------------------------------
libblas.so, libblas.so.3 = netlib blas
$ octave 921193.m
824104476280848 824104476280848 828286951663952 828286951663952
$ octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
$ OMP_NUM_THREADS=1 octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
Netlib blas has no multi-thread implementation. This result indicates
that octave's multi-threading functionality is questionable.
---------------------- BLIS (openmp) ------------------------------------
libblas.so, libblas.so.3 = blis (openmp). Please note that BLIS use
single thread by default even if it's compiled with openmp/pthread
threading model.
$ octave 921193.m
757875851200128 757875851200128 796692912410048 796692912410048
$ BLIS_NUM_THREADS=1 octave 921193.m
531523819460688 531523819460688 543945552290544 543945552290544
$ OMP_NUM_THREADS=1 octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
Again, octave's multi-threading functionality is questionable.
------------------- OpenBLAS ---------------------------------------
libblas.so, libblas.so.3 = openblas
$ octave 921193.m
907773323384928 907773323384928 925793789579664 925793789579664
$ octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
$ OPENBLAS_NUM_THREADS=1 octave 921193.m
737565604371072 737565604371072 741382086552384 741382086552384
$ OMP_NUM_THREADS=1 octave 921193.m
333338333350000 333338333350000 333338333350000 333338333350000
=========================================================================
According to the above experimental results, I think BLAS libraries
are innocent.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921193
Reply to: