[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#921207: Octave GEMM error on large matrix due to openmp thread race condition



Package: octave
Version: 4.4.1-4
Severity: grave
X-Debbugs-CC: debian-science@lists.debian.org, idohlp@neto.net.il

Dear octave maintainer,

I received an astonishing bug report[1] saying that MKL returns wrong
result for matrix multiplication. However, my further investigation
suggests that the problem is very likely a threading bug of octave.
OpenMP is highly suspected according to my following experimental
results.

=======================================================================

Please reproduce the problem with the following octave script:

921193.m

```
x=1:100000; y=[x;x]*[x;x]';
disp(reshape(y, 1, 4))
```

The correct result is
   333338333350000   333338333350000   333338333350000   333338333350000

However sometimes octave yields random results, which is possibly
a symptom of race condition between threads or alike.

------------------------ MKL ------------------------------------------

libblas.so, libblas.so.3 = libmkl_rt.so

$ octave 921193.m
   1.1033e+15   1.1033e+15   1.1038e+15   1.1038e+15

$ MKL_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

$ MKL_NUM_THREADS=2 octave 921193.m 
   642428448891136   642428448891136   642428448891136   642428448891136

$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=intel octave 921193.m 
   489859913436624   489859913436624   488279025495504   488279025495504

$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=gnu octave 921193.m 
   333338333350000   333338333350000   333338333350000   333338333350000

$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=tbb octave 921193.m 
   333338333350000   333338333350000   333338333350000   333338333350000

$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=sequential octave 921193.m 
   333338333350000   333338333350000   333338333350000   333338333350000

The default threading model used by libmkl_rt is the "intel" threading
aka. libiomp5 . It seems that Octave isn't happy with libiomp5 .
In contrast, gomp (gnu), tbb, serial/sequential threading users
are not affected.

----------------------- Netlib ------------------------------------------

libblas.so, libblas.so.3 = netlib blas

$ octave 921193.m
   824104476280848   824104476280848   828286951663952   828286951663952
$ octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000
$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

Netlib blas has no multi-thread implementation. This result indicates
that octave's multi-threading functionality is questionable.

---------------------- BLIS (openmp) ------------------------------------

libblas.so, libblas.so.3 = blis (openmp). Please note that BLIS use
single thread by default even if it's compiled with openmp/pthread
threading model.

$ octave 921193.m
   757875851200128   757875851200128   796692912410048   796692912410048

$ BLIS_NUM_THREADS=1 octave 921193.m
   531523819460688   531523819460688   543945552290544   543945552290544

$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

Again, octave's multi-threading functionality is questionable.

------------------- OpenBLAS ---------------------------------------

libblas.so, libblas.so.3 = openblas

$ octave 921193.m
   907773323384928   907773323384928   925793789579664   925793789579664

$ octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

$ OPENBLAS_NUM_THREADS=1 octave 921193.m
   737565604371072   737565604371072   741382086552384   741382086552384

$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

=========================================================================

According to the above experimental results, I think BLAS libraries
are innocent.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921193


Reply to: