
Re: OpenBlas Pthread issue with R



Hi Mo,

On 28 May 2020 at 03:40, Mo Zhou wrote:
| Hi edd,

It's Dirk, actually. 'edd' is just the account handle.
 
| I still cannot reproduce the issue in a Sid chroot with R-4.0.0. It

"We know". Seb was referring to the same problem, as I understand it.

| seems that the trigger of the problem remains to be unveiled.
| 
| According to the previous discussions, there are several possible
| preliminary solutions to the problem
| 
| 1) We provide extra shared objects, libopenblasp, libopenblaso, etc
|    and let R link against libopenblaso (o=openmp)
| 
|    Drawback: too ugly and makes the blas ecosystem overly complex.
|    Also breaks the alternative mechanism
| 
| 2) Add RPATH=/usr/lib/<..>/blas-pthread/ to all the corresponding
|    ELFs of R.
|    
|    Drawback: ELFs of external packages might be linked against
|    libblas.so.3 without RPATH.
| 
| 3) ...

Correct me if I am wrong, and I am saying this with a lot of _genuine_
appreciation for all your work refactoring LAPACK/BLAS in Debian (and
bringing DL tools to Debian), but I think your analysis is wrong.

What we have is standard behaviour in the scheme designed by Camm in the late
1990s.  Debian has multiple packages which can be swapped as they all provide
libblas.so and liblapack.so (I once wrote a whole package / unpublished paper
on this for benchmarking.)
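
To make the swapping concrete, here is a minimal sketch of how one might check
which provider the alternatives system ranks highest. It assumes the
alternative is named libblas.so.3-x86_64-linux-gnu (typical on amd64, but
check your own system); best_alternative is a hypothetical helper, not part
of any package. It only parses the "Alternative:" and "Priority:" lines that
`update-alternatives --query` prints.

```shell
#!/bin/sh
# Hypothetical helper: read `update-alternatives --query` output on stdin
# and print the alternative path that carries the highest priority.
best_alternative() {
    awk '
        /^Alternative: / { alt = $2 }
        /^Priority: /    { if (best == "" || $2 + 0 > max) { max = $2 + 0; best = alt } }
        END              { if (best != "") print best }
    '
}

# Usage (assumed alternative name; adjust for your architecture):
#   update-alternatives --query libblas.so.3-x86_64-linux-gnu | best_alternative
```

In automatic mode the highest-priority entry is the one that ends up behind
libblas.so.3, which is why the ordering matters here.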

What we seem to have now is that

  Intel OpenBLAS and GNU pthread seem to block

  Intel OpenBLAS and OpenMP do not

There is hopefully a programmatic fix somewhere. If not, reordering may work
(but may potentially expose other side effects....)
 
| The ordering of the blas providers is more likely a historical result.

Not result. Choice. Derived over time.

| If we default to openmp version, programs compiled by llvm will
| rage. (libgomp.so + libiomp.so conflict)

I was fearful of that with my earlier comment/hint on other side effects.

But this should not take away from the key aspect:

 - on some platforms (and my i7-8700k fits) R is _unusable_

That is a grave issue. Fixing this should maybe take precedence over
designing new schemes that are yet to be tested.
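
For anyone who wants to check whether their machine is affected, a rough
sketch of a hang test follows. check_hang is a hypothetical helper and the
60-second limit is an arbitrary assumption; it relies on timeout(1) from GNU
coreutils, which exits with status 124 when the time limit is hit.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report
# whether it finished, failed, or had to be killed (i.e. it hung).
check_hang() {
    limit=$1; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG"
    elif [ "$status" -eq 0 ]; then
        echo "OK"
    else
        echo "FAILED($status)"
    fi
}

# Usage (the 60-second limit is an arbitrary assumption):
#   check_hang 60 Rscript -e 'example(solve)'
```

A "HANG" result with the pthread flavour selected, and "OK" after switching
to the openmp flavour, would match what I see on my box.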

Best,  Dirk

 
| On Wed, May 27, 2020 at 11:09:57AM -0500, Dirk Eddelbuettel wrote:
| > 
| > Hi Seb,
| > 
| > On 1 May 2020 at 14:18, Sébastien Villemot wrote:
| > | Le vendredi 01 mai 2020 à 07:05 -0500, Dirk Eddelbuettel a écrit :
| > | > On 1 May 2020 at 05:16, Mo Zhou wrote:
| > | > > On Thu, Apr 30, 2020 at 11:26:23PM -0500, Dirk Eddelbuettel wrote:
| > | > > > Switching to libopenblas0-openmp works but one needs to uninstall
| > | > > > libopenblas0-pthread (or else fiddle with the alternatives priority).
| > | > > 
| > | > > Does that mean the update-alternatives mechanism is malfunctioning?
| > | > 
| > | > I do not know.
| > | > 
| > | > It could just be the default ranking is wrong. I did not check, and was
| > | > interested in fiddling manually (as I find it always bites me years later...)
| > | > 
| > | > Given that openblas-pthread renders R _unusable_ and that Seb said it was a
| > | > known issue, maybe we should ensure it ranks lower than it currently does?
| > | 
| > | I did not say that it is a known issue. I said that we had a similar
| > | issue in the past, but that it was solved.
| > | 
| > | Also, I tried to reproduce the problem on an unstable system with
| > | OpenBLAS/pthreads selected as the alternative, but I couldn’t. The
| > | computation goes fine for me. So I guess the problem manifests only on
| > | specific systems or CPUs.
| > | 
| > | I suggest that you open a bug report, providing as many details as
| > | possible on the system where it manifests (in particular, the precise
| > | CPU model).
| > | 
| > | Also, I don’t think we should change the alternatives priorities. The
| > | OpenMP flavour of OpenBLAS has its own problems (in particular, because
| > | it is compiled with GNU OpenMP, it is incompatible with applications
| > | compiled against Intel’s OpenMP, and we cannot do anything about that).
| > | So we should rather fix the bug that you encountered.
| > 
| > For what it is worth, on my (personal) workstation I re-encountered the bug
| > yesterday, chiefly because "it is still there" and the default ordering does
| > not help me -- quite the contrary, it "causes it".
| > 
| > All I did was to go from Ubuntu 19.10 to 20.04; I have an i7-8700k CPU and the
| > simple test of `example(solve)` reliably hangs R. Remedy:
| > 
| >   sudo apt install libopenblas-openmp-dev
| >   sudo apt remove libopenblas0-pthread
| > 
| > After that things are fine again. It's a tricky issue for newbies. It would be
| > really good if we could squash / circumvent it.
| > 
| > I would be happy to test newer/different version if you have them. I could
| > drop any new Debian sources onto my PPA to test them on this box.
| > 
| > Dirk
| > 
| > -- 
| > http://dirk.eddelbuettel.com | @eddelbuettel | edd@debian.org

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd@debian.org

