
Re: MPI debugging workflows




On 04/08/2018 22:41, Dima Kogan wrote:
Answering some of my own questions inline...


Dima Kogan <dima@secretsauce.net> writes:

2. Is the MPI implementation significant? Would mpich behave
potentially differently here from openmpi?
I rebuilt sundials with mpich instead of openmpi, and those tests now
work just fine: nothing locks up. There might be other issues involved
here, but I'd like to fully figure out what's going on before proposing
any such change.

I also poked around with a debugger a little bit: the lockup is inside
some MPI calls. Anybody have experience here? I can imagine this is an
openmpi bug, but the last time I touched MPI anything was about 20 years
ago, so any info from more-recently-experienced people would be welcome.

There have been a number of lockup issues with openmpi recently, including thread lockups in underlying pmix libs.

There is a new 3.1.2rc2 release that I'm testing. Can you give me a reproducible test case to test against? I think this release may have the necessary fix.
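
One way to make a lockup reproducible is to run the failing test under a timeout, so a hang is reported as a result instead of blocking the terminal. The following is only a minimal sketch of that idea: the helper name `run_with_timeout` is hypothetical, and it is not taken from the sundials test suite or from either MPI implementation.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it hung.

    Returns (hung, returncode). hung is True when the command did not
    finish within timeout_s seconds; returncode is then None.
    """
    try:
        # subprocess.run kills the direct child if the timeout expires.
        # Assumption: processes spawned by a launcher may need extra cleanup.
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return True, None
    return False, proc.returncode
```

As a usage example, something like `run_with_timeout(["mpirun", "-n", "4", "./failing_test"], 60)` could be run once against each MPI build to compare them; `./failing_test` and the process count are placeholders for whichever test actually locks up.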

regards
Alastair

--
Alastair McKinstry, <alastair@sceal.ie>, <mckinstry@debian.org>, https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered.

