
MPICH as default MPI; WAS: MPI debugging workflows




On 31/08/2018 11:04, Drew Parsons wrote:
On 2018-08-30 14:18, Alastair McKinstry wrote:
On 30/08/2018 09:39, Drew Parsons wrote:

If you want a break from the openmpi angst then go ahead and drop mpich 3.3b3 into unstable.  It won't make the overall MPI situation any worse... :)

Drew

Ok, I've pushed 3.3b3 to unstable.

Great!

For me there are two concerns:

(1) The current setup (openmpi default) shakes out issues in openmpi3
that should be fixed. It would be good to get that done.

That's fair.  If we're going to "drop" openmpi, it's a good policy to leave it in as stable a state as possible.


At this stage it appears there is a remaining "hang" / threading issue that's affecting 32-bit platforms

(See #907267). Once that's fixed, I'm favouring no further updates before Buster - i.e. ship openmpi 3.1.2 with pmix 3.0.1.

(openmpi now has a dependency on libpmix, the Process Management Interface for Exascale, which handles the launching of processes, up to millions, hierarchically.)

The openmpi/pmix interface has been flaky, I suspect, and not well tested on non-traditional HPC architectures; e.g. I suspect it's the source of the 32-bit issue.
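
As a rough smoke test for threading trouble of this kind, something like the sketch below can at least confirm what threading level the installed MPI actually grants. It is a minimal sketch only, not the actual #907267 test case; the file name is made up, and it assumes mpicc/mpirun come from mpi-default-dev / mpi-default-bin:

/* thread_check.c - hypothetical name; a hedged sketch, not the #907267 reproducer.
 * Build: mpicc thread_check.c -o thread_check
 * Run:   mpirun -n 2 ./thread_check
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* Ask for the highest threading level; the library reports in 'provided'
     * what it actually supports. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("requested MPI_THREAD_MULTIPLE (%d), provided level = %d\n",
               MPI_THREAD_MULTIPLE, provided);

    MPI_Finalize();
    return 0;
}

Building that against both openmpi and mpich on an affected 32-bit box would at least show whether MPI_THREAD_MULTIPLE is being granted before chasing the hang itself.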

mpich _can_ be built with pmix, but I'm recommending not doing so for Buster.


(2) Moving to mpich as default is a transition and should be pushed
before the deadline - say setting 30 Sept?

This is probably a good point to confer with the Release Team, so I'm cc:ing them.

Release Team: we have nearly completed the openmpi3 transition. But there is a broader question of switching mpi-defaults to mpich instead of openmpi. mpich is reported to be more stable than openmpi and is recommended by several upstream authors of HPC software libraries. We have some consensus that switching to mpich is probably a good idea; it's just a question of timing at this point.
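
For what it's worth, verifying which implementation the default actually resolves to is easy at run time, since MPI_Get_library_version() reports the library name. A minimal sketch (file name hypothetical; assumes mpicc/mpirun from mpi-default-dev / mpi-default-bin):

/* which_mpi.c - hypothetical name; prints which MPI library the default
 * toolchain links against.
 * Build: mpicc which_mpi.c -o which_mpi
 * Run:   mpirun -n 1 ./which_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* MPI-3 call: the string starts with something like "Open MPI ..."
     * or "MPICH ..." depending on the implementation in use. */
    MPI_Get_library_version(version, &len);

    if (rank == 0)
        printf("%s\n", version);

    MPI_Finalize();
    return 0;
}

Running that before and after flipping mpi-defaults would make the switch easy to confirm on any architecture.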


Does an MPI/mpich transition overlap with other transitions planned
for Buster - say hwloc, hdf5?

hdf5 already builds against both openmpi and mpich, so it should not be a particular problem. It has had more build failures on the minor arches (with the new hdf5 version in experimental), but there's no reason to blame mpich for that.

I don't know about hwloc, but the builds in experimental look clean.

Drew

--
Alastair McKinstry, <alastair@sceal.ie>, <mckinstry@debian.org>, https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered.

