Re: MPICH as default MPI; WAS: MPI debugging workflows
On 07/12/2018 11:26, Drew Parsons wrote:
> Hi Alastair, openmpi3 seems to be stabilised now: packages are now
> passing tests, and libpsm2 is no longer injecting 15-second delays.
>
> Nice that the mpich 3.3 release is now finalised. Do we feel
> confident proceeding with the switch of mpi-defaults from openmpi to
> mpich?
>
> Are there any known issues with the transition? One that catches my
> eye is the build failures in scalapack. It has been tuned to pass
> build-time tests with openmpi but fails many tests with mpich
> (scalapack builds packages for both MPI implementations). I'm not sure
> how concerned we should be about those build failures. Perhaps upstream
> should be consulted. Are similar mpich failures expected in other
> packages? Is there a simple way of setting up a buildd to do a test
> run of the transition before making it official?
>
> Drew
Hi Drew,
Looking into it further, I'm now reluctant to move to mpich as the
default for buster. One reason is the experience of the openmpi3
transition, which shook out many issues. I suspect we could see the same
with other package builds that, as you point out, have been tuned to
openmpi rather than mpich. The other concern is mpich's feature support.
For example, mpich's integration with psm / pmix / slurm is weak (in
Debian). While the ability to scale to 10k+ nodes might not look
important for Debian (none of the top500 machines run it), we're seeing
an increase in the container use case: building MPI apps within
Singularity containers running on our main machine. We don't run Debian
as the OS on the base supercomputer at work, because we need kernel
support from $vendor, but the apps are built in Singularity containers
running Debian. Very large-scale jobs are becoming increasingly likely,
and openmpi / pmix is needed for that. Testing mpich, I have yet to get
the CH4 device (needed for pmix) working reliably, and its OFI / UCX
support is labelled 'experimental'.
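
For anyone who wants to poke at this, a minimal smoke test is enough:
with mpich, the string returned by MPI_Get_library_version() reports the
configure options, so it shows whether a given build actually uses CH4
and which netmod it was built against. The configure line in the comment
below is a sketch following the mpich README, and the .mpich wrapper
names are Debian's alternatives.

/*
 * Minimal MPI smoke test. With mpich, the string returned by
 * MPI_Get_library_version() includes the configure options, so it
 * reveals whether the build uses the CH4 device and which netmod
 * (ofi or ucx) it was built against.
 *
 * A CH4/OFI build would be configured roughly as (per the mpich README):
 *   ./configure --with-device=ch4:ofi
 * and on Debian tested with the mpich-suffixed wrappers:
 *   mpicc.mpich hello.c -o hello && mpirun.mpich -n 4 ./hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int rank, size, len;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Print version, device and configure options of the library. */
        MPI_Get_library_version(version, &len);
        printf("%s\n", version);
    }
    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}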
My driving use case for the move to mpich had been fault tolerance,
which is needed for the co-arrays
(https://tracker.debian.org/pkg/open-coarrays) required by Fortran 2018.
I've since re-done open-coarrays to build both openmpi and mpich
variants, though, so that issue went away.
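
For context, here is a sketch of the baseline any MPI-level
fault-tolerance scheme starts from: replacing the default
MPI_ERRORS_ARE_FATAL handler with MPI_ERRORS_RETURN, so a failure comes
back as an error code instead of aborting every rank. Actually surviving
failed images needs more than the standard currently offers (e.g. the
proposed ULFM extensions), which is why MPI-level support mattered for
co-arrays.

/*
 * Illustration only: make MPI errors survivable at all. By default any
 * MPI error aborts the whole job (MPI_ERRORS_ARE_FATAL); switching the
 * communicator's handler to MPI_ERRORS_RETURN lets the caller see the
 * error code and attempt recovery.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* From here on, failed MPI calls on this communicator return an
       error code to the caller instead of killing every rank. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rc = MPI_Barrier(MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "barrier failed: %s\n", msg);
        /* A real application would now attempt recovery. */
    }

    MPI_Finalize();
    return 0;
}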
So I think more testing of mpich3 builds with CH4 / pmix / OFI support
is needed, but moving over from openmpi to mpich at this stage is iffy.
regards
Alastair
--
Alastair McKinstry, <alastair@sceal.ie>, <mckinstry@debian.org>, https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered.