On 2023-11-23 12:13, Emilio Pozuelo Monfort wrote:
Hi,
On 23/11/2023 09:36, Alastair McKinstry wrote:
Hi,
OpenMPI has a new upstream release, 5.0.0. It is in experimental now.
The SOVERSION for the libraries remains 40.X (minor version increment);
only the private libraries get an SOVERSION increment, so in
theory this is not an ABI transition. However, 5.0.0 drops 32-bit
system support.
The default MPI implementation for each architecture is set in 
mpi-defaults; this allows a per-arch MPI choice; in practice we 
currently use OpenMPI for all archs. The other choice is MPICH.
So the question becomes: do we switch MPI for just 32-bit archs, or
all? What is the release team's opinion on this transition?
Having the same implementation across the board makes things easier
for testing purposes etc.; however, I don't see that as a blocker for
having separate implementations.
True, in one sense it's simpler to have the same default MPI.  But 
we've set up our package infrastructure so that in principle it should 
not matter.  One architecture does not (or should not) depend on 
another, so it shouldn't break packages just because we'd have 
different MPI implementations on different architectures.  On the 
contrary, "actively" using both implementations could lead to more 
robust packages overall as MPI bugs get fixed against both 
implementations.
What are your thoughts on it? Is there a strong reason why we should
stick with OpenMPI for 64-bit releases? Or from a different POV, what
are the risks of changing the implementation? Introducing a different
set of bugs?
One point to consider is that upstream developers of several of our 
numerical libraries have time and again suggested to us that we use 
mpich instead of openmpi, even before this v5 shift. They perceive 
(rightly or wrongly) that mpich is more robust, more reliable.
It would be useful to know whether that changes with v5, or whether 
their complaints are historical and openmpi has already fixed the bugs 
that concerned them. mpich has had its own share of bugs over the 
years. My memory told me RMA support was an issue in openmpi, but when 
I checked my facts, it was mpich that had to be fixed 
(https://github.com/pmodels/mpich/issues/6110).
Drew