Re: h5py and hdf5-mpi
On 2019-08-13 03:51, Steffen Möller wrote:
Hello,
There are a few data formats in bioinformatics now depending on hdf5,
and h5py is used a lot. My main concern is that the user should not need
to configure anything, like a set of hostnames. And there should not be
anything stalling while it waits to contact a server. MPI needs to be
completely transparent and then I would very much like to see it.
MPI is generally good that way.  The program runs as a simple serial 
program if you run it on its own, so in that sense it should be 
transparent to the user (i.e. you won't know it's MPI-enabled unless you 
know to look for it).  A multi-CPU job is launched by running the 
program with mpirun (or mpiexec).
For example, in the context of Python and h5py, if you run
  python3 -c 'import h5py'
then the job runs as a serial job, regardless of whether h5py is built 
for hdf5-serial or hdf5-mpi.
If you want to run on 4 cpus, you launch the same program with
  mpirun -n 4 python3 -c 'import h5py'
Then if h5py is built with hdf5-mpi, it handles hdf5 as a 
multiprocessor job.  If h5py here is built with hdf5-serial, then 
mpirun just starts 4 independent copies of the same serial job at the 
same time.
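To see which of the two cases applies on a given machine, something like the following could be used. It is only a minimal sketch: the function name h5py_build_kind is made up for illustration, and it relies on h5py.get_config().mpi, which h5py exposes to report whether it was built against a parallel HDF5.

```python
def h5py_build_kind():
    """Return 'hdf5-mpi', 'hdf5-serial', or 'unavailable'.

    Hypothetical helper for illustration; not part of h5py.
    """
    try:
        import h5py
    except ImportError:
        # h5py is not installed at all in this environment.
        return "unavailable"
    # get_config().mpi is True when h5py was built with MPI support.
    return "hdf5-mpi" if h5py.get_config().mpi else "hdf5-serial"


if __name__ == "__main__":
    print(h5py_build_kind())
```

Run on its own it prints one line; run under mpirun -n 4 each of the 4 processes prints the same line, whichever build is installed.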
To reiterate, having h5py-mpi available will be transparent to a user 
interacting with hdf5 as a serial library. It doesn't break serial use, 
it just provides the capability to also run multi-CPU jobs.
How do autotests work for MPI?
We simply configure the test script to invoke the same tests using mpirun.
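As a rough illustration of the idea (not the actual test script), the same test invocation can be wrapped in an mpirun prefix.  The helper name mpi_test_command, the default of 2 processes, and the test arguments are all assumptions made for this sketch.

```python
import shlex
import sys


def mpi_test_command(test_args, n_procs=2, launcher="mpirun", python=None):
    """Build the argv list that runs `python <test_args>` under an MPI launcher.

    Hypothetical helper: the defaults here are illustrative assumptions.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    interpreter = python or sys.executable
    return [launcher, "-n", str(n_procs), interpreter, *test_args]


if __name__ == "__main__":
    # Same arguments for the serial run and the MPI run.
    args = ["-c", "import h5py"]
    print(shlex.join([sys.executable, *args]))   # serial invocation
    print(shlex.join(mpi_test_command(args)))    # same test under mpirun
```

The resulting list can be handed to subprocess.run() by the test script, so the serial and MPI runs exercise exactly the same tests.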
Drew