Re: h5py and hdf5-mpi

To: debian-science@lists.debian.org
Cc: debian-science@lists.debian.org
Subject: Re: h5py and hdf5-mpi
From: Steffen Möller <steffen_moeller@gmx.de>
Date: Wed, 14 Aug 2019 12:05:05 +0200
Message-id: <c4add0b2-0ab7-2db9-5e1b-448adbc7f1df@gmx.de>
In-reply-to: <d0836264bc68454b2658129254184dc0@emerall.com>
References: <1faee1eb2da43d988e3da96e136765b2@debian.org> <d0608a21afefbca864e7c3325b258dfc@debian.org> <CAFzxpWpNHstOwLWVdLYmo445Wi60+J7Uy4WsFOTvnfUYo1hJ0Q@mail.gmail.com> <1d4e9a6d-6506-36df-4206-3091a0a29edb@gmx.de> <d0836264bc68454b2658129254184dc0@emerall.com>


On 13.08.19 06:01, Drew Parsons wrote:

On 2019-08-13 03:51, Steffen Möller wrote:

Hello,


Several data formats in bioinformatics now depend on hdf5, and h5py is
used a lot. My main concern is that the user should not need to
configure anything, such as a set of hostnames. And nothing should
stall because it is waiting to contact a server. MPI needs to be
completely transparent, and then I would very much like to see it.


MPI is generally good that way.  The program runs directly as a
simple serial program if you run it on its own, so in that sense it
should be transparent to the user (i.e. you won't know it's MPI-enabled
unless you know to look for it).  A multi-CPU job is launched by
running the program with mpirun (or mpiexec).

e.g. in the context of python and h5py, if you run
  python3 -c 'import h5py'
then the job runs as a serial job, regardless of whether h5py is built
for hdf5-serial or hdf5-mpi.

If you want to run on 4 CPUs, you launch the same program with
  mpirun -n 4 python3 -c 'import h5py'

Then if h5py is available with hdf5-mpi, it handles hdf5 as a
multiprocessor job.  If h5py here is built with hdf5-serial, then it
runs the same serial job 4 times at the same time.

To reiterate, having h5py-mpi available will be transparent to a user
interacting with hdf5 as a serial library. It doesn't break serial use,
it just provides the capability to also run multi-CPU jobs.
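
As an illustration of the serial and mpirun cases described above, here
is a minimal Python sketch that reports which kind of HDF5 the installed
h5py was built against and, for an MPI build, the rank of the current
process. The function name describe_h5py_mode is made up for this
example, and the sketch assumes that mpi4py is the MPI binding in use.
It only illustrates the idea and is not taken from the h5py packaging.

```python
# Sketch: report how h5py would behave in the current process.
# describe_h5py_mode is a hypothetical helper, not part of h5py.

def describe_h5py_mode():
    """Return a short description of the h5py build and MPI context."""
    try:
        import h5py
    except ImportError:
        return "h5py not installed"

    # get_config().mpi is True when h5py was built against parallel HDF5.
    if not h5py.get_config().mpi:
        return "h5py built against serial HDF5"

    try:
        from mpi4py import MPI  # assumption: mpi4py provides the MPI binding
    except ImportError:
        return "h5py built against parallel HDF5, mpi4py not installed"

    comm = MPI.COMM_WORLD
    return "h5py built against parallel HDF5, rank %d of %d" % (
        comm.Get_rank(), comm.Get_size())


if __name__ == "__main__":
    print(describe_h5py_mode())
```

Saved under a name of your choice and run with plain python3, this
prints one line. Run under mpirun -n 4, it prints four lines: with a
serial build they are four identical copies of the same message, with an
MPI build each process should report its own rank.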



Then it sounds like an omission not to offer it. Please go for it.

How do autotests work for MPI?

We simply configure the test script to invoke the same tests using
mpirun.
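
A rough sketch of what "invoke the same tests using mpirun" could look
like when driven from Python. The helper names build_test_command and
run_tests, the default of 2 processes and the pytest arguments are all
assumptions made for this example. The actual Debian test script may
well be a plain shell script that looks quite different.

```python
import shutil
import subprocess
import sys


def build_test_command(nprocs, test_args, launcher="mpirun"):
    """Return the argv that runs a Python test command under an MPI launcher."""
    return [launcher, "-n", str(nprocs), sys.executable] + list(test_args)


def run_tests(nprocs=2, test_args=("-m", "pytest", "--pyargs", "h5py"),
              launcher="mpirun"):
    """Run the tests under the launcher.

    Returns the launcher's exit code, or None if the launcher is not
    installed, so the caller can decide whether to skip or fail.
    """
    if shutil.which(launcher) is None:
        return None
    cmd = build_test_command(nprocs, test_args, launcher)
    return subprocess.run(cmd).returncode
```

The same test arguments are passed unchanged, so the serial run and the
mpirun run exercise the same tests. Only the launcher prefix differs.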


I am not entirely sure that Debian needs to be the one testing this.
But given all the hiccups that parallelization can introduce, it would
be good to test it. And Debian should then take some pride in it and
announce that.

Does Debian have any mechanism to indicate that a piece of software can
run in parallel? I am thinking of all the automation that now controls
workflows, like toil and/or cwl, or the testing of reverse dependencies
on some buildd. These can check for the presence of a binary but don't
immediately know whether they should start it with mpirun.

Best,

Steffen
