
Re: h5py and hdf5-mpi



The main difference between h5py built against a serial version of
HDF5 and one built against the MPI version is that the MPI build can
use the MPI-specific tooling (such as collective IO). You can use a
serial h5py with MPI without problems, apart from the constraints that
MPI imposes on any IO, so the big difference would be in writing
files, not reading them. I'd suggest switching on the MPI support,
checking back in a year to see whether there have been any major
issues (the MPI tooling is not that well tested at the moment, so
there may be bugs), and making a longer-term call then.

Last time I looked at the state of h5py and HDF5 across the different
distros, most only packaged one version of HDF5 (either serial or
MPI), and Red Hat (and derivatives) used their module system, which
handled moving between different backends and different installs of
h5py. I'd keep one version of h5py in the archive and choose an HDF5
to build against, rather than try playing with alternatives or
anything.
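A minimal sketch (not from the original message) for checking which flavour an installed h5py was built against. It assumes `python3` is on the PATH and relies on `h5py.get_config().mpi`, which is true only for builds against parallel HDF5; the function name `h5py_build_flavour` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "mpi" if the installed h5py was built against
# MPI-enabled (parallel) HDF5, "serial" if it was not, or a notice if h5py
# cannot be used at all. Python's stderr is discarded.
h5py_build_flavour() {
    python3 - 2>/dev/null <<'EOF' || echo "h5py not available"
import h5py
# get_config().mpi is True only when h5py was built against parallel HDF5
print("mpi" if h5py.get_config().mpi else "serial")
EOF
}

h5py_build_flavour
```

Since a serial h5py still works inside an MPI job, this check only tells you whether the MPI-specific tooling (such as collective IO) is available.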

James (one of the h5py devs)

On Wed, 14 Aug 2019 at 20:47, Drew Parsons <dparsons@emerall.com> wrote:
>
> On 2019-08-14 18:05, Steffen Möller wrote:
> > On 13.08.19 06:01, Drew Parsons wrote:
> >>
> >> To reiterate, having h5py-mpi available will be transparent to a user
> >> interacting with hdf as a serial library. It doesn't break serial use,
> >> it just provides the capability to also run multicpu jobs.
> >
> >
> > Then it would be an omission not to feature it. Please go for it.
> >
>
> h5py 2.9.0-3 will migrate to testing in a day or two; we can proceed
> with MPI then.
>
>
> >>> How do autotests work for MPI?
> >> We simply configure the test script to invoke the same tests using
> >> mpirun.
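As a rough illustration of invoking the same tests under mpirun, here is a minimal sketch of a helper a debian/tests script might use. The function name, the `MPIRUN` override and the default of 2 ranks are assumptions, not taken from the actual h5py packaging. The MPI pass is skipped when no launcher is installed.

```shell
#!/bin/sh
# Hypothetical debian/tests helper: run one test command serially, then
# again under an MPI launcher. MPIRUN and the default rank count are
# illustrative assumptions.
run_serial_and_mpi() {
    test_cmd=$1
    nprocs=${2:-2}
    launcher=${MPIRUN:-mpirun}

    echo "== serial run"
    sh -c "$test_cmd" || return 1

    # Only attempt the parallel pass if the launcher is actually installed.
    if command -v "$launcher" >/dev/null 2>&1; then
        echo "== $launcher -n $nprocs"
        "$launcher" -n "$nprocs" sh -c "$test_cmd" || return 1
    else
        echo "== $launcher not found, MPI run skipped"
    fi
}
```

For example, `run_serial_and_mpi "$TEST_CMD" 4`, where `TEST_CMD` stands for whatever command the package's serial tests already run.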
> >
> > I am somewhat uncertain that Debian needs to be the one testing
> > this. But given all the hiccups that parallelization can introduce,
> > it would be good to test it. And Debian should then take some pride
> > in it and announce that.
>
> Once we've got mpi activated in h5py, we can check whether the
> parallelisation does in fact improve your own workflow. Even on a laptop
> or desktop, most come with at least 4 cpus these days. Even mobile
> phones.  Do you deal with GB-size hdf5 datasets, data for which access
> time is noticeable?  Ideally your data handling will speed up according
> to the number of cpus added.
>
> I don't think switching on mpi in h5py is itself such a big deal.  But
> if we can demonstrate that it measurably improves performance for a real
> workflow, then that is worth crowing about.
>
> > Does Debian have any mechanism to indicate that a piece of software
> > can run in parallel? I am thinking about all the automation that now
> > controls workflows (like toil and/or cwl) or the testing of reverse
> > dependencies on some buildd. These can check for the presence of a
> > binary but don't immediately know if they should start it with mpirun.
>
> No specific mechanism, since normally we know whether the program is
> intended to be MPI-enabled or not.
>
> But at the level of the package, we can look at dependencies, e.g.
>    apt-cache depends python3-scipy | grep mpi
>    apt-cache depends python3-dolfin | grep mpi
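A slightly narrower variant of the `grep mpi` above, sketched as a hypothetical shell filter. It reads `apt-cache depends` output on stdin and prints only the hard dependencies (`Depends:`, `|Depends:` or `PreDepends:` lines) whose names contain "mpi". This way, `Suggests:`, `Recommends:` or `Breaks:` lines do not produce false hits. The function name is illustrative.

```shell
#!/bin/sh
# Hypothetical filter for `apt-cache depends PKG` output: print hard
# dependencies whose package name contains "mpi".
# Typical input lines look like "  Depends: some-package".
mpi_depends() {
    awk '$1 ~ /Depends:$/ && $2 ~ /mpi/ { print $2 }'
}
```

Usage would be along the lines of `apt-cache depends python3-dolfin | mpi_depends`. Empty output means no hard dependency on an MPI-named package.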
>
> At the level of a given library or executable, objdump can be helpful,
> e.g.
>    objdump -p /usr/lib/x86_64-linux-gnu/libsuperlu.so | grep mpi
>    objdump -p /usr/lib/x86_64-linux-gnu/libsuperlu_dist.so | grep mpi
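The same idea at the library level, as a minimal sketch. It keeps only the `NEEDED` entries from `objdump -p`, so matches on other header lines (for example a RUNPATH) are ignored. It assumes the MPI implementation's library names contain "mpi", as OpenMPI's libmpi and MPICH's libmpich do. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter for `objdump -p LIB` output: print the NEEDED
# (dynamically linked) libraries whose name contains "mpi".
mpi_needed() {
    awk '$1 == "NEEDED" && $2 ~ /mpi/ { print $2 }'
}

# Exit status 0 if the given shared object links directly against an
# MPI library, non-zero otherwise (including when objdump fails).
links_against_mpi() {
    objdump -p "$1" 2>/dev/null | mpi_needed | grep -q .
}
```

For example, `links_against_mpi /usr/lib/x86_64-linux-gnu/libsuperlu_dist.so && echo "linked with MPI"`. Note that this sees only direct dependencies, not MPI pulled in indirectly through another library.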
>
> For autopkgtest, it's our own tests so we already know if the program is
> compiled with mpi or not. It wouldn't really make sense for the scripts
> in debian/tests to check whether the program being tested was compiled
> with mpi.
>
> Drew
>


-- 
Don't send me files in proprietary formats (.doc(x), .xls, .ppt etc.).
It isn't good enough for Tim Berners-Lee, and it isn't good enough for
me either. For more information visit
http://www.gnu.org/philosophy/no-word-attachments.html.

Truly great madness cannot be achieved without significant intelligence.
 - Henrik Tikkanen

If you're not messing with your sanity, you're not having fun.
 - James Tocknell

In theory, there is no difference between theory and practice; In
practice, there is.

