
Re: Joining the debian HPC team



Hi Henning,

On Sat, Jul 09, 2022 at 10:05:54AM +0200, Henning Glawe wrote:
> Hi Debian-HPC Team,
> I'd like to join you in order to improve the admin/user experience of Debian
> on HPC systems.

Please note that I do not consider myself a member of the HPC team.  I'm
just reading the list and trying to move the condor[1] package out of the
NeuroDebian / Debian Med team, where it simply does not belong.
 
> My background:
> - Debian developer already for quite some time:
>   + FAI https://fai-project.org (my main contribution: 'softupdate', i.e.
>     configuration management after the initial installation)
>   + maintainer of PDL (Perl Data Language) for some years
> - PhD in computational physics (high-throughput screening for new
>   superconductors)
> - HPC purchase/setup/administration
>   + started with a 2-rack Opteron HPC cluster at Freie Universitaet Berlin,
>     Physics department, before 2006 (PBS/Torque)
>   + several smaller HPC clusters at
>     * Max-Planck-Institute for Microstructure Physics (Halle/Germany)
>     * Max-Planck-Institute for the Structure and Dynamics of Matter
>       (Hamburg/Germany)
> - involved in the build/test farm (based on buildbot) of octopus
>   (https://octopus-code.org)

Sounds good.
 
> Ideas:
> - CUDA-aware MPI in Debian, in the context of nvlink/sxm2:
>   + package gdrcopy kernel drivers (https://github.com/NVIDIA/gdrcopy)
>   + UCX support for gdrcopy, and integration into Debian's OpenMPI packages
>     (https://www.open-mpi.org/faq/?category=buildcuda)
> - official Debian packages for https://octopus-code.org
> 
> I am a bit confused about the distribution of packages among the debian-hpc, 
> debian-science and pkg-nvidia-devel teams:
> - I imagine gdrcopy in the context of pkg-nvidia-devel, i.e. the team that
>   also maintains the other NVIDIA CUDA kernel drivers
> - why is ucx (a fairly low-level HPC lib) maintained by debian-science,
>   while openmpi is maintained by debian-hpc?

There are historical reasons, and my gut feeling is that the HPC team is
not as functional as it should be.  For example, when I dealt with condor
I was told[2] that an upload of the then-latest version was pending.  That
was at least 9 months ago, and the last upload was several years / releases
ago.  I would love to see this cleaned up and an active HPC team caring for
all relevant HPC software in Debian.

Kind regards

      Andreas. 

[1] https://tracker.debian.org/pkg/condor 
[2] https://lists.debian.org/debian-med/2021/12/msg00177.html

-- 
http://fam-tille.de

