Re: Joining the debian HPC team
Hi Henning,
On Sat, Jul 09, 2022 at 10:05:54AM +0200, Henning Glawe wrote:
> Hi Debian-HPC Team,
> I'd like to join you in order to improve the admin/user experience of Debian
> on HPC systems.
Please note that I do not consider myself a member of the HPC team. I
just read the list and am trying to get the condor[1] package out of the
NeuroDebian / Debian Med team, where it simply does not belong.
> My background:
> - Debian developer already for quite some time:
> + FAI https://fai-project.org (my main contribution: 'softupdate', i.e.
> configuration management after the initial installation)
> + maintainer of PDL (Perl Data Language) for some years
> - PhD in computational physics (high-throughput screening for new
> superconductors)
> - HPC purchase/setup/administration
> + started with a 2-rack Opteron HPC at Freie Universitaet Berlin,
> Physics department before 2006 (PBS/Torque)
> + several smaller HPC-Clusters at
> * Max-Planck-Institute for Microstructure Physics (Halle/Germany)
> * Max-Planck-Institute for the Structure and Dynamics of Matter
> (Hamburg/Germany)
> - involved in the build/test farm (based on buildbot) of octopus
> (https://octopus-code.org)
Sounds good.
> Ideas:
> - CUDA-aware MPI in Debian, in the context of nvlink/sxm2:
> + package gdrcopy kernel drivers (https://github.com/NVIDIA/gdrcopy)
> + UCX support for gdrcopy, and integration into Debian's OpenMPI packages
> (https://www.open-mpi.org/faq/?category=buildcuda)
> - official Debian packages for https://octopus-code.org
>
> I am a bit confused about the distribution of packages among the debian-hpc,
> debian-science and pkg-nvidia-devel teams:
> - I imagine gdrcopy in the context of pkg-nvidia-devel, i.e. the team that
> maintains also other nvidia cuda kernel drivers
> - why is ucx (a fairly low-level HPC lib) maintained by debian-science,
> while openmpi is maintained by debian-hpc?
There are historical reasons, and my gut feeling is that the HPC team is
not as functional as it should be. For example, when I dealt with condor
I was told[2] that an upload of what was then the latest version was
pending. That was at least 9 months ago, and the last upload was several
years / releases ago. I would love it if this were cleaned up and an
active HPC team took care of all relevant HPC software in Debian.
Kind regards
Andreas.
[1] https://tracker.debian.org/pkg/condor
[2] https://lists.debian.org/debian-med/2021/12/msg00177.html
--
http://fam-tille.de