Re: Production Debian as HPC OS - seeking knowledge exchange
Moin Matthew,
On Fri, Jul 07, 2023 at 06:37:17PM +0000, Smith, Matthew wrote:
> I’m spamming y’all to see if there is anyone on this list running Debian as the
> OS for an in-production academic HPC cluster.
We are running a Debian Bullseye-based HPC cluster here at MPSD [1].
> I’m part of a group that some of
> you might be well aware of, the CaRCC people network, that puts together
> monthly talks and discussions that take place monthly. Given the recent
> happenings, we’re looking to secure this topic for our September Systems-facing
> call, and I’m curious if there is anyone on this list who is in this position
> and would be comfortable speaking to it. We’d be looking for an overview of the
> cluster setup, configuration, and operational tasks.
I was not aware of CaRCC, and am not sure if and when I could give a talk on
our setup.
Summarizing our local HPC hardware (quite heterogeneous):
- /home via NFSv4 (FC SAN, served from a Proxmox guest)
- /scratch via CephFS (spinning-disk HP Apollo)
- login nodes as Proxmox guests
- compute nodes:
  - many older 16-core nodes with 64G RAM
  - some newer 4-socket nodes with 2T RAM
  - some NVIDIA V100 GPU nodes
  - a few POWER8 nodes
- 10GbE / InfiniBand FDR interconnects
Software/OS setup:
- Install/Config management via FAI [2], config in local git
- Debian Bullseye with some official and local backports
- Micro-architecture-optimized HPC tool chains via Spack [3] and EasyBuild
  (legacy)
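To illustrate the micro-architecture-optimized tool chains: with Spack one pins the target architecture in the install spec. A minimal sketch; the package and target below are hypothetical examples, not our actual configuration:

```
# Build a package optimized for a specific micro-architecture
# ("zen2" is just an example target, not our site setting):
spack install fftw target=zen2

# Make the optimized build available in the current shell:
spack load fftw
```

Running separate installs per target lets heterogeneous nodes (old 16-core boxes vs. newer 4-socket machines) each get binaries tuned for their CPUs.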
User services:
- generic SLURM (job scheduler)
- Buildbot workers for the HPC TDDFT code "Octopus" [4]
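For a flavor of the user-facing side, a minimal SLURM batch script of the kind users submit here; the partition name and module name are hypothetical placeholders:

```
#!/bin/bash
#SBATCH --job-name=octopus-run     # arbitrary example name
#SBATCH --partition=compute        # hypothetical partition name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16       # matches the 16-core nodes above
#SBATCH --mem=60G
#SBATCH --time=04:00:00

module load octopus                # assumes a Spack/EasyBuild-generated module
srun octopus
```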
[1] https://mpsd.mpg.de
[2] https://fai-project.org/
[3] https://spack.readthedocs.io/en/latest/
[4] https://octopus-code.org/
--
Kind regards
Henning Glawe
Dr. Henning Glawe
Max-Planck-Institut für Struktur und Dynamik der Materie
Geb. 99 (CFEL), Luruper Chaussee 149, 22761 Hamburg, Germany
http://www.mpsd.mpg.de/, Email: henning.glawe@mpsd.mpg.de
Building/Room: 99/O2.100, Phone: +49-40-8998-88392