ROCm for AMD GPUs at Ubuntu Summit 2024
Hi folks,
I gave a presentation on the state of AMD GPU compute support in Debian
and Ubuntu at the Ubuntu Summit 2024 back in October. The video [1] is
unlisted, but it's in the public playlist of conference videos so I've
been told I'm allowed to share it. My presentation slides are also
available [2]. It has been three months since I gave the talk, but
relatively little has changed in that time. It remains an accurate
picture of the state of ROCm on Debian today.
One of the most important things I'd like to share is the list of
packages I found that have AMDGPU support upstream, and that could have
support enabled within Debian. These mostly fall under the domain of the
Debian Science Team: adios2, blaspp, cp2k, cupy, dbcsr, ectrans, elpa,
gloo, hpx, hwloc, hypre, kokkos, lammps, lapackpp, llama-cpp, magma,
mfem, mpich, onnxruntime, papi, paraview, petsc, pyfr, pytorch, slepc,
spfft, sundials, superlu-dist, trilinos, and whisper-cpp. In most cases,
everything needed to enable AMD GPU support for these packages already
exists in the Debian repositories. If anyone wants to work on enhancing
these packages, but lacks access to AMD GPU hardware, please reach out
to me as I can help.
Note that the ROCm packages in Debian have extended hardware support. I
go into this in the talk, but the short version is that while AMD
upstream only officially supports a rather limited set of consumer GPUs,
the Debian packages are carrying patches that enable essentially all
discrete AMD GPUs since Vega to work (and some APUs too), with the only
exception being MI300 as we've not yet updated to ROCm 6. To support
this effort, the Debian ROCm Team has an extensive set of continuous
integration servers, with at least one worker covering every discrete
AMD GPU architecture that we claim support for. The only major update
to hardware support since I made this presentation is that we've done
more testing on APUs. I've learned that gfx902 and gfx1033 are broken,
while gfx90c works quite well.
The talk description:
This session provides an introduction to the AMD ROCm™ software platform
for GPU-accelerated computing on Debian and Ubuntu. It will describe the
current state of AMD GPU support for AI and HPC applications,
demonstrate methods to build and run applications for AMD GPUs, and
clarify some common sources of user confusion. As part of this talk, we
will discuss recent community efforts to package the ROCm compute stack,
as well as differences between the AMD-provided packages and the
distribution-provided packages.
Sincerely,
Cory Bloor
[1]: https://youtu.be/sUnZDYvfceE?si=CIJpRdmfynHwJscm
[2]: https://docs.google.com/presentation/d/1dY-RJlay0dypgVcKwEKYaB4pNX79FOae_GwwAAMO98M/edit?usp=sharing