
ROCm for AMD GPUs at Ubuntu Summit 2024



Hi folks,

I gave a presentation on the state of AMD GPU compute support in Debian and Ubuntu at the Ubuntu Summit 2024 back in October. The video [1] is unlisted, but it's in the public playlist of conference videos, so I've been told I'm allowed to share it. My presentation slides are also available [2]. It has been three months since I gave the talk, but relatively little has changed in that time. It remains an accurate picture of the state of ROCm on Debian today.

One of the most important things I'd like to share is the list of packages I found that have AMD GPU support upstream, and that could have support enabled within Debian. These mostly fall under the domain of the Debian Science Team: adios2, blaspp, cp2k, cupy, dbcsr, ectrans, elpa, gloo, hpx, hwloc, hypre, kokkos, lammps, lapackpp, llama-cpp, magma, mfem, mpich, onnxruntime, papi, paraview, petsc, pyfr, pytorch, slepc, spfft, sundials, superlu-dist, trilinos, and whisper-cpp. In most cases, everything needed to enable AMD GPU support for these packages already exists in the Debian repositories. If anyone wants to work on enhancing these packages but lacks access to AMD GPU hardware, please reach out to me, as I can help.

Note that the ROCm packages in Debian have extended hardware support. I go into this in the talk, but the short version is that while AMD upstream only officially supports a rather limited set of consumer GPUs, the Debian packages carry patches that enable essentially all discrete AMD GPUs since Vega to work (and some APUs too). The only exception is MI300, as we've not yet updated to ROCm 6. To support this effort, the Debian ROCm Team has an extensive set of continuous integration servers, with at least one worker covering every discrete AMD GPU architecture that we claim support for. The only major update to hardware support since I made this presentation is that we've done more testing on APUs. I've learned that gfx902 and gfx1033 are broken, while gfx90c works quite well.
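
For anyone who wants to check which of those APU cases applies to their own machine, here is a minimal sketch in Python. It assumes the rocminfo tool is installed and that its output contains architecture names of the form gfxNNN; the helper names (find_gfx_targets, report, read_rocminfo) are hypothetical and only for illustration. The status table covers just the three APU architectures mentioned above, so anything else is reported as "unknown" rather than as supported or unsupported.

```python
import re
import subprocess

# APU status as reported in this message. Architectures not listed here are
# "unknown" to this sketch, which does not mean they are unsupported.
APU_STATUS = {
    "gfx902": "broken",
    "gfx1033": "broken",
    "gfx90c": "works",
}


def find_gfx_targets(rocminfo_text):
    """Return the unique gfx architecture names in the text, in order of appearance."""
    seen = []
    for name in re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_text):
        if name not in seen:
            seen.append(name)
    return seen


def report(rocminfo_text):
    """Map each architecture found in the text to its status from APU_STATUS."""
    return {name: APU_STATUS.get(name, "unknown")
            for name in find_gfx_targets(rocminfo_text)}


def read_rocminfo():
    """Run rocminfo and return its standard output (assumes rocminfo is on PATH)."""
    result = subprocess.run(["rocminfo"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a machine with ROCm installed, report(read_rocminfo()) would give a dictionary from architecture name to status; the parsing itself can be tried on any saved rocminfo output without a GPU.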

The talk description:

This session provides an introduction to the AMD ROCm™ software platform for GPU-accelerated computing on Debian and Ubuntu. It will describe the current state of AMD GPU support for AI and HPC applications, demonstrate methods to build and run applications for AMD GPUs, and clarify some common sources of user confusion. As part of this talk, we will discuss recent community efforts to package the ROCm compute stack, as well as differences between the AMD-provided packages and the distribution-provided packages.

Sincerely,
Cory Bloor

[1]: https://youtu.be/sUnZDYvfceE?si=CIJpRdmfynHwJscm
[2]: https://docs.google.com/presentation/d/1dY-RJlay0dypgVcKwEKYaB4pNX79FOae_GwwAAMO98M/edit?usp=sharing

