Enabling ROCm on Everything

To: debian-ai <debian-ai@lists.debian.org>
Subject: Enabling ROCm on Everything
From: Cordell Bloor <cgmb-deb@slerp.xyz>
Date: Mon, 20 Mar 2023 23:17:38 -0600
Message-id: <099ceb80-9d22-6a82-d0b7-723fb69889af@slerp.xyz>

Hello everyone,

In the last round of updates to the ROCm packages on Unstable, I did a bunch of testing with an RX 5700 XT (gfx1010) and a Radeon Pro V520 (gfx1011). I found that all Debian-packaged libraries passed their full test suites (with the exception of an out-of-memory error in one rocprim/hipcub test). So, the rocRAND, hipRAND, rocPRIM, hipCUB, rocSPARSE and hipSPARSE packages are now enabled for gfx803, gfx900, gfx906, gfx908, gfx90a, gfx1010, gfx1011 and gfx1030.

However, there is a cost to this. The rocsparse library is ~250 MiB for a single instruction set, but we are now building it for eight different GPU instruction sets. That is why the library binary is now 1.96 GiB. There are a total of twenty-six instruction sets in the GFX9, GFX10 and GFX11 families. If you add gfx803, that makes twenty-seven architectures. If we were to enable support for all modern AMD GPUs [1], the total size of librocsparse.so would be 0.25 GiB * 27 = 6.75 GiB [2]. For better or for worse, that does not seem to actually be possible anyway. Once the size of the shared library exceeds 2 GiB, it becomes too large to address with 32-bit relative offsets, and the library will fail to link.
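For anyone who wants to play with these numbers, here is a back-of-the-envelope model in Python. It assumes every instruction set adds the same amount of device code (taken as 1.96 GiB / 8, from the current build) and ignores host code; the function names are hypothetical.

```python
# Back-of-the-envelope model of a fat binary: one copy of device code per ISA.
# Assumption: every ISA contributes the same amount of code; host code ignored.
MEASURED_GIB = 1.96   # librocsparse.so built for eight ISAs
MEASURED_ISAS = 8
PER_ISA_GIB = MEASURED_GIB / MEASURED_ISAS  # ~0.245 GiB, i.e. ~250 MiB

# A signed 32-bit relative offset spans at most 2**31 - 1 bytes (just under 2 GiB).
LIMIT_GIB = 2**31 / 2**30  # 2.0


def fat_binary_gib(num_isas, per_isa_gib=PER_ISA_GIB):
    """Estimated library size in GiB when built for num_isas targets."""
    return num_isas * per_isa_gib


def max_isas(per_isa_gib=PER_ISA_GIB, limit_gib=LIMIT_GIB):
    """Largest ISA count whose estimated size stays strictly below the limit."""
    n = int(limit_gib // per_isa_gib)
    # A library of exactly 2 GiB is already out of reach of a signed offset.
    while n > 0 and fat_binary_gib(n, per_isa_gib) >= limit_gib:
        n -= 1
    return n


if __name__ == "__main__":
    for n in (8, 27):
        size = fat_binary_gib(n)
        verdict = "below 2 GiB" if size < LIMIT_GIB else "over 2 GiB, would fail to link"
        print(f"{n:2d} ISAs: ~{size:.2f} GiB ({verdict})")
    print("most ISAs that fit:", max_isas())
    # The rounder estimate for all 27 architectures:
    print(f"27 x 0.25 GiB = {fat_binary_gib(27, 0.25):.2f} GiB")
```

Under this model eight ISAs is the most that fits, which matches how close the current build already is to the limit.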

There are some improvements coming for this situation that are targeted for LLVM 17. Of the twenty-six instruction sets in GFX9, GFX10 and GFX11, maybe only thirteen are distinct. There was a period of several years in which each new GPU was given its own unique instruction set ID. Many of these instruction sets are identical to each other, and they will be consolidated where possible. Incidentally, the fact that some of these ISAs are identical is why the HSA_OVERRIDE_GFX_VERSION environment variable can be used to safely enable ROCm on some unsupported hardware. The upstream changes to consolidate ISAs are expected to achieve basically the same thing as the environment variable method, but without requiring user intervention.
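As a rough illustration of the environment variable method, the sketch below converts a target name into the major.minor.stepping string that HSA_OVERRIDE_GFX_VERSION takes (for example, owners of gfx1031 cards commonly set it to 10.3.0 so that gfx1030 code objects are used). The conversion assumes the usual gfx naming scheme, where the last two characters are hex digits for the minor version and stepping. The helper names are hypothetical, and the override is only safe when the two ISAs really are identical.

```python
import os
import re


def gfx_to_override_version(target):
    """Translate an LLVM target name into 'major.minor.stepping'.

    Assumes the last two characters of the name are the minor version and
    stepping as hex digits and the rest is the major version, e.g.
    gfx1030 -> 10.3.0, gfx906 -> 9.0.6, gfx90a -> 9.0.10.
    """
    m = re.fullmatch(r"gfx(\d+)([0-9a-f])([0-9a-f])", target)
    if m is None:
        raise ValueError(f"unrecognised target name: {target!r}")
    major, minor, stepping = m.groups()
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def override_env(target, base=None):
    """Copy of the environment asking the ROCm runtime to treat the GPU as `target`.

    This does not (and cannot) check that the real GPU's ISA is identical
    to `target`'s; that is up to the user.
    """
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_to_override_version(target)
    return env


if __name__ == "__main__":
    for name in ("gfx803", "gfx906", "gfx90a", "gfx1030", "gfx1100"):
        print(name, "->", gfx_to_override_version(name))
```

To apply it, pass env=override_env("gfx1030") to subprocess.run when launching the program.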

However, even after consolidation, the thirteen distinct GFX9, GFX10 and GFX11 instruction sets plus gfx803 make fourteen, which is still too many to put in a single fat binary. As mentioned, rocsparse will fail to link if built with more than eight (and even eight is pushing dangerously close to the limit). The fourteen instruction sets are gfx803, gfx900, gfx904, gfx906, gfx908, gfx90a, gfx940, gfx1010, gfx1011, gfx1013, gfx1030, gfx1100, gfx1101, and gfx1102. I don't think there's any reasonable way for Debian to resolve this problem besides slicing the packages by architecture.

One possible split would be on the GFX architecture major version. There would be binary packages for librocsparse0-gfx8, librocsparse0-gfx9, librocsparse0-gfx10, and librocsparse0-gfx11, with each providing librocsparse0. The GFX9 grouping would be pretty large with six architectures (roughly 1.5 GiB at ~250 MiB each), but that's still within acceptable limits. If need be, it could be split into gfx9-gcn (gfx900, gfx904, gfx906) and gfx9-cdna (gfx908, gfx90a, gfx940).
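A sketch of the proposed split, using the fourteen ISAs listed earlier: group the targets by GFX major version and check each group against the 2 GiB limit with the same ~250 MiB-per-ISA estimate. The package names follow the proposal, but the code is hypothetical and only does the grouping, not the actual Debian packaging (where each package would also declare that it provides librocsparse0).

```python
from collections import defaultdict

# The fourteen distinct ISAs that would remain after consolidation.
TARGETS = [
    "gfx803", "gfx900", "gfx904", "gfx906", "gfx908", "gfx90a", "gfx940",
    "gfx1010", "gfx1011", "gfx1013", "gfx1030", "gfx1100", "gfx1101", "gfx1102",
]

PER_ISA_GIB = 1.96 / 8  # assumption: measured size / ISA count, ~250 MiB
LIMIT_GIB = 2.0         # beyond this, 32-bit relative offsets no longer reach

# Optional finer split of GFX9, in case that group ever grows too large.
GFX9_SPLIT = {
    "gfx9-gcn": ["gfx900", "gfx904", "gfx906"],
    "gfx9-cdna": ["gfx908", "gfx90a", "gfx940"],
}


def gfx_major(target):
    """'gfx803' -> 8, 'gfx90a' -> 9, 'gfx1030' -> 10 (last two chars are minor/stepping)."""
    return int(target[3:-2])


def split_by_major(targets):
    """Group targets by GFX major version, preserving input order."""
    groups = defaultdict(list)
    for t in targets:
        groups[f"gfx{gfx_major(t)}"].append(t)
    return dict(groups)


def package_plan(groups, prefix="librocsparse0"):
    """Map hypothetical binary package names to their targets, checking the size of each."""
    plan = {}
    for name, members in groups.items():
        size = len(members) * PER_ISA_GIB
        if size >= LIMIT_GIB:
            raise ValueError(f"{name}: ~{size:.2f} GiB would not link")
        plan[f"{prefix}-{name}"] = members
    return plan


if __name__ == "__main__":
    print(f"all {len(TARGETS)} in one binary: ~{len(TARGETS) * PER_ISA_GIB:.2f} GiB")
    for pkg, members in package_plan(split_by_major(TARGETS)).items():
        print(f"{pkg}: {', '.join(members)} (~{len(members) * PER_ISA_GIB:.2f} GiB)")
```

A single package for all fourteen would come to roughly 3.4 GiB, while the largest per-major group (GFX9) stays around 1.5 GiB.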

So, that's my proposal for enabling the ROCm libraries to run on all modern AMD GPUs. I'm not sure how to structure a Debian package to do this, but I hope that somebody finds the idea enticing enough to provide some guidance. I imagine that we could build the library multiple times, passing a different set of values for -DAMDGPU_TARGETS to cmake during configuration. I know that splitting the libraries by architecture is not a popular solution, but I don't see any other option that enables broad hardware support. To me, the mere existence of a feasible pathway to broad hardware enablement is exciting.
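The build-it-multiple-times idea could look something like this: one CMake configure step per group, each with its own build directory and its own semicolon-separated AMDGPU_TARGETS list. This is only a sketch; the group names and build directory layout are assumptions, and in a real package the loop would more likely live in debian/rules than in a Python script.

```python
import shlex
import subprocess

# Per-major-version groups from the proposal (hypothetical layout).
GROUPS = {
    "gfx8": ["gfx803"],
    "gfx9": ["gfx900", "gfx904", "gfx906", "gfx908", "gfx90a", "gfx940"],
    "gfx10": ["gfx1010", "gfx1011", "gfx1013", "gfx1030"],
    "gfx11": ["gfx1100", "gfx1101", "gfx1102"],
}


def configure_commands(groups, source_dir=".", build_prefix="build"):
    """One cmake configure command per group, each in a separate build directory."""
    cmds = []
    for name, targets in groups.items():
        cmds.append([
            "cmake",
            "-S", source_dir,
            "-B", f"{build_prefix}-{name}",
            # AMDGPU_TARGETS takes a semicolon-separated list of ISAs.
            "-DAMDGPU_TARGETS=" + ";".join(targets),
        ])
    return cmds


def configure_all(groups, dry_run=True):
    """Print each command shell-quoted; only run it when dry_run is False."""
    for cmd in configure_commands(groups):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    configure_all(GROUPS)
```

shlex.join quotes the -DAMDGPU_TARGETS argument, since an unquoted semicolon would end the command in a shell.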


Sincerely,
Cory Bloor

[1]: Every AMD GPU from Polaris to RDNA3 and CDNA3.

[2]: It's also worth noting that the rocSPARSE library is not particularly large. The rocBLAS and rocFFT libraries are both larger than rocSPARSE.
