
Re: Enabling ROCm on Everything



On 2023-03-21 06:17, Cordell Bloor wrote:
> Hello everyone,
> 
> In the last round of updates to the ROCm packages on Unstable, I did a
> bunch of testing with an RX 5700 XT (gfx1010) and Radeon Pro v520
> (gfx1011). I found that all Debian packaged libraries passed their full
> test suites (with the exception of an out-of-memory error in one
> rocprim/hipcub test).

Side note: when we set up our test runners, we should probably check for
that, and skip the affected tests.

> However, fourteen instruction sets are still too many to put all in a
> single fat binary. As mentioned, rocsparse will fail to link if built
> with more than eight (and even that is pushing dangerously close to the
> limit). The fourteen instruction sets are gfx803, gfx900, gfx904,
> gfx906, gfx908, gfx90a, gfx940, gfx1010, gfx1011, gfx1013, gfx1030,
> gfx1100, gfx1101, and gfx1102. I don't think there's any reasonable way
> for Debian to resolve this problem besides slicing the packages by
> architecture.
> 
> One possible split would be on the GFX architecture major version. There
> would be binary packages for librocsparse0-gfx8, librocsparse0-gfx9,
> librocsparse0-gfx10, and librocsparse0-gfx11 with each providing
> librocsparse0. The GFX9 grouping would be pretty large with six
> architectures, but that's still within acceptable limits. If need be, it
> could be split into gfx9-gcn (gfx900, gfx904, gfx906) and gfx9-cdna
> (gfx908, gfx90a, gfx940).
> 
> So, that's my proposal for enabling the ROCm libraries to run on all
> modern AMD GPUs. I'm not sure how to structure a Debian package to do
> this, but I hope that somebody finds the result to be an enticing enough
> idea to provide some guidance. I imagine that we could build the library
> multiple times, passing a different set of values for
> -DAMDGPU_TARGETS to cmake during configuration.

I agree that a split of some sort will probably be necessary in the
short to medium term, and your proposed solution looks reasonable to me.
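
To make the major-version split concrete, here is a rough sketch that
groups the fourteen targets from the proposal and prints one
-DAMDGPU_TARGETS value per binary package. It assumes the major version
is everything between "gfx" and the last two characters of the target
name, and that the targets are passed as a semicolon-separated CMake
list. The helper names (gfx_major, group_by_major, cmake_flags) are made
up for illustration; this is not how any existing packaging works.

```python
from collections import defaultdict

# The fourteen instruction sets listed in the proposal.
TARGETS = [
    "gfx803", "gfx900", "gfx904", "gfx906", "gfx908", "gfx90a", "gfx940",
    "gfx1010", "gfx1011", "gfx1013", "gfx1030", "gfx1100", "gfx1101", "gfx1102",
]


def gfx_major(target):
    """Return the GFX major version, e.g. 9 for gfx90a and 10 for gfx1030.

    Assumption: the last two characters are the minor version and the
    stepping (which may be a letter), so the major version is whatever
    sits between the "gfx" prefix and those two characters.
    """
    if not target.startswith("gfx") or len(target) < 6:
        raise ValueError(f"unexpected target name: {target!r}")
    return int(target[3:-2])


def group_by_major(targets):
    """Map each major version to its targets, ordered by major version."""
    groups = defaultdict(list)
    for target in targets:
        groups[gfx_major(target)].append(target)
    return dict(sorted(groups.items()))


def cmake_flags(targets, base="librocsparse0"):
    """Map a hypothetical package name to the cmake flag for its build."""
    return {
        f"{base}-gfx{major}": "-DAMDGPU_TARGETS=" + ";".join(members)
        for major, members in group_by_major(targets).items()
    }


if __name__ == "__main__":
    for package, flag in cmake_flags(TARGETS).items():
        print(f"{package}: {flag}")
```

With this grouping the largest slice is gfx9 with six targets, which
stays under the eight-target link limit mentioned for rocsparse.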

One difficulty we will need to resolve is how to actually guide the user
to the right package. What do we do when the user wants to
`apt install pytorch-rocm`?

Another difficulty we might need to consider is: what if the system in
question contains multiple GPU architectures (e.g. 6800 XT and 7900 XT)?
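
As a sketch of that mixed case: assuming the 6800 XT reports gfx1030 and
the 7900 XT reports gfx1100, and assuming the per-major package names
from the proposal, such a system would need two slices at once. The
function names below are hypothetical, and how the list of detected
targets is obtained is left out entirely.

```python
def gfx_major(target):
    """Return the GFX major version, e.g. 10 for gfx1030.

    Assumption: the major version sits between the "gfx" prefix and the
    last two characters (minor version and stepping).
    """
    if not target.startswith("gfx") or len(target) < 6:
        raise ValueError(f"unexpected target name: {target!r}")
    return int(target[3:-2])


def packages_for(detected, base="librocsparse0"):
    """List the hypothetical per-major packages covering all detected GPUs."""
    majors = sorted({gfx_major(target) for target in detected})
    return [f"{base}-gfx{major}" for major in majors]


if __name__ == "__main__":
    # Assumed targets for a 6800 XT plus 7900 XT system.
    print(packages_for(["gfx1030", "gfx1100"]))
```

Whether two such packages could be installed side by side while both
provide librocsparse0 is the open question here.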

The above aren't arguments against the proposal; I'm just starting to
collect TODOs.

Best,
Christian

