
Re: Enabling ROCm on Everything



Hi Étienne,

On 3/21/23 14:34, Étienne Mollier wrote:
So perhaps this is a non-problem (at least regarding rocsparse,
but other components may prove to be more difficult if they are
much larger).

The rocfft library is about 4x the size of rocsparse, which is why it is split up into five shared object libraries.

On 3/21/23 14:34, Étienne Mollier wrote:
I'm not sure what to think.  Long term there will be a need
upstream to split the libraries when architectures will add up,
otherwise the model will not scale due to the issues pointed out
by Cory.  Short term the monolithic library is not good, but
fair enough, and splitting would introduce a number of issues
pointed out by Mo Zhou.

I have no idea what upstream will do when they hit the binary size limit. If their packaging infrastructure is sophisticated enough, maybe they'll split their packages too. However, there's always the possibility that they will just start dropping old architectures from their binaries.

By contrast, Debian specializes in packaging and could significantly improve the experience of working with ROCm. Consider two case studies [1][2]. In the first case, the author began struggling with ROCm in part because they wanted to work with gfx1011 (Radeon Pro V520) and gfx1031 (RX 6700 XT), but neither of those architectures is included in upstream binaries despite being fully functional [1]. In the second case, the author began by fighting with the amdgpu-install script and ended by fruitlessly trying to enable gfx90c (which is not included in upstream binaries) [2]. With the solution I propose, they could instead just `apt install` their desired -gfx10 or -gfx9 packages.
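
To illustrate how a GPU would map onto such a family package, here is a minimal sketch. It assumes the suffix is simply the major version of the LLVM target name, i.e. the name with its last two characters (minor version and stepping) dropped. The function name `family_suffix` is hypothetical, and the actual package naming is still open for discussion.

```python
def family_suffix(arch: str) -> str:
    """Map an LLVM AMDGPU target name (e.g. 'gfx1031') to a
    hypothetical per-family package suffix (e.g. '-gfx10').

    Assumption: the last two characters of the target name are the
    minor version and stepping, and everything between 'gfx' and
    those two characters is the decimal major version.
    """
    if not arch.startswith("gfx") or len(arch) < 6:
        raise ValueError(f"not a gfx target name: {arch!r}")
    major = arch[3:-2]
    if not major.isdigit():
        raise ValueError(f"unexpected major version in {arch!r}")
    return f"-gfx{major}"


if __name__ == "__main__":
    # The two architectures from the case studies above.
    for arch in ("gfx1031", "gfx90c"):
        print(arch, "->", family_suffix(arch))
```

Under that assumption, the user in the first case study would look for the -gfx10 packages and the user in the second for the -gfx9 packages.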

I strongly believe that with a little help from upstream, some clever packaging in Debian could dramatically improve the average person's experience working with AMD GPU libraries. It would be nice if upstream had a bytecode format that made fancy packaging unnecessary. Maybe one day that will happen, but I wouldn't hold my breath waiting.

We can fix this situation with the tools that we have here and now. It might not be the simplest packaging, but I think that the expanded hardware support it enables is worth the complexity.

Sincerely,
Cory Bloor

[1]: https://threedots.ovh/blog/2022/05/amd-rocm-a-wasted-opportunity/
[2]: https://scalability.org/state-of-amds-rocm/


