
Re: Enabling ROCm on Everything



"M. Zhou" <lumin@debian.org> writes:

> On Mon, 2023-03-20 at 23:17 -0600, Cordell Bloor wrote:
>> Hello everyone,
>> 
>> In the last round of updates to the ROCm packages on Unstable, I did a 
>> bunch of testing with an RX 5700 XT (gfx1010) and Radeon Pro v520 
>> (gfx1011). I found that all Debian-packaged libraries passed their full 
>> test suites (with the exception of an out-of-memory error in one 
>> rocprim/hipcub test). So, now the rocRAND, hipRAND, rocPRIM, hipCUB, 
>> rocSPARSE and hipSPARSE packages are enabled for gfx803, gfx900, gfx906, 
>> gfx908, gfx90a, gfx1010, gfx1011 and gfx1030.
>> 
>> However, there is a cost to this. The rocsparse library is ~250 MiB, but 
>> we are now building it for eight different GPU instruction sets. That is 
>> why the library binary is now 1.96 GiB. There are a total of twenty-six 
>> instruction sets in the GFX9, GFX10 and GFX11 families. If you add 
>> gfx803, that makes twenty-seven architectures. If we were to enable 
>> support for all modern AMD GPUs [1], the total size of librocsparse.so 
>> would be 0.25 GiB * 27 = 6.75 GiB [2]. For better or for worse, that does 
>> not actually seem to be possible anyway. Once the size of the shared 
>> library exceeds 2 GiB, it becomes too large for 32-bit relative offsets 
>> to span and the library will fail to link.
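>> 
>> As a rough illustration of that arithmetic, here is a small Python sketch 
>> using only the figures above (the per-architecture size is just the 
>> average implied by the current 1.96 GiB, eight-architecture build):
>> 
>>     # Project librocsparse.so size from the per-architecture average.
>>     current_size_gib = 1.96   # current build, 8 instruction sets
>>     current_archs = 8
>>     per_arch_gib = current_size_gib / current_archs
>> 
>>     all_archs = 27            # GFX9 + GFX10 + GFX11 families plus gfx803
>>     projected_gib = all_archs * per_arch_gib
>> 
>>     print(f"~{per_arch_gib:.2f} GiB per architecture")
>>     print(f"projected for {all_archs} architectures: ~{projected_gib:.2f} GiB")
>>     print(f"exceeds the ~2 GiB link limit: {projected_gib > 2.0}")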
>
> There is exactly the same issue for pytorch-cuda. The binaries distributed
> by upstream put all supported CUDA architectures into a single fat binary,
> which causes a linker error (file too large). They have some workarounds,
> like splitting the shared object into multiple ones, but the overall
> binary size is still growing.
>
> However, as long as the CUDA compute architectures are backward-compatible,
> we can build for just a few selected architectures that will work in most
> cases. For instance, upstream has built their binary release of pytorch-cuda
> for the following CUDA architectures:
>   37, 50, 60, 61, 70, 75, 80, 86, 90
> But I suppose 61, 75, and 86 will be sufficient for the Debian build of
> pytorch-cuda. These correspond to the GTX 1XXX, RTX 2XXX, and
> RTX 3XXX series of GPUs. Users of datacenter GPUs are not likely
> to use the Debian-packaged pytorch-cuda; in most cases they will
> stick to Anaconda. Even if a user has a datacenter GPU, the
> code will still run thanks to backward compatibility.
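>
> To make that concrete, here is a minimal Python sketch of the kind of
> check that backward compatibility implies (the built_archs set below is
> only the hypothetical Debian selection above; torch.cuda.get_device_capability
> is the standard PyTorch call for querying a device's compute capability):
>
>     import torch
>
>     # Compute capabilities a hypothetical Debian build would target
>     # (sm_61, sm_75, sm_86), as suggested above.
>     built_archs = {(6, 1), (7, 5), (8, 6)}
>
>     if torch.cuda.is_available():
>         major, minor = torch.cuda.get_device_capability(0)
>         native = (major, minor) in built_archs
>         # SASS binary compatibility only holds within the same major
>         # version, for minor versions >= the one compiled for; anything
>         # else relies on embedded PTX being JIT-compiled at load time.
>         compatible = native or any(
>             major == m and minor >= n for m, n in built_archs
>         )
>         print(f"device sm_{major}{minor}: "
>               f"native={native}, compatible={compatible}")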

But then again, Debian is the Universal operating system. I think
history has shown time and time again that it's best if we don't try to
guess where, how and in what situations the user will run Debian. Of
course, I do appreciate the technical problem here – just sharing a
thought :-)


 -- Gard
 


