"M. Zhou" <lumin@debian.org> writes: > On Mon, 2023-03-20 at 23:17 -0600, Cordell Bloor wrote: >> Hello everyone, >> >> In the last round of updates to the ROCm packages on Unstable, I did a >> bunch of testing with an RX 5700 XT (gfx1010) and Radeon Pro v520 >> (gfx1011). I found that all Debian packaged libraries passed their full >> test suites (with the exception of an out-of-memory error in one >> rocprim/hipcub test). So, now the rocRAND, hipRAND, rocPRIM, hipCUB, >> rocSPARSE and hipSPARSE packages are enabled for gfx803, gfx900, gfx906, >> gfx908, gfx90a, gfx1010, gfx1011 and gfx1030. >> >> However, there is a cost to this. The rocsparse library is ~250 MiB, but >> we are now building it for eight different GPU instruction sets. That is >> why the library binary is now 1.96 GiB. There are a total of twenty six >> instruction sets in the GFX9, GFX10 and GFX11 families. If you add >> gfx803, that makes twenty seven architectures. If we were to enable >> support all modern AMD GPUs [1], the total size of librocsparse.so would >> be 0.25 GiB * 27 = 6.75 GiB [2]. For better or for worse, that does not >> seem to actually be possible anyway. Once the size of the shared library >> exceeds 2 GiB, it will become too large to use 32-bit relative offsets >> and the library will fail to link. > > There is exactly the same issue for pytorch-cuda. The upstream distributed > binaries put all supported cuda architectrures into a single fat binary, > and will cause linker error (file too large). The have some workarounds > like splitting the shared object into multiple ones, but the overall > binary size is still growing. > > However, as long as the cuda compute architectures are backward-compatible, > we can just build several selected architectures that will work in most cases. 
> For instance, upstream has built their binary release of pytorch-cuda
> for the following CUDA architectures:
>
>     37, 50, 60, 61, 70, 75, 80, 86, 90
>
> But I suppose 61, 75 and 86 will be sufficient for the Debian build of
> pytorch-cuda. These correspond to the GTX 1XXX, RTX 2XXX and RTX 3XXX
> series of GPUs. The users of datacenter GPUs are not likely to use the
> Debian-packaged pytorch-cuda; in most cases they will stick to Anaconda.
> Even if a user has a datacenter GPU, the code will still run thanks to
> backward compatibility.

But then again, Debian is the Universal Operating System. I think history
has shown time and time again that it's best if we don't try to guess
where, how and in what situations the user will run Debian.

Of course, I do appreciate the technical problem here – just sharing a
thought :-)

--
Gard
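As a back-of-the-envelope check of the figures discussed above, here is a minimal sketch (in Python, for illustration only). The ~0.25 GiB per-architecture size, the 8 and 27 architecture counts, and the 2 GiB relative-offset limit are all taken from the thread; nothing else is assumed:

```python
# Back-of-the-envelope check of the library-size figures from the thread.
GIB = 1024 ** 3

per_arch_bytes = 0.25 * GIB    # ~250 MiB of device code per GPU instruction set
link_limit_bytes = 2 * GIB     # 32-bit signed relative offsets reach at most 2 GiB

current_archs = 8              # gfx803 ... gfx1030, as currently enabled
all_archs = 27                 # GFX9/GFX10/GFX11 families plus gfx803

current_size = current_archs * per_arch_bytes   # ~2 GiB, right at the edge
full_size = all_archs * per_arch_bytes          # 6.75 GiB, far past the limit

print(f"current build:     {current_size / GIB:.2f} GiB")
print(f"all architectures: {full_size / GIB:.2f} GiB")
print("fits 32-bit relative offsets:", full_size <= link_limit_bytes)
```

This makes the core constraint visible: even the current eight-architecture build sits at roughly the 2 GiB boundary, so enabling all twenty-seven instruction sets in a single shared object is ruled out by the linker before disk space even becomes the concern.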