
August ROCm Package Testing Results



Howdy folks,

I did another run of manual tests on the ROCm packages and have updated the Debian Supported GPU List [1] with the results. The list is perhaps slightly out of date already, since I was using the previous version of rocrand. Nevertheless, I doubt much has changed. The full set of logs from both my July and August tests can be found on my website [2].

I really encourage you to take a look at the supported GPU list on the wiki. The state of ROCm on Debian is truly special. The only place we fall behind in hardware support is in RDNA 3, but I think we can save that for after pytorch-rocm. With both LLVM 17 and ROCm 5.7 just around the corner, a month or two from now will be a pretty good time for a HIP update.

However, there does seem to be an issue with the use of gfx1030 as a fallback architecture for gfx1031 and gfx1032 in rocSPARSE. I think that may have been a problem introduced by the update to rocPRIM 5.5.1, but I haven't had a chance to dig into it yet. In any case, it is a serious problem, but it only affects prune_dense2csr_by_percentage.conversion, and I suspect that the classic export HSA_OVERRIDE_GFX_VERSION=10.3.0 trick is an effective workaround. The problem is probably caused by rocPRIM code that does not expect the compile-time architecture target name to differ from the architecture name reported at run time.
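
For anyone who wants to try that workaround from a test script rather than a shell, here is a minimal sketch in Python. The helper name workaround_env and the FALLBACK_TO_GFX1030 set are hypothetical names made up for illustration; the only things taken from this message are the environment variable, its value, and the two affected architectures. As noted above, the workaround itself is suspected to be effective, not confirmed.

```python
import os

# Architectures that fall back to gfx1030 code, per the message above.
# (Hypothetical constant name, for illustration only.)
FALLBACK_TO_GFX1030 = {"gfx1031", "gfx1032"}


def workaround_env(reported_arch, base_env=None):
    """Return a copy of the environment with the override set when needed.

    reported_arch is the architecture name the GPU reports at run time,
    e.g. "gfx1031". base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    if reported_arch in FALLBACK_TO_GFX1030:
        # Same effect as `export HSA_OVERRIDE_GFX_VERSION=10.3.0` in a shell:
        # the runtime then reports the GPU as gfx1030.
        env["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when launching a test binary, so the override applies only to that child process and not to the rest of the session.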

A notable missing entry in my test strategy is gfx1011, which was not tested because installing Navi 12 hardware seems to cause the amdgpu driver to crash on startup. I know that Navi 12 hardware can work with ROCm, because I've used it on AWS g4ad instances running Ubuntu. I'll have to dig into this more.

Sincerely,
Cory Bloor

[1]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Supported-GPU-list
[2]: https://slerp.xyz/rocm/logs/

