Re: Test hardware for ROCm on arm64 and ppc64el
I have news about ROCm on arm64 and ppc64el.
On 2024-12-17 11:55, Cordell Bloor wrote:
> The PowerPC driver was enabled and tested by Raptor Computing [1].
> Their Talos II systems seem to be getting a bit long in the tooth, but
> it would still be nice to run the tests on a platform where we know
> the driver was working. I'd like to try to find access to a Raptor
> Talos II system to try this out.
Raptor Computing has generously offered to host a PPC64 server with a
Radeon GPU for the Debian ROCm Team. Unfortunately, the initial tests on
an RDNA 3 GPU did not pass. There seem to be driver issues of some kind.
If there are developers from Debian (or AMD) interested in hacking on
the driver, Raptor can provide remote access (with a BMC interface to
handle power cycling).
I'm sad to say that the problems encountered are outside of my area of
expertise. I'm not really a driver guy.
> Testing on ARM64 would require a system with PCIe slots. Gigabyte
> sells GPU servers with processors from Ampere Computing that may be
> suitable for this purpose. The Gigabyte G242-P33 [2] appears to be a
> reasonable choice, as it is certified for use with NVIDIA GPUs.
I reached out to the Level1Techs community to ask about hardware [1]. A
few people offered to test on ARM64. Today I heard back from Orr Mazor
("oma"), who apparently had success with an MI210 on an ARM Neoverse
system using a Debian Testing Docker container. I'm hoping to get more
information about his setup.
Sincerely,
Cory Bloor
[1]: https://forum.level1techs.com/t/testing-rocm-on-arm64-and-ppc64/222601