
Re: Test hardware for ROCm on arm64 and ppc64el



Hi Nick,

Welcome! Thanks for looking into this.

On 2025-04-09 18:06, Nick A wrote:
> Hi Christian, thanks for the example. I think it was enough to get started
> and run the tests. There was a "Bus error" while running
> rocrand_basic_tests #10. The stacktrace is in this gist,
> https://gist.github.com/nicholasaiello/ee51ba1e4dee921264607fec8693d07d
>
> `dmesg` didn't show any errors or seg faults, either. Not sure if there are
> other logs to review. Let me know if there's anything else to try.

That is not unexpected. When I spoke with the AMDGPU driver team, they emphasized that many ARM systems do not comply sufficiently with the PCIe spec. Because my own understanding of this subject is limited, I'll quote Alexander Deucher almost verbatim on the two AMDGPU driver requirements he highlighted when I asked about this:

- Platform PCIe needs to support CPU cache snoop by the device. PCI spec requires cache coherence with the CPU.
- Platform needs to support unaligned access to MMIO space. That is, PCI BARs mapped to userspace need to support unaligned accesses from user applications just like system memory.

I've seen user reports of Vulkan working on the Raspberry Pi with an AMD GPU when some driver patches are applied [1]. I believe those patches address the problem you're seeing by intercepting system faults and emulating the unaligned accesses. There's also a GitHub issue discussing whether the patches could be upstreamed [2] (unlikely, at least with the current approach).

Pepijn de Vos was kind enough to send me a Sentinel Core board for the Raspberry Pi [3]. I must admit, I thought it was a complete system when I asked for the board. It will take me a bit of time to acquire the parts to assemble a working system and begin my testing with various GPUs. Then perhaps I'll be able to reproduce your results myself.

For the moment, Ampere's SystemReady certified workstations and servers appear to be the most likely hardware to work with Debian's ROCm packages for ARM64, though I've not yet managed to get my hands on such a system to run the tests.

Sincerely,
Cory Bloor

[1]: https://github.com/geerlingguy/raspberry-pi-pcie-devices
[2]: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/222
[3]: https://sanctuary-systems.com/sentinel-core/

