My current setup uses patched AMDGPU drivers [1], and I'm able to run Llama.cpp w/ Vulkan and, more recently, Ollama w/ Vulkan (using a forked repo). Those have been stable and perform well enough, all things considered. It's a bummer that the tests didn't work w/ the same patches. However, the Pi community has been able to work around issues and limitations before, within reason.

That being said, is there a way to print out a full stack trace for these test errors? It may be worth timeboxing an effort to debug this error, and then looping in others, like Coreforge (original patcher) and Jeff, if/when there's something tangible. Obviously, it could lead to nothing, but figuring that out is part of the fun!
Regards,
-nick-
PS: Cory, I only just saw your comment about installing pre-built packages or building the patched source [2]! I could try that approach as well, making sure to align the AMD target arch w/ my GPU (gfx1100).

On Thu, Apr 10, 2025 at 1:44 PM Cordell Bloor <cgmb@slerp.xyz> wrote:

Hi Nick,
Welcome! Thanks for looking into this.
On 2025-04-09 18:06, Nick A wrote:
> Hi Christian, thanks for the example. I think it was enough to get started
> and run the tests. There was a "Bus error" while running
> rocrand_basic_tests #10. The stacktrace is in this gist,
> https://gist.github.com/nicholasaiello/ee51ba1e4dee921264607fec8693d07d
>
> `dmesg` didn't show any errors or seg faults, either. Not sure if there are
> other logs to review. Let me know if there's anything else to try.
That is not unexpected. In speaking with the AMDGPU driver team, they
emphasized that many ARM systems don't have sufficient compliance with
the PCIe spec. Due to my limited understanding of this subject, I'll
quote Alexander Deucher almost verbatim on the two requirements for the
AMDGPU driver that he highlighted when I asked about this:
- Platform PCIe needs to support CPU cache snoop by the device. PCI spec
  requires cache coherence with the CPU.
- Platform needs to support unaligned access to MMIO space. That is, PCI
  BARs mapped to userspace need to support unaligned accesses from user
  applications just like system memory.
I've seen user reports of Vulkan working for the Raspberry Pi with an
AMD GPU with some driver patches [1]. I believe they address the problem
you're seeing by intercepting system faults and emulating unaligned
accesses. There's also a GitHub issue with discussion of whether the
patches could be upstreamed [2]. (Unlikely, at least with the current
approach.)
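As a rough illustration of what "unaligned" means here and what emulating
such an access can involve, below is a minimal Python sketch. It is not
taken from the AMDGPU driver or from the patches, which may work quite
differently; the function names and the 8-byte maximum width are
hypothetical, chosen only for the example. The idea is that a single
access that is not naturally aligned can be replaced by several smaller
accesses that each are.

```python
def is_naturally_aligned(offset: int, size: int) -> bool:
    """Return True when a `size`-byte access at `offset` is naturally aligned."""
    return offset % size == 0


def split_into_aligned(offset: int, size: int, max_width: int = 8):
    """Split one access into naturally aligned pieces.

    Hypothetical helper: returns (offset, width) pairs whose widths are
    powers of two no larger than `max_width` (assumed to be a power of two).
    """
    pieces = []
    end = offset + size
    while offset < end:
        width = max_width
        # Halve the width until the piece is aligned at `offset` and does
        # not run past the end of the requested range. A width of 1 always
        # satisfies both conditions, so the loop terminates.
        while offset % width != 0 or offset + width > end:
            width //= 2
        pieces.append((offset, width))
        offset += width
    return pieces


if __name__ == "__main__":
    # A 4-byte access at offset 2 is not naturally aligned...
    print(is_naturally_aligned(2, 4))
    # ...but it can be expressed as two aligned 2-byte accesses.
    print(split_into_aligned(2, 4))
```

On hardware that rejects unaligned accesses to device memory, the first
case is the kind of access that can surface as a "Bus error" in a user
application.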
Pepijn de Vos was kind enough to send me a Sentinel Core board for the
Raspberry Pi [3]. I must admit, I thought it was a complete system when
I asked for the board. It will take me a bit of time to acquire the
parts to assemble a working system and begin my testing with various
GPUs. Then perhaps I'll be able to reproduce your results myself.
For the moment, Ampere's SystemReady certified workstations and servers
appear to be the most likely hardware to work with Debian's ROCm
packages for ARM64, though I've not yet managed to get my hands on such
a system to run the tests.
Sincerely,
Cory Bloor
[1]: https://github.com/geerlingguy/raspberry-pi-pcie-devices
[2]: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/222
[3]: https://sanctuary-systems.com/sentinel-core/