Bug#1068199: librocfft0: callback test failures on gfx900 and gfx1030
On 2024-04-04 09:05, Christian Kastner wrote:
> The issue is already visible with AMD_LOG_LEVEL=1, it's the lack of PCIe
> atomics:
>
>> [ RUN ] rocfft_UnitTest.default_load_callback_complex_single
>> :1:rocvirtual.cpp :2949: 1796815625 us: [pid:1917 tid:0x7f4a2102c980] Pcie atomics not enabled, hostcall not supported
>> :1:rocvirtual.cpp :3289: 1796816120 us: [pid:1917 tid:0x7f4a2102c980] AQL dispatch failed
> clients/tests
> In an older ROCm ticket, a workaround to enable PCIe atomics in the
> guest was discussed [1], but I never got this to work. The relevant bit
> is not set after invoking setpci.
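The "relevant bit" is presumably the AtomicOp Requester Enable bit (bit 6) of the PCIe Device Control 2 register, which sits at offset 0x28 of the PCI Express capability. That identification is an assumption; the source does not say which register was checked. The Python sketch below reads that bit from the 16-bit hex word printed by `setpci -s <BDF> CAP_EXP+28.w`. The helper names are hypothetical and are not taken from [1] or from ROCm.

```python
import re

# Bit masks in the PCIe Device Control 2 register (offset 0x28 within the
# PCI Express capability). Assumption: this is the bit referred to above.
ATOMIC_OP_REQUESTER_ENABLE = 1 << 6  # device may issue AtomicOp requests
ATOMIC_OP_EGRESS_BLOCKING = 1 << 7   # port blocks outgoing AtomicOp requests


def parse_devctl2(setpci_output: str) -> int:
    """Parse the hex word printed by `setpci -s <BDF> CAP_EXP+28.w`."""
    text = setpci_output.strip()
    if not re.fullmatch(r"[0-9a-fA-F]{1,4}", text):
        raise ValueError(f"unexpected setpci output: {setpci_output!r}")
    return int(text, 16)


def atomic_requests_enabled(devctl2: int) -> bool:
    """True if the AtomicOp Requester Enable bit is set."""
    return bool(devctl2 & ATOMIC_OP_REQUESTER_ENABLE)


def egress_blocked(devctl2: int) -> bool:
    """True if the AtomicOp Egress Blocking bit is set."""
    return bool(devctl2 & ATOMIC_OP_EGRESS_BLOCKING)


if __name__ == "__main__":
    # Example values only; on real hardware, feed in the setpci output.
    for raw in ("0040", "0000"):
        value = parse_devctl2(raw)
        print(raw, "requester enabled:", atomic_requests_enabled(value))
```

Reading the register back after the write is one way to check whether setpci actually changed it. Note that this checks only the endpoint's requester bit. Whether the platform can complete the atomics is a separate question.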
In a more recent issue [2], a lack of PCIe atomics was also discovered
on physical hardware (it can depend on the CPU and/or the PCIe slot).
In that issue, it was stated that updating to ROCm 6.0 (and PyTorch)
resolved the issue.
I just updated rocfft to 6.0.2 and rebuilt it, but the issue is still
present. That was naive, though: other components in the stack are still
< 6.0 and could affect this.
> [1] https://github.com/ROCm/ROCK-Kernel-Driver/issues/26#issuecomment-313857180
[2] https://github.com/ROCm/ROCm/issues/2429