
Bug#1068199: librocfft0: callback test failures on gfx900 and gfx1030



On 2024-04-04 09:05, Christian Kastner wrote:
> The issue is already visible with AMD_LOG_LEVEL=1, it's the lack of PCIe
> atomics:
> 
>> [ RUN      ] rocfft_UnitTest.default_load_callback_complex_single
>> :1:rocvirtual.cpp           :2949: 1796815625 us: [pid:1917  tid:0x7f4a2102c980] Pcie atomics not enabled, hostcall not supported
>> :1:rocvirtual.cpp           :3289: 1796816120 us: [pid:1917  tid:0x7f4a2102c980] AQL dispatch failed> clients/tests

> In an older ROCm ticket, a workaround to enable PCIe atomics in the
> guest was discussed [1], but I never got this to work. The relevant bit
> is not set after invoking setpci.
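To see which AtomicOp bits a device actually has set, for example before and after running setpci in the guest, the PCIe capability can be read directly from the device's config space. The following is a minimal Python sketch and is not taken from [1]. It assumes the relevant bit is AtomicOp Requester Enable (bit 6 of Device Control 2) and uses the register offsets from Linux's include/uapi/linux/pci_regs.h. The script and its function names are hypothetical. Reading past the first 64 bytes of /sys/bus/pci/devices/<BDF>/config requires root.

```python
import sys

# Standard PCI/PCIe config-space offsets and bits (as in Linux's pci_regs.h).
PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10      # "capability list present" bit in Status
PCI_CAPABILITY_LIST = 0x34      # pointer to the first capability
PCI_CAP_ID_EXP = 0x10           # PCI Express capability ID
PCI_EXP_DEVCAP2 = 0x24          # Device Capabilities 2, relative to cap start
PCI_EXP_DEVCTL2 = 0x28          # Device Control 2, relative to cap start
DEVCAP2_ATOMIC_ROUTE = 0x040
DEVCAP2_ATOMIC_COMP32 = 0x080
DEVCAP2_ATOMIC_COMP64 = 0x100
DEVCTL2_ATOMIC_REQ = 0x40       # ASSUMPTION: this is the "relevant bit"
DEVCTL2_ATOMIC_EGRESS_BLOCK = 0x80


def find_pcie_cap(cfg: bytes):
    """Walk the capability list; return the PCIe capability offset or None."""
    if len(cfg) < 0x40:
        return None
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    # Stop on a null pointer, a loop, or a pointer past the readable data.
    while ptr and ptr not in seen and ptr + 1 < len(cfg):
        seen.add(ptr)
        if cfg[ptr] == PCI_CAP_ID_EXP:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def atomic_ops_info(cfg: bytes) -> dict:
    """Decode the AtomicOp capability and control bits of one device."""
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_DEVCTL2 + 2 > len(cfg):
        raise ValueError("PCIe capability not found (truncated config? run as root)")
    devcap2 = int.from_bytes(cfg[cap + PCI_EXP_DEVCAP2:cap + PCI_EXP_DEVCAP2 + 4], "little")
    devctl2 = int.from_bytes(cfg[cap + PCI_EXP_DEVCTL2:cap + PCI_EXP_DEVCTL2 + 2], "little")
    return {
        "routing_supported": bool(devcap2 & DEVCAP2_ATOMIC_ROUTE),
        "completer32": bool(devcap2 & DEVCAP2_ATOMIC_COMP32),
        "completer64": bool(devcap2 & DEVCAP2_ATOMIC_COMP64),
        "requester_enabled": bool(devctl2 & DEVCTL2_ATOMIC_REQ),
        "egress_blocked": bool(devctl2 & DEVCTL2_ATOMIC_EGRESS_BLOCK),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_atomics.py 0000:03:00.0  (placeholder address)
    with open(f"/sys/bus/pci/devices/{sys.argv[1]}/config", "rb") as f:
        print(atomic_ops_info(f.read()))
```

Running this against the GPU and against its upstream bridge or root port shows whether the requester bit was actually set and whether the path advertises AtomicOp routing.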

In a more recent issue [2], a lack of PCIe atomics was also discovered
on physical hardware (it can depend on the CPU and/or the PCIe slot).

In that issue, it was reported that updating to ROCm 6.0 (and PyTorch)
resolved the problem.

I just rebuilt rocfft at version 6.0.2, but the issue is still present.
That test was naive, though: other components in the stack are still
older than 6.0 and could affect this.

> [1] https://github.com/ROCm/ROCK-Kernel-Driver/issues/26#issuecomment-313857180

[2] https://github.com/ROCm/ROCm/issues/2429

