Navi 12 on Debian (was: August ROCm Package Testing Results)



I asked for some help and I now understand the Navi 12 situation.

On 2023-08-26 00:22, Cordell Bloor wrote:
> A notable missing entry in my test strategy is gfx1011, which was not tested because installing Navi 12 hardware seems to cause the amdgpu driver to crash on startup. I know that Navi 12 hardware can work with ROCm, because I've used it on AWS g4ad instances running Ubuntu. I'll have to dig into this more.

The problem appears to be in resuming from the BACO (Bus Active, Chip Off) power-saving mode. I had no issues after adding amdgpu.runpm=0 to my kernel parameters, which disables runtime power management for the GPU. I've seen this issue with two different Navi 12 GPUs, so I assume it is a driver problem and not bad hardware. I'll file a bug report when I get a chance.
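
For anyone who wants to confirm that the workaround is in effect on their machine, here is a minimal Python sketch. It is only an illustration, not something from my test scripts: the function names are made up for this example, and it assumes the parameter was passed on the kernel command line (visible in /proc/cmdline) and that the loaded amdgpu module exposes its current value at /sys/module/amdgpu/parameters/runpm.

```python
from pathlib import Path
from typing import Optional

# Assumed locations; both are standard on Linux, but the sysfs file only
# exists while the amdgpu module is loaded.
CMDLINE_PATH = Path("/proc/cmdline")
SYSFS_RUNPM_PATH = Path("/sys/module/amdgpu/parameters/runpm")


def runpm_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the last amdgpu.runpm value found on a kernel command line.

    Returns None if the parameter is absent or not an integer.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if sep and key == "amdgpu.runpm":
            try:
                value = int(raw)  # later occurrences override earlier ones
            except ValueError:
                pass
    return value


def runpm_disabled(cmdline: str) -> bool:
    """True if the command line sets amdgpu.runpm=0 (runtime PM disabled)."""
    return runpm_from_cmdline(cmdline) == 0


def report() -> None:
    """Print what the running system says about amdgpu.runpm."""
    if CMDLINE_PATH.exists():
        requested = runpm_from_cmdline(CMDLINE_PATH.read_text())
        print(f"kernel command line: amdgpu.runpm={requested}")
    if SYSFS_RUNPM_PATH.exists():
        active = SYSFS_RUNPM_PATH.read_text().strip()
        print(f"loaded module:       amdgpu.runpm={active}")


if __name__ == "__main__":
    report()
```

If the command line shows 0 but the GPU still fails to come back from power saving, the cause is probably something other than the runtime power management issue described here.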

I updated the supported GPU list [1] and my logs [2] with the gfx1011 results. I saw no problems, aside from one rocSPARSE test case that slightly exceeded its specified tolerance. The same test fails on gfx1030, too. We can probably just increase the tolerance slightly.

In conclusion, while there is a serious driver bug affecting Navi 12 hardware, there is a straightforward workaround for the problem. The ROCm math libraries all work as expected on gfx1011.

Sincerely,
Cory Bloor

[1]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Supported-GPU-list
[2]: https://slerp.xyz/rocm/logs/full/

