Dropping gfx803 support (Was: First draft of ROCm 6.1 Release Plan)

To: debian-ai@lists.debian.org
Subject: Dropping gfx803 support (Was: First draft of ROCm 6.1 Release Plan)
From: Cordell Bloor <cgmb@slerp.xyz>
Date: Mon, 9 Sep 2024 12:16:14 -0600
Message-id: <61f8aaa4-2c62-406b-987e-f94fefd88828@slerp.xyz>
In-reply-to: <a346e1b9-6029-4afa-afb9-a3b5550ae5c1@debian.org>
References: <a346e1b9-6029-4afa-afb9-a3b5550ae5c1@debian.org>

Hi folks,

It looks like we will lose the last vestiges of upstream gfx803 support in the update to ROCm 6 [1]:

On 2024-09-07 06:57, is_a_goat wrote:

> ROCm 5.7.1 is the last working version for polaris (I'm stuck on it with an RX580), though compilation needs various `-DAMDGPU_TARGETS=gfx803` flags. PyTorch 2.2 works, building with `export USE_ROCM=1` and `-DAMDGPU_TARGETS=gfx803`. (Newer PyTorch needs to address https://github.com/pytorch/pytorch/issues/119081 before it'll work again.)
>
> If you try ROCm 6, run clinfo with export AMD_LOG_LEVEL=1, you'll probably see:
>
> Unsupported HSA device gfx803 (PCI ID 67df) for ISA amdgcn-amd-amdhsa--gfx803
> Error creating new instance of Device.
>
> Best I can tell from poking around the code diffs, it's not easy to just make it work regardless. The code for ROC_ENABLE_PRE_VEGA is gone, and even if you sneak it back in there, clinfo works but any actual opencl program fails.
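
For anyone who wants to check whether their own card hits the "Unsupported HSA device" message quoted above, here is a minimal sketch. It assumes clinfo is installed and that the runtime writes the message to stdout or stderr; the helper name `has_unsupported_device` is made up for illustration and is not part of any ROCm tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the log text on stdin contains
# the "Unsupported HSA device" message for the given gfx target.
has_unsupported_device() {
    target="$1"
    # -F: match a fixed string, -q: no output, just the exit status.
    grep -qF -- "Unsupported HSA device ${target} "
}

# Usage on a real system (assumes clinfo is installed):
#   AMD_LOG_LEVEL=1 clinfo 2>&1 | has_unsupported_device gfx803 \
#       && echo "runtime rejected gfx803"
```

The check only looks for the log line; it says nothing about whether OpenCL programs would actually run if the device were accepted.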

I have a pile of previously supported gfx803 GPUs, including the RX570, WX 7100, S9300 X2, MI6 and MI8. I'm willing to use them to run whatever tests folks want. However, the reality is that ROCm has been kinda broken on this hardware ever since AMD dropped official support back in 2020. When I test older versions of rocBLAS, I find out-of-bounds writes showing up in its test suite since ROCm 3.7 (the first version after official support was dropped).

It's unfortunate, as there is still a lot of gfx803 hardware out in the wild. The reality is, though, that neither AMD nor the broader community has taken the steps needed to keep this architecture alive. As the support for this hardware is broken and nobody has stepped up to fix it, I think we should drop gfx803 from the library builds when we upgrade to ROCm 6.1.
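
To illustrate what dropping the target amounts to at build time, here is a rough sketch that removes one architecture from a semicolon-separated list of the kind passed to `-DAMDGPU_TARGETS`. The helper name `drop_target` and the example list are assumptions for illustration only; they are not the actual target set or mechanism used in the Debian packages.

```shell
#!/bin/sh
# Hypothetical helper: remove one architecture from a semicolon-separated
# target list (the CMake list format used for values like AMDGPU_TARGETS).
drop_target() {
    list="$1"
    drop="$2"
    # Split on ';', discard exact matches of the target, rejoin with ';'.
    printf '%s\n' "$list" | tr ';' '\n' | grep -vxF -- "$drop" | paste -sd ';' -
}

# Illustrative list only, not the real Debian target set.
drop_target "gfx803;gfx900;gfx906;gfx1030" gfx803
# prints: gfx900;gfx906;gfx1030
```

The result could then be passed along as `-DAMDGPU_TARGETS="$(drop_target "$targets" gfx803)"`, where `$targets` is whatever list a given build currently uses.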

Sincerely,
Cory Bloor

[1]: https://www.reddit.com/r/ROCm/comments/1fbck0z/comment/lm1j2n6/
