
Re: Breaking down barriers to ROCm packaging

On Mon, Jan 29, 2024 at 01:53:18PM -0700, Cordell Bloor wrote:
> [Navi 31] is only officially supported by AMD on Windows, but it
> should work fine on Linux.

I was wrong. It is officially supported on Linux [1].

On 2024-01-30 06:15, Kari Pahula wrote:
> On Mon, Jan 29, 2024 at 01:53:18PM -0700, Cordell Bloor wrote:
> > 2. Navi 31 (gfx1100) is a well-supported AMD GPU architecture. It is only
> > officially supported by AMD on Windows, but it should work fine on Linux.
> > The version of rocm-hipamd packaged for Debian is too old to support Navi
> > 31, but I plan to update to a new upstream release within the next month.
> > This is the architecture of the RX 7900 XTX, RX 7900 XT, Radeon PRO W7900,
> > and Radeon PRO W7800.
> Would Navi 32 work (ie. Radeon PRO W7700)?  I know the same things
> would apply as for Navi 31 but are there any further hurdles to be
> expected with it as well?  I think I want to limit the wattage of what
> I put in my desktop computer.

There are more hurdles. Navi 32 is not officially supported by upstream ROCm. My understanding is that all of the ROCm libraries will work on Navi 32 anyway, but unfortunately it seems that PyTorch does not [2]. The three architectures I recommended are the only AMD GPU architectures found in consumer cards with official support in ROCm 5.7 on Linux.

Of course, it's possible to get things working even if they are not officially supported. It's just a question of whether it's worth taking on that challenge when you're just getting started.
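For what it's worth, the usual trick for unofficial architectures is to ask the ROCm runtime to report a nearby supported ISA via the HSA_OVERRIDE_GFX_VERSION environment variable. A rough sketch for a Navi 32 (gfx1101) card masquerading as gfx1100 (this is an unsupported override, so subtle breakage is possible):

```shell
# Sketch: treat a Navi 32 (gfx1101) card as gfx1100 so that libraries
# shipping only gfx1100 kernel binaries (e.g. some PyTorch builds) will
# load on it. Unsupported configuration; use with care.
export HSA_OVERRIDE_GFX_VERSION=11.0.0

# Then run the workload from the same shell, for example:
# python3 -c "import torch; print(torch.cuda.is_available())"
```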

> I also happen to have a Fury card which may be kind of supported but
> I'm not going to seriously try this with Fiji.

I would agree. There are many known bugs in the ROCm libraries on Fiji.

> As a personal matter, I'm more comfortable with just buying whatever's
> needed myself.  Unless it needs to be top of the line then I guess I
> can reconsider.  I have an interest in this which is why I'm on this
> list and if you're asking people to join in then I'm ready to take the
> dip.  Access to hardware wasn't as such what kept me away.  Though I
> don't really have any experience with GPU programming so pointers to
> what needs attention are welcome.

That's great! If you prefer to buy the hardware yourself, I certainly have no complaints. The offer remains open if you ever change your mind.

I've arranged server hosting with a couple universities, so just a reminder that remote access will be an option as well. If you would like more details, we can talk privately.

In terms of pointers as to what needs attention, I created a "Help needed" wiki page [3].

> I guess I'd be upgrading my desktop computer too as I don't think an
> FX-8350 would cut it for this anymore.  Anything on AM5 should be
> fine, right?

I don't think the FX-8350 meets the minimum system requirements. If it is PCIe 2.0, you might need an upgrade. My understanding is that ROCm requires at least PCIe 3.0 [4]:

> The ROCm Platform uses the new PCI Express 3.0 (PCIe 3.0) features for Atomic Read-Modify-Write Transactions which extends inter-processor synchronization mechanisms to IO to support the defined set of HSA capabilities needed for queuing and signaling memory operations.
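As an aside, lspci can show whether the devices and root ports on a given machine advertise the PCIe AtomicOp capability. A rough check (the field names come from the PCIe capability registers, and the output varies by kernel and device):

```shell
# Dump the extended capabilities of each PCI device and look for the
# AtomicOps fields (root access is needed for the full dump).
sudo lspci -vvv | grep -i -A2 "AtomicOpsCap"
```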

I was using a Ryzen 7700X for my test bench and it was great. Anything on AM5 should be more than sufficient.
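Once the machine is together, a quick sanity check that ROCm actually sees the GPU (both tools ship with the ROCm packages):

```shell
# List the agents visible to the ROCm runtime; a working discrete GPU
# shows up with its gfx ISA name (e.g. gfx1100 for Navi 31).
rocminfo | grep -i "gfx"

# Basic telemetry; also confirms the amdgpu kernel driver is loaded.
rocm-smi
```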

Sincerely,
Cory Bloor

[1]: https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.0/reference/system-requirements.html#supported-gpus
[2]: https://github.com/pytorch/pytorch/issues/115725#issuecomment-1904925826
[3]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Help-needed
[4]: https://rocm.docs.amd.com/en/docs-5.7.1/understand/More-about-how-ROCm-uses-PCIe-Atomics.html
