Re: ROCm 5.4.0 Released

To: Étienne Mollier <emollier@emlwks999.eu>
Cc: debian-ai <debian-ai@lists.debian.org>
Subject: Re: ROCm 5.4.0 Released
From: Cordell Bloor <cgmb-deb@slerp.xyz>
Date: Mon, 12 Dec 2022 03:01:00 -0700
Message-id: <bd8fafe2-5ef5-9e72-d332-7fbec32a6434@slerp.xyz>
In-reply-to: <Y4nYNoJWjVZHjBX2@fusion>
References: <ab8655b4-2d93-f5f5-e483-92f92e1ee2d1@slerp.xyz> <Y4nYNoJWjVZHjBX2@fusion>

Hi Étienne,

On 2022-12-02 03:49, Étienne Mollier wrote:

> Thanks for the notice, at the moment the Debian release freeze
> will occur on 2023-01-12, and the ROCm 5.2.3 is in a consistent
> state in testing, the upcoming Debian 12 bookworm.  I feel a bit
> wary updating to unstable right now, but begun bumping version
> to 5.4.0 this morning in experimental.

I forgot to mention one of the most important features of ROCm 5.4: it adds support for the GFX11 family of processors.

I think that means ROCm 5.4 will probably require clang-16 in the lower parts of the stack. In the upper parts of the stack, like the math and communication libraries, gfx1100 and gfx1102 were added to the default build architectures. So, upgrading some of the lower components might be impossible while using clang-15, and upgrading the upper components might require changes to the build rules. Most of the math and communication libraries from ROCm 5.4 can probably still be built with ROCm 5.2, but not for gfx1100 or gfx1102.

It might be time for Debian to start explicitly choosing its target GPU architectures.

For header-only libraries like rocPRIM, rocThrust, and hipCUB, passing -DAMDGPU_TARGETS only affects the architecture that the tests are built for. It therefore only makes sense to build for hardware that Debian tests against. The libraries are currently building for gfx803, gfx900:xnack-, gfx906:xnack-, gfx908:xnack-, gfx90a:xnack-, gfx90a:xnack+, and gfx1030. Assuming that the test suite binaries are just used locally, you could probably do something fancy like detect the GPU architecture and build for that. There is even a hipcc feature that does that, but sadly it's overridden in CMake with some fixed default values. However, a quick-n-dirty improvement would probably just be to trim that list down to the architectures for the Radeon VII (gfx906:xnack-) and RX 6800 (gfx1030). That should cut the build time to about a quarter of its current length.
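
As a rough illustration of the trimming suggested above, here is a minimal Python sketch. The helper names (cmake_targets_flag, relative_build_cost) are hypothetical and not part of any ROCm or Debian tooling, and the estimate assumes build time scales linearly with the number of target architectures, which is only an approximation.

```python
# Current test-build targets for the header-only libraries, as listed above.
CURRENT_TARGETS = [
    "gfx803",
    "gfx900:xnack-",
    "gfx906:xnack-",
    "gfx908:xnack-",
    "gfx90a:xnack-",
    "gfx90a:xnack+",
    "gfx1030",
]

# Proposed trimmed list: Radeon VII and RX 6800.
TRIMMED_TARGETS = ["gfx906:xnack-", "gfx1030"]


def cmake_targets_flag(targets):
    """Format target ids as a -DAMDGPU_TARGETS argument (hypothetical helper)."""
    # CMake lists are semicolon-separated.
    return "-DAMDGPU_TARGETS=" + ";".join(targets)


def relative_build_cost(new_targets, old_targets):
    """Estimate the new build time as a fraction of the old one.

    Assumption: cost is proportional to the number of targets.
    """
    return len(new_targets) / len(old_targets)


if __name__ == "__main__":
    print(cmake_targets_flag(TRIMMED_TARGETS))
    print(f"{relative_build_cost(TRIMMED_TARGETS, CURRENT_TARGETS):.2f}")
```

Under that linear assumption the trimmed list costs 2/7 of the current build, which is in the same ballpark as the "about a quarter" estimate. Note that the semicolon in the flag value needs quoting when passed through a shell.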

For normal GPU libraries like rocRAND, rocSPARSE, rocFFT, rocBLAS, rocSOLVER, etc., I would suggest starting with the upstream targets and adjusting them as desired.

One change worth considering is perhaps replacing gfx90a:xnack- and gfx90a:xnack+ with gfx90a. The xnack feature is associated with automatically paging memory from the CPU to GPU (and vice versa) via the kernel's Heterogeneous Memory Management (HMM) interface. When you specify xnack- or xnack+ in the target id for the compiler, you're asking it to generate code that's specialized for that particular mode of operation (which may result in better performance). When you omit any mention of xnack, however, it will generate code that can handle either mode of operation. Thus, -DAMDGPU_TARGETS="gfx90a:xnack-;gfx90a:xnack+" is functionally identical to -DAMDGPU_TARGETS=gfx90a, but building two specialized versions of the code increases compile time and binary size in exchange for (possibly) more performance.
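
The replacement described above could be automated along these lines. This is only a sketch: collapse_xnack_pairs is a hypothetical helper, and it only understands target ids with a single feature suffix (arch:xnack- or arch:xnack+); anything else is passed through untouched.

```python
def collapse_xnack_pairs(targets):
    """Replace arch:xnack- / arch:xnack+ pairs with the generic arch id.

    A target is collapsed only when BOTH specialized variants are present.
    A lone xnack- or xnack+ entry is kept as-is, since rewriting it would
    change which xnack modes the build supports. Input order is preserved.
    """
    present = set(targets)
    result = []
    for target in targets:
        arch, sep, feature = target.partition(":")
        if sep and feature in ("xnack-", "xnack+"):
            if f"{arch}:xnack-" in present and f"{arch}:xnack+" in present:
                target = arch  # both modes requested: use the generic id
        if target not in result:  # the pair collapses to a single entry
            result.append(target)
    return result


if __name__ == "__main__":
    upstream = ["gfx906:xnack-", "gfx90a:xnack-", "gfx90a:xnack+", "gfx1030"]
    print(";".join(collapse_xnack_pairs(upstream)))
```

With the example list, gfx90a:xnack- and gfx90a:xnack+ become a single gfx90a entry, while gfx906:xnack- is left alone because its xnack+ counterpart is not in the list.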

Another change to consider would be adding gfx1010 (e.g., Radeon RX 5700 XT) and gfx1011 (e.g., Radeon Pro V520) to the build target list if the test suite passes. I have access to appropriate hardware for testing, though for now I will be prioritizing adding new libraries over adding new architectures to existing libraries.


Sincerely,
Cory Bloor
