
Re: ROCm 5.4.0 Released



Hi Étienne,

On 2022-12-02 03:49, Étienne Mollier wrote:
> Thanks for the notice, at the moment the Debian release freeze
> will occur on 2023-01-12, and the ROCm 5.2.3 is in a consistent
> state in testing, the upcoming Debian 12 bookworm.  I feel a bit
> wary updating to unstable right now, but begun bumping version
> to 5.4.0 this morning in experimental.

I forgot one of the most important features of ROCm 5.4. It adds the GFX11 family of processors.

I think that means ROCm 5.4 will probably require clang-16 in the lower parts of the stack. In the upper parts of the stack, like the math and communication libraries, gfx1100 and gfx1102 were added to the default build architectures. So, upgrading some of the lower components might be impossible while using clang-15 and upgrading the upper components might require changes to the build rules. Most of the math and communication libraries from ROCm 5.4 can probably still be built with ROCm 5.2, but not for gfx1100 or gfx1102.

It might be time for Debian to start explicitly choosing its target GPU architectures.

For header-only libraries like rocPRIM, rocThrust, and hipCUB, passing -DAMDGPU_TARGETS only affects the architectures that the tests are built for. It therefore only makes sense to build for hardware that Debian tests against. The libraries are currently building for gfx803, gfx900:xnack-, gfx906:xnack-, gfx908:xnack-, gfx90a:xnack-, gfx90a:xnack+, and gfx1030. Assuming that the test suite binaries are just used locally, you could probably do something fancy like detect the GPU architecture and build for that. There is even a hipcc feature that does that, but sadly it's overridden in CMake with some fixed default values. However, a quick-n-dirty improvement would probably just be to trim that list down to the architectures for the Radeon VII (gfx906:xnack-) and RX 6800 (gfx1030). Going from seven targets to two should cut the build time to roughly a quarter of its current length.
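
To make the trimmed list concrete, here is a rough sketch of what the configure step could look like. The source and build directory names are placeholders, and the detection helper relies on rocm_agent_enumerator, which is a different tool from the hipcc feature mentioned above; treat that part as an assumption about what is installed locally rather than as what the packaging does today.

```shell
#!/bin/sh
# Join gfx names (one per line on stdin) into a semicolon-separated list.
# Duplicates are dropped, as is the gfx000 placeholder reported for CPU agents.
join_targets() {
    awk '/^gfx[0-9a-f]+$/ && $0 != "gfx000" && !seen[$0]++' | paste -sd ';' -
}

# Fixed list for the hardware named above: Radeon VII and RX 6800.
TESTED_TARGETS="gfx906:xnack-;gfx1030"

# Optional local override (assumes rocm_agent_enumerator is on PATH):
# TESTED_TARGETS=$(rocm_agent_enumerator | join_targets)

# Print the configure command instead of running it, so it can be reviewed.
printf 'cmake -S . -B build -DAMDGPU_TARGETS="%s"\n' "$TESTED_TARGETS"
```

The fixed list is probably the safer default for package builds, since the build machines may not have a GPU at all.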

For normal GPU libraries like rocRAND, rocSPARSE, rocFFT, rocBLAS, rocSOLVER, etc., I would suggest starting with the upstream targets and adjusting them as desired.

One change worth considering is perhaps replacing gfx90a:xnack- and gfx90a:xnack+ with gfx90a. The xnack feature is associated with automatically paging memory from the CPU to GPU (and vice versa) via the kernel's Heterogeneous Memory Management (HMM) interface. When you specify xnack- or xnack+ in the target id for the compiler, you're asking it to generate code that's specialized for that particular mode of operation (which may result in better performance). When you omit any mention of xnack, however, it will generate code that can handle either mode of operation. Thus, -DAMDGPU_TARGETS="gfx90a:xnack-;gfx90a:xnack+" is functionally identical to -DAMDGPU_TARGETS=gfx90a, but building two specialized versions of the code increases compile time and binary size in exchange for (possibly) more performance.
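
As an illustration of that substitution, the sketch below rewrites a target list so that the two xnack-specialized entries for one architecture collapse into the generic one. The helper name collapse_xnack is made up for this example; it only manipulates the string and leaves every other architecture (including its xnack setting) untouched.

```shell
#!/bin/sh
# collapse_xnack ARCH LIST
# Replace ARCH:xnack- and ARCH:xnack+ in a semicolon-separated LIST with the
# plain ARCH, then drop the resulting duplicate while keeping the order.
collapse_xnack() {
    arch=$1
    list=$2
    printf '%s\n' "$list" | tr ';' '\n' \
        | sed "s/^${arch}:xnack[+-]\$/${arch}/" \
        | awk '!seen[$0]++' \
        | paste -sd ';' -
}

# The list the header-only libraries currently build for, per the text above.
CURRENT_TARGETS="gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx1030"

NEW_TARGETS=$(collapse_xnack gfx90a "$CURRENT_TARGETS")
printf '%s\n' "-DAMDGPU_TARGETS=\"${NEW_TARGETS}\""
```

That removes one code object per kernel for gfx90a, at the cost of whatever the specialized builds were gaining in performance.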

Another change to consider would be adding gfx1010 (e.g., Radeon RX 5700 XT) and gfx1011 (e.g., Radeon Pro V520) to the build target list if the test suite passes. I have access to appropriate hardware for testing, though for now I will be prioritizing adding new libraries over adding new architectures to existing libraries.

Sincerely,
Cory Bloor

