
Re: ROCm GPU_TARGETS and GPU_ARCHS and some other points



On 19/10/2025 at 13:46, Christian Kastner wrote:
Hi Christian,

On 2025-10-16 15:10, Christian BAYLE wrote:
I'm currently working on composable-kernel [1] package and have some
questions about GPU_TARGETS and GPU_ARCHS that are used to build the
libraries
NOTE: If you try setting GPU_TARGETS to a list of architectures, the
build will only work if the architectures are similar, e.g.,
gfx908;gfx90a, or gfx1100;gfx1101;gfx1102. Otherwise, if you want to
build the library for a list of different architectures, you should use
the GPU_ARCHS build argument, for example
GPU_ARCHS=gfx908;gfx1030;gfx1100;gfx942.
---

This raises the question of how to take this into account with the
rocm-target-arch tool, which outputs:

rocm-target-arch --sep ';'
gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102
This raises an important point: rocm-target-arch will probably need to
support alternative list formats.

With LLVM 21, Cory wrote for rocm-hipamd 6.4.3-1~exp2 [3]:
Users are encouraged to instead make use of LLVM generic target ids,
such as gfx9-generic, gfx10-1-generic, and gfx10-3-generic
I believe rocm-target-arch may need one or more target flags.
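
To make the generic-target idea concrete, here is a rough sketch of what such a flag could do to the list. It is not how rocm-target-arch works today: the function name to_generic and the mapping table are hypothetical, the table only covers the three generic ids named above, and the membership of each generic id should be checked against the LLVM AMDGPU documentation before relying on it.

```python
# Hypothetical mapping from specific architectures to LLVM generic target ids.
# Assumption: only the three generic ids named in this thread are covered;
# verify membership against the LLVM AMDGPU usage documentation.
GENERIC_FOR = {
    "gfx900": "gfx9-generic",
    "gfx906": "gfx9-generic",
    "gfx1010": "gfx10-1-generic",
    "gfx1030": "gfx10-3-generic",
}


def to_generic(archs, sep=";"):
    """Replace architectures that have a generic id, dropping duplicates
    while keeping the original order. Unmapped entries pass through."""
    out = []
    for arch in archs.split(sep):
        target = GENERIC_FOR.get(arch, arch)
        if target and target not in out:
            out.append(target)
    return sep.join(out)


print(to_generic("gfx803;gfx900;gfx906;gfx908;gfx1010;gfx1030;gfx1100"))
# gfx803;gfx9-generic;gfx908;gfx10-1-generic;gfx10-3-generic;gfx1100
```

Architectures without an entry in the table (gfx803, gfx908, gfx1100 here) are left untouched, so the result can still be mixed with specific targets.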

and does not build for gfx803;gfx900;gfx906
If, by this, you mean it should skip the build for gfx803;gfx900;gfx906,
then this could be added as a feature to rocm-target-arch.

The utility already supports a ROCM_TARGET_ARCH_FIXED=<list> for an
explicit list, and a mode to reduce architectures.

One could introduce ROCM_TARGET_ARCH_SKIP=<list> for a list of
architectures to skip, from the default list.
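
As a minimal sketch of the proposed skip semantics (ROCM_TARGET_ARCH_SKIP does not exist yet, and the helper name skip_archs is made up for illustration), the filtering could look like this, using the default list quoted earlier in this thread:

```python
def skip_archs(default_archs, skip, sep=";"):
    """Return default_archs without the entries listed in skip,
    preserving the order of the default list."""
    skip_set = {a.strip() for a in skip.split(sep) if a.strip()}
    kept = [a for a in default_archs.split(sep) if a and a not in skip_set]
    return sep.join(kept)


# Default list as printed by `rocm-target-arch --sep ';'` in this thread.
DEFAULT = ("gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;"
           "gfx1010;gfx1030;gfx1100;gfx1101;gfx1102")

print(skip_archs(DEFAULT, "gfx803;gfx900;gfx906"))
# gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102
```

In the real utility the skip list would presumably come from the environment rather than an argument; entries in the skip list that are not in the default list are simply ignored here.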

Which set should be supported: GPU_ARCHS or GPU_TARGETS?
Third option: support both. Is there a reasonable use case for this?

Indeed, the use case comes from composable-kernel, but it seems to be a very specific package, as it takes hours or days to build.

It has a different build behaviour depending on which environment variable is used.

See [1] around line 182

The tests are only built, and perhaps only usable, on a subset of GPU_TARGETS.

There is also a default behaviour when you don't set either variable, which depends on the compiler version; this may be of interest to integrate into rocm-target-arch.

The build is so long that I wonder whether the package should be split by target...

I will push the packaging code very soon, as soon as I manage to build something.

Regards

Christian B.


[1] https://github.com/ROCm/composable_kernel/blob/develop/CMakeLists.txt


