
Re: ROCm GPU_TARGETS and GPU_ARCHS and some other points



Hi Christian,

On 2025-10-16 07:10, Christian BAYLE wrote:
> I'm currently working on the composable-kernel [1] package and have some
> questions about GPU_TARGETS and GPU_ARCHS, which are used to build the
> libraries [...] I could do several builds on a per-arch basis, which has
> the good property of building tests and examples, but creates conflicting
> per-arch packages.
>
> On the other hand, the build for all architectures sometimes takes more
> than 40 GB per core, which will be difficult to run on the autobuilders.
>
> Which one should be supported: GPU_ARCHS or GPU_TARGETS?
>
> Are there other packages concerned? And how do you think it would be best
> to deal with this?

I'm afraid I don't have good answers for you. This may be a case where we just try to put something that we think makes sense into the team repo or into experimental, and rework it based on what we discover trying to integrate it into other libraries.

CK is a key library, but I know very little about it aside from the fact that it is not going to be easy to build. I also fear that different CK reverse dependencies may be picky about what version of CK they require. This is just something that we're going to have to learn as we start trying to make use of it.
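
For concreteness, I imagine the two approaches you describe look roughly
like the untested sketch below (GPU_TARGETS is the variable named in the
upstream CMake [1]; the compiler choice and the gfx list are only
placeholders):

    # Per-arch configure: small enough to also build tests and examples,
    # but it leads to conflicting per-arch packages.
    cmake -S composable_kernel -B build-gfx90a \
        -DCMAKE_CXX_COMPILER=hipcc \
        -DCMAKE_BUILD_TYPE=Release \
        -DGPU_TARGETS=gfx90a

    # Combined configure: one set of libraries covering every architecture,
    # at the cost of the memory use you measured.
    cmake -S composable_kernel -B build-all \
        -DCMAKE_CXX_COMPILER=hipcc \
        -DCMAKE_BUILD_TYPE=Release \
        -DGPU_TARGETS="gfx908;gfx90a;gfx942;gfx1100"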

> Another question: would amd-clang improve the memory issues?
> I noticed that Debian's clang has no support for parallel jobs:
>
> -- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
> -- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Failed
>
> I've seen that the Ubuntu llvm-toolchain-rocm package [2] builds clang-rocm.
> Would composable-kernel be a good test case to test improvements?

We discussed this offline, but I would like to answer your question on-list for posterity. You asked, "would amd-clang improve memory issues?" The answer is no. The `-parallel-jobs=N` flag allows clang to run N child processes in parallel when compiling a translation unit for multiple GPU architectures rather than building the unit for each GPU architecture in serial. This flag can be useful, but it actually increases peak memory usage.
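
To make that concrete, with AMD's compiler the flag sits on an ordinary
multi-architecture HIP compile line, something like the untested sketch
below (the architecture list is arbitrary, and amdclang++ stands in for
whatever binary the ROCm toolchain actually installs):

    # One translation unit, three GPU architectures. -parallel-jobs=3 lets
    # the three device-side compilations run concurrently instead of
    # serially, which also multiplies the peak memory needed for this unit.
    amdclang++ -x hip -c foo.hip -o foo.o \
        --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx942 \
        -parallel-jobs=3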

On a related note, I recently learned that there is an upstream alternative to the parallel jobs flag. I'm not sure if LLVM 21 is new enough, but you might be able to use `--offload-new-driver --offload-jobs=N` to achieve a similar effect with upstream clang [3]. Sam Liu, an AMD LLVM developer, described it as follows:

> About out-of-tree status of -parallel-jobs, currently there is an alternative option to it called --offload-jobs=N which is in trunk but only available for HIP under --new-offload-driver option. Currently --new-offload-driver is experimental but should work for most HIP apps. The plan is to gradually transition to this new driver since it eventually supports interoperability with OpenMP offloading.
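
If you want to experiment, I would expect the upstream invocation to look
roughly like this (untested; note that the upstream driver spells the
option --offload-new-driver, and whether the clang in the archive is new
enough to have --offload-jobs at all is exactly the open question):

    # Same multi-arch compile as above, but using the upstream offloading
    # driver and allowing up to four device compile jobs in parallel.
    clang++ -x hip -c foo.hip -o foo.o \
        --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx942 \
        --offload-new-driver --offload-jobs=4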

Sincerely,
Cory Bloor

[1] https://github.com/ROCm/composable_kernel
[2] https://launchpad.net/~bullwinkle-team/+archive/ubuntu/rocm-devel
[3] https://gitlab.kitware.com/cmake/cmake/-/issues/26997
