Three discussion questions on rocm-target-arch
I've been thinking about rocm-target-arch.
(1) I was wondering what happens with rocm-target-arch in downstream
distributions like Mint, Pop!_OS, etc. When Linux Mint 23 rolls around,
am I right to assume that ROCm will fail to build on that platform?
The current Linux Mint 22 codename is Zara, so if we were using
rocm-target-arch at the start of last year, then I think their ROCm
packages would probably see there's no
/usr/share/pkg-rocm-tools/data/build-targets/zara data file and error
out. It's totally reasonable to expect downstream packagers to put in a
bit of work to maintain a distribution, but I worry a bit about breaking
ROCm packages in every downstream distro by default. There will always
be some portion of downstream maintainers who don't fix them, and it's
the users who will lose out.
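
To make the failure mode concrete, here's a rough sketch of the
fallback I float in the summary below: warn and fall back to sid when
the codename has no data file. This is only a sketch of the idea, not
how rocm-target-arch is actually implemented, and everything except the
build-targets path from above is a guess on my part.

    # Hypothetical sketch: warn and fall back to sid when no data file
    # matches the distribution codename.
    import os
    import subprocess
    import sys

    DATA_DIR = "/usr/share/pkg-rocm-tools/data/build-targets"

    def load_targets():
        codename = subprocess.run(
            ["lsb_release", "--short", "--codename"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()  # e.g. "trixie", "zara", "noble"
        path = os.path.join(DATA_DIR, codename)
        if not os.path.exists(path):
            # Instead of erroring out, warn and use the sid target list.
            print(f"W: no build-targets file for '{codename}', "
                  "falling back to sid", file=sys.stderr)
            path = os.path.join(DATA_DIR, "sid")
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]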
I'm also pondering how we update packages to a new target list. In our
plan for enabling gfx1201 on unstable, we first upload the
gfx1201-capable tooling to unstable, then update pkg-rocm-tools to
build for gfx1201, then upload our new packages.
(2) The packages *must* be uploaded in sequence from the bottom of the
dependency tree to the top --- even if there were no API or ABI changes
in some of the lower-level packages --- because rocm-target-arch
defaults to "reduce" mode, which drops any targets that are not found in
all build dependencies. This ordering requirement will also apply if we
ever change the rocm-target-arch list and request binNMUs.
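
For what it's worth, my mental model of reduce mode is a set
intersection over the X-ROCm-GPU-Architecture fields of the
GPU-enhanced build dependencies. The little sketch below (function and
package names invented for illustration) shows why the ordering
matters: if rocBLAS was last built before gfx1201 was added to the
list, a reverse dependency uploaded today silently loses gfx1201.

    # Reduce mode as I understand it: intersect the requested targets
    # with the X-ROCm-GPU-Architecture list of every GPU-enhanced B-D.
    def reduce_targets(requested, build_dep_targets):
        targets = set(requested)
        for dep_targets in build_dep_targets.values():
            targets &= set(dep_targets)
        return sorted(targets)

    requested = ["gfx90a", "gfx1100", "gfx1201"]
    build_deps = {
        # Built before gfx1201 was added to the target list.
        "librocblas0": ["gfx90a", "gfx1100"],
    }
    print(reduce_targets(requested, build_deps))  # ['gfx1100', 'gfx90a']

If that's accurate, then the only way to get gfx1201 everywhere is to
walk the dependency tree bottom-up, exactly as described above.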
I think the main benefit of reduce mode is that if a package includes
some value in the X-ROCm-GPU-Architecture field, then probably all
dependencies also include that value. This would mostly only be false if
a dependency dropped support, or if a package was built against a
dependency on unstable and migrated to testing before the dependency
did. The biggest cost is that it's more difficult to reason about. You
must know the current state of all your GPU-enhanced B-Ds on buildd to
know which targets your package will build for after upload. There are
also a number of tricky cases where this behaviour is incorrect, and I
think they may be more common than expected (e.g., a B-D uses SPIR-V,
generic targets, or HIP RTC, or the calls into the B-D are guarded by
conditionals).
(3) It's a bit annoying that we can't update rocm-target-arch so that it
is ready when new ROCm versions land in unstable. I sort of wonder if
the distribution is the wrong thing to be checking. The compiler version
might be an interesting alternative. You could update pkg-rocm-tools on
unstable right now, saying which targets would be enabled for LLVM 21,
but as long as unstable is using LLVM 17, it would continue using the
LLVM 17 targets. That would sidestep problem (1) as well, as it would be
robust against changes to the distribution name.
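
As a sketch of what keying on the compiler version could look like
(the compiler name, version detection, and table contents below are all
placeholders, not a proposal for the actual lists):

    # Hypothetical sketch: choose targets based on the installed
    # compiler, not the distribution. Table contents are placeholders.
    import re
    import subprocess

    TARGETS_BY_LLVM = {
        17: ["gfx90a", "gfx1030", "gfx1100"],
        21: ["gfx90a", "gfx1030", "gfx1100", "gfx1201"],
    }

    def llvm_major_version(compiler="clang"):
        out = subprocess.run([compiler, "--version"],
                             capture_output=True, text=True,
                             check=True).stdout
        return int(re.search(r"clang version (\d+)", out).group(1))

    def targets_for_installed_compiler():
        version = llvm_major_version()
        try:
            return TARGETS_BY_LLVM[version]
        except KeyError:
            raise SystemExit(f"E: no target list for LLVM {version}")

With something like this, pkg-rocm-tools could already carry the LLVM
21 entry today, and it would only take effect once unstable actually
switches compilers.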
I suppose there's always the possibility that two distributions want to
use the same version of LLVM but have different build targets. There are
a few possibilities there. They could have different versions of
pkg-rocm-tools. Or, we could have an optional qualifier for the
distribution. We can probably solve this one on the fly if it ever happens.
That was a bit long, so I'll end with a brief summary. I welcome your
thoughts on these questions:
(1) Maybe rocm-target-arch could print a warning and default to sid if
it can't identify the distribution?
(2) We should consider defaulting to --no-reduce. It is more similar to
what we're used to when setting targets manually. Reduce mode is a very
impressive feature, but it might be a bit too much all at once?
(3) Perhaps we should consider basing the target architectures on the
compiler version, rather than the target distribution?
Sincerely,
Cory Bloor