Re: RFC: Adding an X-ROCm-Built-For field to packages targeting ISAs
Hi Cory,
On 2025-06-28 19:23, Cordell Bloor wrote:
> There is perhaps an additional complexity with the generic targets that
> you may wish to consider in your design.
Just to be clear: technically, it's not "my" design; I'm just proposing
one to solve a particular problem.
I emphasize this distinction because I'd (1) really just like to see the
problem addressed and (2) like the best solution for it, even if it's
entirely different from my proposal. Hence the RFC.
> The new generic targets have a hidden version number. When you specify
> that you wish to build for gfx11-generic, the compiler turns that into a
> command to build for gfx11-generic-v0. If we were to imagine that there
> were a new gfx11 GPU released that was added to gfx11-generic and that
> required changes to the gfx11-generic code generation in order to
> function, then LLVM would increment the internal ISA version number [1].
> Using that newer version of LLVM, a request to build for gfx11-generic
> would build for gfx11-generic-v1.
>
> This version number is so that if you attempt to run an old
> gfx11-generic-v0 binary on that new and incompatible gfx11 GPU, the HIP
> Runtime would know that the code object is not compatible and would not
> load it. This would be resolved by rebuilding the binary for
> gfx11-generic on the newer compiler, which would output gfx11-generic-v1
> code objects that the HIP Runtime would recognize as being compatible
> with that new hardware.
>
> In any case, the point of this is that I think the information that you
> care about is the compiler's full target name with the version number.
> This distinction doesn't matter yet, as we're not using generic targets
> on Debian yet and those are the only targets that have a version number.
> Also, I don't think LLVM has ever incremented a generic target version
> number yet. Nevertheless, it's something to consider for the future if
> we're designing for the long term.
While this is novel to me, I believe I can follow, and I agree that this
isn't a problem yet. Let's cross that bridge once we get there.
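For when we do get there, here is a minimal sketch of what "recording the
full target name" could look like on the tooling side. It assumes the
version appears as a "-v<N>" suffix on generic targets, as in your
gfx11-generic-v0 example; the Target type and parse_target() are
hypothetical names of mine, not anything LLVM or our tools provide.

```python
import re
from typing import NamedTuple, Optional


class Target(NamedTuple):
    name: str               # e.g. "gfx11-generic" or "gfx1030"
    version: Optional[int]  # None when no version suffix is present


# Assumed spelling: "<something>-generic-v<N>" for versioned generic targets.
_VERSIONED = re.compile(r"^(?P<name>.+-generic)-v(?P<version>\d+)$")


def parse_target(text: str) -> Target:
    """Split a target string into its base name and optional version."""
    match = _VERSIONED.match(text)
    if match:
        return Target(match.group("name"), int(match.group("version")))
    return Target(text, None)


def same_base(a: str, b: str) -> bool:
    """True if two target strings differ at most in their version suffix."""
    return parse_target(a).name == parse_target(b).name
```

With this, gfx11-generic-v0 and gfx11-generic-v1 share a base name but
compare as different targets, which is the distinction the field would
need to preserve.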
> It would be nice if this could be consistent across various different
> types of accelerators.
> Would it make sense to have one field that specifies the accelerator
> architectures for all vendors? Or would it make more sense to have a
> different field for each vendor / accelerator toolchain? e.g.,
> X-Offload-Arch vs. X-<Vendor>-<Device Type/Runtime/Toolchain>-Arch?
I think a common solution will be unavoidable. I also think it'll take
years until we get there, because so much has to be aligned. All of our
policies and tools are entirely unaware of accelerators.
I always saw our work and our CI as the trial balloon to gather data and
to inform the change process, and I still think it's the fastest way to
get there.
In that spirit, I think we should go ahead, try things out, improve upon
those, and then share what works.
We've done that for a while now, and so far the results speak for
themselves. I think what we've done will be a fantastic blueprint for
the other accelerators to follow.
>> It's debatable whether this should also be added to -dev packages.
>> I myself don't think this would contribute much, other than extra
>> maintenance work.
>
> I don't think it makes sense on them anyway, as they don't contain any
> GPU code [...]
Admittedly, the only reason I mentioned this is that B-Ds are always
for the -dev package, and having the field there would make some
resolutions simpler.
>> We could also use this list to "bridge" back to our CI. Does a package
>> pass all its tests on the listed ISAs -> otherwise, report a bug.
>
> Although, I suppose this idea implies a somewhat different
> interpretation of the field. You are not saying, "this is the ISA that
> the package was built for" but rather "these are the GPUs that the
> package supports". Those are very different things in the case of
> generic targets, the SPIR-V target, and run-time compilation. You'll
> need to be clear about which you mean.
Hm, interesting point. This is in the sense of a build targeting
gfx1030, but the package supporting gfx103<n>, right?
I believe we should document what it was built for, as that is what the
centralized target list is actually used for.
And unless I'm mistaken, what GPUs the package supports doesn't need
per-package encoding; it could be inferred from the central build target
list and some other criteria.
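As a rough sketch of that inference, assuming the field holds a
comma-separated list of build targets (the RFC doesn't fix a syntax, so
that is my assumption here) and that we keep one central table mapping
each build target to the GPUs that can run its code. The table entries
below are illustrative placeholders, and RUNS_ON, parse_built_for() and
supported_gpus() are made-up names.

```python
# Hypothetical central table: build target -> GPUs able to run that code.
# Placeholder entries for illustration only, not a vetted compatibility list.
RUNS_ON = {
    "gfx1030": {"gfx1030", "gfx1031", "gfx1032"},
    "gfx900": {"gfx900"},
}


def parse_built_for(value: str) -> list[str]:
    """Split a comma-separated field value into build target names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def supported_gpus(built_for: str) -> set[str]:
    """Infer supported GPUs from the build targets recorded in the field."""
    gpus: set[str] = set()
    for target in parse_built_for(built_for):
        # A target missing from the table is assumed to support only itself.
        gpus |= RUNS_ON.get(target, {target})
    return gpus
```

The package only ever records what it was built for; everything about
"supported" lives in the one table, so it can change without touching
any package.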
Best,
Christian