Re: RFC: ROCm Metapackages
On 2025-05-22 07:25, Cordell Bloor wrote:
> On 2025-05-21 12:20, Christian Kastner wrote:
>> Yes, but a bit more complicated than the above because -tests package
>> names match the SONAME of the library they were built against (which is
>> good).
> Is handling the SONAMEs for rocm-tests any more complicated than
> specifying each of those explicitly in the rocm-tests depends list? I
> presume we'd have to update that list manually anyway.
Not the handling of the metapackage itself, no. It's more about user
expectations, e.g.: which version of rocblas will I get when I install
rocm-tests?
This isn't an issue with (unversioned) -dev packages, as currently only
one can exist, but e.g. librocblas0 and librocblas4 can co-exist.
Concrete example: a user might be experiencing a problem with an
application depending on a buggy rocblas0, but if rocm-tests depends on
rocblas4 where this issue has been fixed, the bug won't be found this way.
By the way, I'm just pointing out corner cases; that doesn't mean they
need to be solved. There's a "good enough" somewhere.
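
To make the corner case concrete, here is a minimal sketch (not an
existing tool) that lists installed library packages for which the
metapackage pulls in no matching -tests package. The function name and
the example dependency list are hypothetical; the only thing taken from
the thread is the lib<SONAME>-tests naming.

```python
def untested_libraries(installed_libs, rocm_tests_depends):
    """Return installed library packages that have no matching -tests
    package among the metapackage's dependencies."""
    suffix = "-tests"
    # Strip the "-tests" suffix to recover the library package name.
    covered = {
        dep[: -len(suffix)] for dep in rocm_tests_depends if dep.endswith(suffix)
    }
    return sorted(lib for lib in installed_libs if lib not in covered)


# Hypothetical system: both rocblas SONAMEs are installed, but the
# metapackage only depends on the tests for the newer one.
installed = ["librocblas0", "librocblas4", "librocrand1"]
depends = ["librocblas4-tests", "librocrand1-tests"]
print(untested_libraries(installed, depends))  # ['librocblas0']
```

In this made-up scenario, the buggy librocblas0 is exactly the library
that rocm-tests would never exercise.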
>> This meta-package should also provide a utility to run all of the tests.
>> This would be quite simple if -tests packages switched to the unified
>> interface that I proposed here [1], which wouldn't be much work.
>
> I like your proposal, though would it run afoul of the new Debian Policy
> [2]?
>
>> Two different packages must not install programs with different
>> functionality but with the same filenames. This also applies when they
>> are installed into different directories on the default (user or root) PATH.
I think that could be worded better, but that part of the Policy is
about resolving naming conflicts between unrelated binaries; see [3]
for an example.
In my proposal [1], we couldn't have such a conflict because all files
install to unique destinations that are also not on PATH, so there can
be no confusion on invocation. (Incidentally, this is why I argued for
libexec/rocm/lib<SONAME>-tests back then, because even different SOVER
of the same library couldn't conflict on co-installation.)
In the solution with the uniform interface I propose, each library would
have a runtests script, eg:
/usr/libexec/rocm/librocrand1-tests/runtests
/usr/libexec/rocm/libhiprand1-tests/runtests
/usr/libexec/rocm/librocblas0-tests/runtests
...
And the metapackage would have a 'runtests' somewhere that simply
iterates over the individual runtests above, maybe captures output and
produces summaries, etc. This metapackage-runtest could even be on PATH
if named appropriately, eg: rocm-runtests or whatever.
(Note that this uniform interface would also be of significant benefit
to any non-autopkgtest-based CIs for .debs, if AMD runs some of those
upstream, for example.)
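
A rough sketch of what such a metapackage runner could look like. Only
the /usr/libexec/rocm/lib<SONAME>-tests/runtests layout comes from the
proposal; the function names, the assumption that each runtests script
takes no arguments, and the convention that exit code 0 means success
are all illustrative guesses.

```python
import subprocess
from pathlib import Path

# Layout from the proposal; the rest of this script is hypothetical.
TESTS_ROOT = Path("/usr/libexec/rocm")


def find_runtests(root=TESTS_ROOT):
    """Return every <root>/*-tests/runtests file, sorted by path."""
    return sorted(p for p in Path(root).glob("*-tests/runtests") if p.is_file())


def run_all(root=TESTS_ROOT):
    """Run each runtests script; map its package directory to its exit code."""
    results = {}
    for script in find_runtests(root):
        # Output is captured so that it could be logged or summarized later.
        proc = subprocess.run([str(script)], capture_output=True, text=True)
        results[script.parent.name] = proc.returncode
    return results


def summarize(results):
    """Return one PASS/FAIL line per -tests package (assumes 0 == success)."""
    return [
        f"{name}: {'PASS' if code == 0 else 'FAIL'}"
        for name, code in sorted(results.items())
    ]
```

A rocm-runtests wrapper on PATH would then only need to print
summarize(run_all()) and exit non-zero if any script failed.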
>>> At some point, we may wish to have a rocm-doc metapackage too
>> Another meta-package to consider would be simply rocmX.Y with all the
>> libraries, no hipcc and whatever. For when a user has a ROCm-needing
>> binary from somewhere outside Debian, and no need for -dev stuff.
>
> I'm not sure I understand how this would work. We'd only have one
> version of ROCm available on Debian at once, right?
That was the original goal but IIRC that was because it was the simplest
solution, one that we could always improve upon later.
But if AMD upstream now commits to supporting ROCmX.Y for N years, then
that is most likely because users/customers requested it. It's then
quite likely that those users/customers who install from Debian or
Ubuntu because of "ROCm-in-box" would expect to find ROCmX.Y there,
too.
Concrete example: Say some reverse dependency (eg: PyTorch) only
officially supports ROCm up to X.Y. If we only keep one version and also
update to the latest X.Y+1, then we'd break those reverse dependencies.
> If users would only be able to install one specific rocmX.Y at any
> given time, would that really be useful? I think I must be
> misunderstanding how this would work.
The packaging process would need to be adapted to support more than one
version of each component, similar to how you can have multiple LLVMs,
or multiple versions of the Boost libraries, in one Debian release.
Imagine that instead of src:rocrand, we'd have src:rocrandX.Y.
And instead of building bin:librocrand1, these would build
bin:librocrand1-X.Y.
Then,
* rocm6.3 would depend on librocrand1-6.3
* rocm6.4 on librocrand1-6.4
* rocm on the most recent rocmX.Y, here rocm6.4
* (Then you'd also need rocmX.Y-dev, and so on)
That would make ROCm 6.3 and 6.4 co-available, but not co-installable.
The latter would need some more tricks, but users would at least have a
choice which version to install.
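
The naming scheme above can be sketched as a small mapping from
metapackage to dependencies. This is only an illustration of the idea:
the function name is made up, and real packaging would express this in
debian/control rather than in code.

```python
def versioned_metapackages(versions, libraries):
    """Map each metapackage name to its list of dependencies.

    versions:  ROCm versions as "X.Y" strings, e.g. ["6.3", "6.4"]
    libraries: unversioned library package names, e.g. ["librocrand1"]
    """
    # Sort numerically so that e.g. 6.10 ranks above 6.4.
    ordered = sorted(versions, key=lambda v: tuple(int(x) for x in v.split(".")))
    deps = {f"rocm{v}": [f"{lib}-{v}" for lib in libraries] for v in ordered}
    # The unversioned metapackage tracks the most recent rocmX.Y.
    deps["rocm"] = [f"rocm{ordered[-1]}"]
    return deps


deps = versioned_metapackages(["6.3", "6.4"], ["librocrand1"])
print(deps["rocm6.3"])  # ['librocrand1-6.3']
print(deps["rocm6.4"])  # ['librocrand1-6.4']
print(deps["rocm"])     # ['rocm6.4']
```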
Also, I wouldn't think about just one release. What if users/customers
want backports?
That was just food for thought. If I misunderstood AMD's intentions of
supporting ROCmX.Y for longer periods, and they instead only support
the newest release, then this is moot. If it wasn't a misunderstanding,
then it should be kept in mind as a goal for 26.04 and forky
(=trixie+1).
This involves extra work but is still doable, if AMD commits the
necessary resources to this.
Best,
Christian
[1]: https://lists.debian.org/debian-ai/2025/04/msg00173.html
> [2]: https://www.debian.org/doc/debian-policy/ch-files.html#binaries
[3]: https://lists.debian.org/debian-devel/2024/04/msg00368.html