
Re: Joining ROCm team, ectrans



Hi Mo and Christian,

On 2024-10-12 09:53, Mo Zhou wrote:
> BTW, do we have a meta package that pulls everything in?
> Like nvidia-cuda-toolkit pulls everything, including the
> docs, utils, and nvidia-cuda-dev. That -dev package pulls
> most libraries including cublas, cusparse, etc.

No, we do not. It would be a good idea, though. Inexperienced users don't know what specific components they need, so a "kitchen sink" package that just installs everything may be necessary just to help people get started. We could probably close out the ancient ITP for "rocm-all" upon adding such a metapackage [1].

On 2024-10-12 11:25, Christian Kastner wrote:
> I would also like "rocm", but that could create problems on a system
> with both Debian (or Ubuntu) and upstream APT sources enabled, which for
> Ubuntu is a use case advertised by Upstream. Users would have to notice
> this, and select the correct "rocm" version.
>
> It seems that upstream "rocm" has a version number that will always be
> newer than ours, eg:
>
>     upstream:        6.2.1.60201-112~22.04
>     us (eventually): 6.2.1
>
> I looked at upstream's container [1] for other issues and at first
> glance, it seems that most [2] names would not conflict (eg: "rocrand"
> vs our "librocrand1"), and also would not conflict by path (upstream
> installs to "/opt") so would be co-installable.
>
> So do we want to use "rocm" and accept some possible confusion and/or
> apt_preferences(5) trickery requirement, or should we go with "rocm-deb"
> or "rocm-dist" or similar. Likewise for -dev.
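
As a rough illustration of why the upstream version quoted above sorts as newer: Debian compares versions as epoch, then upstream part, then revision, alternating between non-digit and digit runs, with "~" sorting before everything (including the end of the string). The sketch below is a simplified Python reimplementation of those rules, not dpkg's actual code; the function names are made up for this example, and `dpkg --compare-versions` remains the authoritative check.

```python
import re


def split_version(version):
    """Split a Debian version string into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:  # no epoch given, so it defaults to 0
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:  # no revision given; treated like revision "0"
        upstream, revision = rest, "0"
    return int(epoch), upstream, revision


def _order(ch):
    """Sort weight of one character in a non-digit run."""
    if ch == "~":
        return -1  # tilde sorts before everything, even end-of-string (0)
    if ch.isalpha():
        return ord(ch)  # letters sort before other symbols
    return ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream or revision strings; returns -1, 0 or 1."""
    while a or b:
        # Step 1: compare the leading non-digit runs character by character.
        na = re.match(r"\D*", a).group()
        nb = re.match(r"\D*", b).group()
        a, b = a[len(na):], b[len(nb):]
        for i in range(max(len(na), len(nb))):
            ca = _order(na[i]) if i < len(na) else 0
            cb = _order(nb[i]) if i < len(nb) else 0
            if ca != cb:
                return -1 if ca < cb else 1
        # Step 2: compare the leading digit runs numerically.
        da = re.match(r"\d*", a).group()
        db = re.match(r"\d*", b).group()
        a, b = a[len(da):], b[len(db):]
        ia, ib = int(da or 0), int(db or 0)
        if ia != ib:
            return -1 if ia < ib else 1
    return 0


def compare_versions(a, b):
    """Return -1, 0 or 1 if version a is older than, equal to, or newer than b."""
    ea, ua, ra = split_version(a)
    eb, ub, rb = split_version(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


if __name__ == "__main__":
    print(compare_versions("6.2.1.60201-112~22.04", "6.2.1"))
```

Under these rules the upstream version wins on the extra ".60201" component of the upstream part, before the revision is even considered, so with both sources at equal priority apt would be expected to pick the upstream "rocm".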

This has been a problem for rocminfo and rocm-smi since Ubuntu 22.04. It is also a problem for Red Hat users now that ROCm packages are available in EPEL. AMD's official installation instructions for ROCm set package priorities to force the installation of rocminfo and rocm-smi from the rocm repo, but I'm not a fan of this solution. rocminfo installed to /usr/bin and rocminfo installed to /opt/rocm are not really interchangeable. When one installs into the system PATH and the other does not, they are fundamentally different packages and should have different names, not just different versions.

AMD's amdgpu repository adds the suffix "-amdgpu" to all packages that it provides. It is my opinion that AMD's 'rocm' repo should do something similar. It would likely add the prefix "amd-" to all packages [2]. Jeremy Newton filed a bug to this effect several years ago, but it has been a low priority.

I think that Jeremy, Tom and I need to hammer on this issue with AMD's packaging teams. To prevent conflicts, it should really be AMD's responsibility to choose a consistent naming convention for their repository.

Sincerely,
Cory Bloor

[1]: https://bugs.debian.org/1001712
[2]: I wonder if this might be an issue for some tools as packages are split into different directories based on the first letter of the package name? Or maybe there aren't enough packages for it to matter...
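
To make the concern in [2] a little more concrete: Debian's own archive shards its pool by the first letter of the source package name (or the first four characters when the name starts with "lib"). Whether AMD's repository or any particular tool uses the same layout is an assumption here, and the "amd-" names below are hypothetical. A minimal sketch of how the prefix would concentrate packages in one directory:

```python
from collections import Counter


def pool_prefix(source_name):
    """Return the pool subdirectory used by Debian's archive layout."""
    # Names starting with "lib" are sharded by their first four characters.
    if source_name.startswith("lib") and len(source_name) > 3:
        return source_name[:4]
    return source_name[0]


# A few ROCm component names, and hypothetical "amd-" prefixed variants.
current = ["rocrand", "rocblas", "hipblas", "miopen"]
renamed = ["amd-" + name for name in current]

if __name__ == "__main__":
    print(Counter(pool_prefix(name) for name in current))
    print(Counter(pool_prefix(name) for name in renamed))
```

With the prefix, every package lands under "a", so any sharding benefit is lost; with only a few hundred packages that is probably harmless.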

