Hi Mo and Christian,
BTW, do we have a meta package that pulls everything in?
Like nvidia-cuda-toolkit pulls everything, including the
docs, utils, and nvidia-cuda-dev. That -dev package pulls
most libraries including cublas, cusparse, etc.
No, we do not. It would be a good idea, though. Inexperienced users don't know which specific components they need, so a "kitchen sink" package that installs everything may be needed to help them get started. We could probably close the ancient ITP for "rocm-all" once such a metapackage is added [1].
I would also like "rocm", but that could create problems on a system with both Debian (or Ubuntu) and upstream APT sources enabled, which for Ubuntu is a use case advertised by upstream. Users would have to notice this and select the correct "rocm" version. It seems that upstream "rocm" has a version number that will always compare as newer than ours, e.g.:
  upstream:         6.2.1.60201-112~22.04
  us (eventually):  6.2.1
I looked at upstream's container [1] for other issues, and at first glance it seems that most [2] names would not conflict (e.g. "rocrand" vs. our "librocrand1"). They also would not conflict by path (upstream installs to /opt), so the two sets of packages would be co-installable.
So do we want to use "rocm" and accept some possible confusion and/or the need for apt_preferences(5) trickery, or should we go with "rocm-deb", "rocm-dist", or similar? The same question applies to -dev.
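Why upstream's version sorts as newer: APT and dpkg compare versions with dpkg's algorithm (epoch, then upstream version, then Debian revision; within each part, runs of non-digits are compared character by character with "~" sorting before everything, and runs of digits are compared numerically). The following is a rough Python port written only for illustration; `dpkg --compare-versions` remains the authoritative check, and the function names here are my own.

```python
import string


def _order(ch: str) -> int:
    """Sort weight of one character in dpkg's non-digit comparison.

    '' means "end of string". '~' sorts before everything, even the end
    of the string; letters sort before all other non-digit characters.
    """
    if ch == "" or ch in string.digits:
        return 0
    if ch in string.ascii_letters:
        return ord(ch)
    if ch == "~":
        return -1
    return ord(ch) + 256


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream-version (or revision) strings like dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the leading run of non-digit characters.
        while (i < len(a) and a[i] not in string.digits) or (
            j < len(b) and b[j] not in string.digits
        ):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the following run of digits numerically (skip leading zeros).
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while (i < len(a) and a[i] in string.digits
               and j < len(b) and b[j] in string.digits):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i] in string.digits:
            return 1  # a's number has more digits, so it is larger
        if j < len(b) and b[j] in string.digits:
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch_str, sep, rest = version.partition(":")
    if not sep:
        epoch_str, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch_str), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    result = _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (result > 0) - (result < 0)


print(compare_versions("6.2.1.60201-112~22.04", "6.2.1"))  # 1: upstream wins
```

The deciding step is the extra ".60201" component: once the shared "6.2.1" prefix is consumed, upstream still has characters left while ours has ended, so upstream sorts higher regardless of what its Debian revision is.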
This has been a problem for rocminfo and rocm-smi since Ubuntu
22.04. It is also a problem for Red Hat users now that ROCm
packages are available in EPEL. AMD's official installation
instructions for ROCm set package priorities to force the
installation of rocminfo and rocm-smi from the rocm repo, but I'm
not a fan of this solution. rocminfo installed to /usr/bin and
rocminfo installed to /opt/rocm are not really interchangeable.
When one installs into the system PATH and one does not, they are
fundamentally different packages and should have different names,
not just different versions.
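To illustrate why the two copies are not interchangeable: which rocminfo a shell runs depends only on the order of PATH entries, not on package versions. The sketch below uses Python's `shutil.which` against temporary directories standing in for /usr/bin and a bin directory under /opt/rocm. The directory layout and the stub files are assumptions for illustration only (no real ROCm tools are involved), and it assumes a POSIX system.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(bin_dir: str, name: str) -> str:
    """Create a placeholder executable called `name` inside `bin_dir`."""
    os.makedirs(bin_dir, exist_ok=True)
    path = os.path.join(bin_dir, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    # Add the owner execute bit so shutil.which treats it as runnable.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def which_rocminfo(search_dirs):
    """Return the rocminfo that a lookup over these PATH entries would find."""
    return shutil.which("rocminfo", path=os.pathsep.join(search_dirs))


with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for /usr/bin and /opt/rocm/bin.
    usr_bin = os.path.join(root, "usr", "bin")
    opt_bin = os.path.join(root, "opt", "rocm", "bin")
    distro_tool = make_fake_tool(usr_bin, "rocminfo")
    upstream_tool = make_fake_tool(opt_bin, "rocminfo")

    # Default PATH: only the system directory is searched.
    default_hit = which_rocminfo([usr_bin])
    # A user who prepends the /opt directory gets the upstream copy instead.
    custom_hit = which_rocminfo([opt_bin, usr_bin])

    print(default_hit == distro_tool, custom_hit == upstream_tool)
```

Nothing in the lookup consults version numbers, which is why a priority pin alone cannot make the two packages equivalent.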
AMD's amdgpu repository adds the suffix "-amdgpu" to all packages that it provides. In my opinion, AMD's "rocm" repo should do something similar. Such a scheme would most likely add the prefix "amd-" to every package [2]. Jeremy Newton filed a bug to this effect several years ago, but it has been a low priority.
I think that Jeremy, Tom and I need to hammer on this issue with AMD's packaging teams. To prevent conflicts, it should really be AMD's responsibility to choose a consistent naming convention for their repository.
Sincerely,
Cory Bloor
[1]: https://bugs.debian.org/1001712
[2]: I wonder if this might be an issue for some tools as packages
are split into different directories based on the first letter of
the package name? Or maybe there aren't enough packages for it to
matter...
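For context on footnote [2]: in the conventional Debian archive layout, files live under pool/<component>/<prefix>/<source>/, where the prefix is the first letter of the source package name, or its first four letters for names beginning with "lib". A rough sketch of what an "amd-" prefix would do to that bucketing follows. The package list is illustrative, the "amd-" renaming is the hypothetical scheme discussed above, and whether AMD's repository uses this exact layout is not assumed.

```python
from collections import Counter


def pool_prefix(source_name: str) -> str:
    """Directory under pool/<component>/ for a source package.

    Names starting with "lib" use their first four characters
    (e.g. "libd"); everything else uses the first character.
    """
    if source_name.startswith("lib"):
        return source_name[:4]
    return source_name[:1]


def bucket_sizes(source_names):
    """Count how many source packages land in each pool directory."""
    return Counter(pool_prefix(name) for name in source_names)


# Illustrative ROCm source package names.
sources = ["rocrand", "rocblas", "rocminfo", "rocsparse", "hipblas", "hipcub"]
# Hypothetical "amd-" naming scheme.
renamed = ["amd-" + name for name in sources]

print(bucket_sizes(sources))  # spread across "r" and "h"
print(bucket_sizes(renamed))  # every package lands in "a"
```

The renaming does not break the layout; it only concentrates everything into a single directory, so the practical impact depends on how many packages are involved.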