
Re: HIP on the NVIDIA platform



On Wed, 2022-10-19 at 14:20 -0600, Cordell Bloor wrote:
> 
> What limitations might we run into if we were to build and package
> the NVIDIA variant of hipRAND? I presume that the binary package for
> that variant would have to be uploaded to the non-free archive. It
> would also need a package name to distinguish it from the AMD
> platform variant.
>
> This is not an urgent topic, as my priority is getting them working
> on the AMD platform first. Nevertheless, I wanted to start thinking
> about NVIDIA support so I could keep it in mind as I'm working on the
> other

Apart from my previous complaints about the NVIDIA EULA in another mail,
the problem of letting the two variants co-exist is easy to solve.

Assume that we have two hipFOOBAR libraries, one built against
ROCm and one against CUDA. As long as the backend (ROCm/CUDA)
does not affect the ABI of the shared objects (which I suppose
is the case), we can simply introduce an update-alternatives[1][2]
entry for the library.

For example, the current ROCm package installs the library as
  /usr/lib/<triplet>/libhipFOOBAR.x.y.z
To introduce the alternatives system, we move the library to
  /usr/lib/<triplet>/hip-rocm/libhipFOOBAR.x.y.z
and install the new CUDA variant in another package as
  /usr/lib/<triplet>/hip-cuda/libhipFOOBAR.x.y.z
Then we use update-alternatives to maintain a symlink
  /usr/lib/<triplet>/libhipFOOBAR.x.y.z -> ?
that points to either of them.
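
To make the layout above concrete, here is a small Python simulation
of what the alternatives symlink does, run in a scratch directory
instead of /usr/lib. It is only a sketch: the real tool keeps its own
administrative links under /etc/alternatives and is driven by commands
such as `update-alternatives --install <link> <name> <path> <priority>`
and `update-alternatives --set <name> <path>`. The `switch_variant`
helper and the scratch layout are hypothetical.

```python
import os
import tempfile
from pathlib import Path

LIB = "libhipFOOBAR.x.y.z"  # placeholder library name from the mail
VARIANTS = ("hip-rocm", "hip-cuda")


def install_variants(libdir: Path) -> None:
    """Create one fake library file per backend subdirectory."""
    for variant in VARIANTS:
        (libdir / variant).mkdir(parents=True, exist_ok=True)
        (libdir / variant / LIB).write_text(f"{variant} build\n")


def switch_variant(libdir: Path, variant: str) -> None:
    """Point libdir/LIB at libdir/<variant>/LIB, replacing any old link."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    target = Path(variant) / LIB  # relative target, resolved from libdir
    tmp = libdir / (LIB + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)
    # rename(2) replaces the old link in one step, so readers never
    # observe a missing library file.
    os.replace(tmp, libdir / LIB)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        libdir = Path(d)
        install_variants(libdir)
        switch_variant(libdir, "hip-rocm")
        print(os.readlink(libdir / LIB), (libdir / LIB).read_text().strip())
        switch_variant(libdir, "hip-cuda")
        print(os.readlink(libdir / LIB), (libdir / LIB).read_text().strip())
```

Processes that are already running keep the library they mapped at
start-up; only newly started ones pick up the switched link.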

An advantage is that reverse dependencies built against one
variant can switch to the other at run time without a
rebuild.

If there are multiple libraries, we can group them into
one alternatives entry and switch them simultaneously
(to avoid mixing ROCm and CUDA usage).
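
Grouping matters because a half-switched system could load the ROCm
build of one library and the CUDA build of another. update-alternatives
handles this by attaching extra links to a master alternative with
`--slave <link> <name> <path>`, so they always move together. The
Python sketch below models the idea in a scratch directory with a
hypothetical `switch_group` helper that retargets every library in the
group and then checks that no link points at a different backend. The
second library name, `libhipBARBAZ.x.y.z`, is made up.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical group: every library here must come from the same backend.
GROUP = ("libhipFOOBAR.x.y.z", "libhipBARBAZ.x.y.z")
VARIANTS = ("hip-rocm", "hip-cuda")


def install_variants(libdir: Path) -> None:
    """Create fake files for each library of the group under each backend."""
    for variant in VARIANTS:
        (libdir / variant).mkdir(parents=True, exist_ok=True)
        for lib in GROUP:
            (libdir / variant / lib).write_text(f"{variant} build of {lib}\n")


def current_variants(libdir: Path) -> set:
    """Return the backend directories the group's links point into."""
    return {
        Path(os.readlink(libdir / lib)).parts[0]
        for lib in GROUP
        if (libdir / lib).is_symlink()
    }


def switch_group(libdir: Path, variant: str) -> None:
    """Retarget every link in the group to one backend, then verify."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    for lib in GROUP:
        tmp = libdir / (lib + ".tmp")
        if tmp.is_symlink() or tmp.exists():
            tmp.unlink()
        os.symlink(Path(variant) / lib, tmp)
        os.replace(tmp, libdir / lib)  # each single link swap is atomic
    found = current_variants(libdir)
    if found != {variant}:
        raise RuntimeError(f"mixed backends after switch: {sorted(found)}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        libdir = Path(d)
        install_variants(libdir)
        switch_group(libdir, "hip-cuda")
        print(sorted(current_variants(libdir)))  # ['hip-cuda']
```

Note that in this sketch each link is swapped atomically but the group
as a whole is not, so a process starting in the middle of a switch
could still see a mix.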

As long as we don't run afoul of the NVIDIA EULA, this problem
is very easy to solve.

But once my assumption breaks (say, the ABI differs significantly
between the two variants), I expect there is very little chance
that anyone in the Debian community will be able to maintain the
whole dependency tree for the extra variant.

I did not investigate ABI compatibility, but I'd like to
stress its importance to upstream.
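
A cheap first step toward that investigation is to compare the dynamic
symbols exported by the two builds. The Python sketch below parses
output in the format of GNU `nm -D --defined-only` and reports symbols
present in only one variant. It is deliberately crude and assumes both
builds are available locally: matching symbol names do not prove ABI
compatibility, since struct layouts, enum values and type sizes can
still differ, and tools such as libabigail's `abidiff` examine those.
The sample symbol names are hypothetical.

```python
import subprocess


def exported_symbols(nm_output: str) -> set:
    """Collect global symbol names from `nm -D --defined-only` style output.

    Each defined symbol line looks like '<address> <type> <name>';
    lowercase type letters mark local symbols, so only uppercase ones
    (global) are kept.
    """
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isupper():
            symbols.add(fields[2])
    return symbols


def nm_dynamic(path: str) -> str:
    """Run GNU nm on a shared object (requires binutils to be installed)."""
    return subprocess.run(
        ["nm", "-D", "--defined-only", path],
        check=True, capture_output=True, text=True,
    ).stdout


def compare(rocm: set, cuda: set) -> dict:
    """Symbols exported by only one variant break run-time switching outright."""
    return {"only_rocm": sorted(rocm - cuda), "only_cuda": sorted(cuda - rocm)}


if __name__ == "__main__":
    # Hypothetical sample output; on a real system use nm_dynamic(path).
    rocm = exported_symbols("0000000000001139 T hipFOOBAR_create\n"
                            "0000000000001200 T hipFOOBAR_destroy\n"
                            "0000000000001300 t helper_internal\n")
    cuda = exported_symbols("0000000000002139 T hipFOOBAR_create\n")
    print(compare(rocm, cuda))
    # {'only_rocm': ['hipFOOBAR_destroy'], 'only_cuda': []}
```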

[1] https://man7.org/linux/man-pages/man1/update-alternatives.1.html
[2] A good example of update-alternatives is Debian's BLAS/LAPACK
    libraries:
    https://wiki.debian.org/DebianScience/LinearAlgebraLibraries

