
Re: HIP on the NVIDIA platform



Hi Mo,

On 2022-10-19 19:45, M. Zhou wrote:
> Assume that we have two hipFOOBAR libraries, one built against
> ROCm, and one against CUDA. As long as the backend (ROCm/CUDA)
> does not affect the ABI of the shared objects (I suppose so),
> we can simply introduce an update-alternatives[1][2] entry
> for the library.

[...]

> As long as we don't mess up with nvidia EULA, this problem
> is very easy to solve.
>
> But once my assumption is broken -- say, the ABI significantly
> differs between the two variants -- then I forecast there will
> be a very small chance that someone in debian community will
> be able to handle the whole dependency tree for the extra variant.
>
> I did not investigate into ABI compatibility, but I'd like to
> reveal the importance of it for the upstream.

Thanks for your perspective. That was very informative.

I did a brief review of the source and I believe it's likely that they are ABI compatible. However, ABI compatibility between the AMD and NVIDIA platform variants is not guaranteed by the upstream project.

The problem is that on the NVIDIA platform, the HIP types used in the API are typedefs for NVIDIA types. If ABI compatibility were guaranteed, then the other platform variants would be constrained to follow NVIDIA's ABI and any changes NVIDIA makes to it. That could be difficult with two platform variants, but it might become impossible if there were three or more. So, it's not clear to me whether ABI compatibility can realistically be guaranteed between platforms while using native CUDA types on NVIDIA.
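
To illustrate the typedef concern, here is a minimal sketch. Every name in it (vendorNvHalf2, vendorAmdHalf2, sketchHalf2, SKETCH_PLATFORM_NVIDIA) is a hypothetical stand-in, not an actual HIP or CUDA declaration. It only shows how an API type that aliases a vendor type ties the ABI of every platform variant to that vendor's layout.

```cpp
// Sketch only: all names are hypothetical stand-ins, not real HIP or CUDA code.
#include <cstddef>

// Stand-in for a struct whose layout is owned by the NVIDIA toolchain.
struct vendorNvHalf2 { unsigned short x, y; };

// Stand-in for a struct that another platform variant defines on its own.
struct vendorAmdHalf2 { unsigned short x, y; };

#if defined(SKETCH_PLATFORM_NVIDIA)
typedef vendorNvHalf2 sketchHalf2;   // NVIDIA variant: alias for the vendor type
#else
typedef vendorAmdHalf2 sketchHalf2;  // other variant: separately defined type
#endif

// A shared object exporting functions that take sketchHalf2 by value is only
// interchangeable between the two builds while both definitions agree in
// size, alignment and member offsets.
constexpr bool layouts_match() {
    return sizeof(vendorNvHalf2) == sizeof(vendorAmdHalf2) &&
           alignof(vendorNvHalf2) == alignof(vendorAmdHalf2) &&
           offsetof(vendorNvHalf2, y) == offsetof(vendorAmdHalf2, y);
}

// Fails to compile as soon as one side changes its layout.
static_assert(layouts_match(), "platform variants are no longer layout compatible");
```

In this sketch, a guarantee of ABI compatibility amounts to keeping that static_assert true forever, even though one of the two definitions is controlled by a third party.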

One alternative being considered upstream is providing a header-only implementation for the NVIDIA platform. I was initially concerned about that approach because it would preclude the use of those libraries through FFI on the NVIDIA platform. Nevertheless, I'm starting to come around to the idea.
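
A rough sketch of what a header-only wrapper could look like follows. The names (vendor_sketch::vendorScale, sketchFoobarScale) are invented for illustration and the vendor function is a local stand-in so that the example is self-contained. This is not the design being discussed upstream, only an illustration of the trade-off.

```cpp
// Sketch only: all names are hypothetical.
namespace vendor_sketch {
// Stand-in for a function that would normally come from the vendor's library.
inline int vendorScale(int value, int factor) { return value * factor; }
}  // namespace vendor_sketch

// Header-only wrapper: it is compiled into each caller, so no separate
// wrapper shared object exists whose ABI would have to match another platform.
inline int sketchFoobarScale(int value, int factor) {
    return vendor_sketch::vendorScale(value, factor);
}
```

Because the wrapper exists only in the header, there is no wrapper library for an FFI user to load and look up symbols in. Such a user would have to call the vendor library directly.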

It's a difficult problem. :(

Sincerely,
Cory Bloor

