Re: HIP on the NVIDIA platform

To: "M. Zhou" <lumin@debian.org>, debian-ai@lists.debian.org
Subject: Re: HIP on the NVIDIA platform
From: Cordell Bloor <cgmb-deb@slerp.xyz>
Date: Thu, 27 Oct 2022 16:10:23 -0600
Message-id: <79c2d8ad-ab70-059e-b44a-040e0c996c97@slerp.xyz>
In-reply-to: <ba65616f43131a3f11e32668968ea7e4d101d23c.camel@debian.org>
References: <373a1a31-9481-7f5c-5f7c-6589cb05844f@slerp.xyz> <ba65616f43131a3f11e32668968ea7e4d101d23c.camel@debian.org>

Hi Mo,

On 2022-10-19 19:45, M. Zhou wrote:

> Assume that we have two hipFOOBAR libraries, one built against
> ROCm, and one against CUDA. As long as the backend (ROCm/CUDA)
> does not affect the ABI of the shared objects (I suppose so),
> we can simply introduce an update-alternatives[1][2] entry
> for the library.
>
> [...]
>
> As long as we don't mess up with nvidia EULA, this problem
> is very easy to solve.
>
> But once my assumption is broken -- say, the ABI significantly
> differs between the two variants -- then I forecast there will
> be a very small chance that someone in debian community will
> be able to handle the whole dependency tree for the extra variant.
>
> I did not investigate into ABI compatibility, but I'd like to
> reveal the importance of it for the upstream.


Thanks for your perspective. That was very informative.

I did a brief review of the source and I believe it's likely that they are ABI compatible. However, ABI compatibility between the AMD and NVIDIA platform variants is not guaranteed by the upstream project.

The problem is that on the NVIDIA platform, the HIP types used in the API are typedefs for NVIDIA types. If ABI compatibility were guaranteed, then the other platform variants would be constrained to follow NVIDIA's ABI and any changes they make to it. That could be difficult with two platform variants, but it might become impossible if there were three or more. So, it's not clear to me if ABI compatibility can realistically be guaranteed between platforms while using native CUDA types on NVIDIA.
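
As a rough illustration of the typedef constraint, here is a minimal C++ sketch. Every name in it (vendor_nv, nativeComplex, hipFoobarComplex, same_layout) is hypothetical and stands in for the real types; none of it is taken from the HIP or CUDA headers. It shows an API type that is only an alias on one platform and an independent definition on the other, with a compile-time check that the two still agree in size and alignment. Matching size and alignment is a necessary condition for ABI compatibility, not a sufficient one.

```cpp
#include <type_traits>

// Hypothetical stand-in for a type owned by the NVIDIA toolkit.
// Its layout is decided by that vendor, not by the HIP libraries.
namespace vendor_nv {
struct nativeComplex { float x; float y; };
}

// Hypothetical API type on the NVIDIA platform: only an alias, so it
// inherits whatever layout the native type has, now and in the future.
namespace hip_on_nv {
typedef vendor_nv::nativeComplex hipFoobarComplex;
}

// Hypothetical API type on the AMD platform: defined independently.
namespace hip_on_amd {
struct hipFoobarComplex { float x; float y; };
}

// Necessary (not sufficient) condition for two types to be interchangeable
// across a shared-object boundary: same size, same alignment, and both
// standard-layout.
template <typename A, typename B>
constexpr bool same_layout() {
    return sizeof(A) == sizeof(B) && alignof(A) == alignof(B) &&
           std::is_standard_layout<A>::value &&
           std::is_standard_layout<B>::value;
}

// If the native type ever changed, this assertion is where the AMD
// variant would discover that it has to follow.
static_assert(same_layout<hip_on_nv::hipFoobarComplex,
                          hip_on_amd::hipFoobarComplex>(),
              "platform variants disagree on hipFoobarComplex layout");
```

In this sketch the check passes only because the second definition was written to mirror the first, which is the constraint described above: each additional platform variant would need to keep mirroring a type it does not control.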

One alternative being considered upstream is providing a header-only implementation for the NVIDIA platform. I was initially concerned about that approach because it would preclude the use of those libraries through FFI on the NVIDIA platform. Nevertheless, I'm starting to come around to the idea.
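
A minimal sketch of what a header-only wrapper could look like, assuming hypothetical names throughout (nativeFoobarScale, hipFoobarScale and the status enum are invented for illustration and are not real HIP or CUDA functions). The wrapper is an inline function that forwards to the native entry point, so it is compiled into each caller rather than shipped in a shared object.

```cpp
// Hypothetical native entry point, standing in for a function that a
// vendor library would export. Defined here so the sketch is self-contained.
extern "C" int nativeFoobarScale(int n, float alpha, float* x) {
    if (n < 0 || x == nullptr) {
        return 1;  // hypothetical native error code
    }
    for (int i = 0; i < n; ++i) {
        x[i] *= alpha;
    }
    return 0;  // hypothetical native success code
}

// Hypothetical status codes for the wrapper API.
enum hipFoobarStatus {
    HIPFOOBAR_STATUS_SUCCESS = 0,
    HIPFOOBAR_STATUS_INVALID_VALUE = 1
};

// Header-only wrapper: inline, so there is no hipFoobar shared object
// and no ABI to keep in sync with the other platform variant.
inline hipFoobarStatus hipFoobarScale(int n, float alpha, float* x) {
    return nativeFoobarScale(n, alpha, x) == 0
               ? HIPFOOBAR_STATUS_SUCCESS
               : HIPFOOBAR_STATUS_INVALID_VALUE;
}
```

The trade-off is visible in the sketch: there is no shared library exporting a hipFoobarScale symbol for a foreign-function interface to load, so only code that compiles against the header can call the wrapper.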


It's a difficult problem. :(

Sincerely,
Cory Bloor
