Re: HIP on the NVIDIA platform

To: "M. Zhou" <lumin@debian.org>, debian-ai@lists.debian.org
Subject: Re: HIP on the NVIDIA platform
From: Cordell Bloor <cgmb-deb@slerp.xyz>
Date: Thu, 27 Oct 2022 20:16:59 -0600
Message-id: <637b1b28-649e-c823-de24-ad8e6ccc246e@slerp.xyz>
In-reply-to: <aa26faf49924c93e9484835f64b25f597c254f06.camel@debian.org>
References: <373a1a31-9481-7f5c-5f7c-6589cb05844f@slerp.xyz> <ba65616f43131a3f11e32668968ea7e4d101d23c.camel@debian.org> <79c2d8ad-ab70-059e-b44a-040e0c996c97@slerp.xyz> <aa26faf49924c93e9484835f64b25f597c254f06.camel@debian.org>

Hi Mo,

Upon further contemplation, I've realized that it's irrelevant whether the HIP math libraries maintain a compatible ABI between their AMD and NVIDIA platform variants. The HIP library itself is header-only on the NVIDIA platform, and nearly all libraries and applications that depend on the HIP math libraries will also call HIP API functions to manage memory. Those dependents therefore have to be built for a specific platform regardless.

On 2022-10-27 16:24, M. Zhou wrote:
> Hi Cordell,
>
> On Thu, 2022-10-27 at 16:10 -0600, Cordell Bloor wrote:
>> Thanks for your perspective. That was very informative.
>>
>> I did a brief review of the source and I believe it's likely that they
>> are ABI compatible. However, ABI compatibility between the AMD and
>> NVIDIA platform variants is not guaranteed by the upstream project.
>
> I think there is a typo somewhere. Did you mean they are compatible
> in terms of API? And since they use different underlying structures,
> they result in different ABIs after compilation.

No typo. The HIP math libraries promise source compatibility between variants, but not binary compatibility. Nevertheless, it seems that some of them are ABI compatible in practice.

We'd have to check on a library-by-library basis, but for hipSOLVER the potentially problematic types in the API are always passed by pointer. All structures that differ in size or layout are heap allocated by factory functions and all fields are accessed through setter or getter functions. There are some types that are not allocated with factory functions, such as hipFloatComplex / cuFloatComplex and hipStream_t / cudaStream_t, but these appear to be compatible.
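To illustrate why that pattern tends to keep an ABI stable, here is a minimal sketch of an opaque-handle C API. Every name in it (demoParams, demoCreateParams, DEMO_BACKEND_B, and so on) is hypothetical and is not taken from hipSOLVER. The caller only ever holds a pointer, so the struct can differ in size or layout between backends without the caller-visible signatures changing.

```cpp
#include <cassert>
#include <new>

// Public C API (hypothetical names): callers only see an incomplete type.
extern "C" {
struct demoParams;                        // opaque; layout unknown to callers
typedef struct demoParams* demoParams_t;  // handle is always a pointer

int demoCreateParams(demoParams_t* out);       // factory: heap allocates
int demoSetBlockSize(demoParams_t p, int n);   // setter
int demoGetBlockSize(demoParams_t p, int* n);  // getter
int demoDestroyParams(demoParams_t p);         // releases the allocation
}

// "Library side": the definition may differ per backend. Callers never
// depend on sizeof(demoParams) or on any field offset.
struct demoParams {
#ifdef DEMO_BACKEND_B
    double scratch[4];  // extra state that only one backend needs
#endif
    int block_size;
};

extern "C" int demoCreateParams(demoParams_t* out) {
    if (!out) return 1;
    *out = new (std::nothrow) demoParams();  // value-initialized
    return *out ? 0 : 2;
}

extern "C" int demoSetBlockSize(demoParams_t p, int n) {
    if (!p) return 1;
    p->block_size = n;
    return 0;
}

extern "C" int demoGetBlockSize(demoParams_t p, int* n) {
    if (!p || !n) return 1;
    *n = p->block_size;
    return 0;
}

extern "C" int demoDestroyParams(demoParams_t p) {
    delete p;  // deleting a null pointer is a no-op
    return 0;
}
```

Building the same file with and without -DDEMO_BACKEND_B changes the struct, but not the function signatures that a caller links against.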

I am assuming that hipFloatComplex and cuFloatComplex (and similar) have the same size and layout. That seems like a reasonable assumption. They are both sometimes just typedefs for float2, but other definitions may be used depending on the language (C, C++, HIP, CUDA) and compiler (clang, gcc, nvcc). The HIP math libraries provide a pure C API, so name mangling shouldn't be a problem.
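An assumption like this can be checked at compile time. The sketch below uses two stand-in structs, complexA and complexB, which are hypothetical and are not the real hipFloatComplex or cuFloatComplex definitions. To verify the actual types, the same checks would have to be run against the vendor headers with each compiler of interest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Stand-ins for the two platforms' complex types (assumed, not the real ones).
struct complexA { float x; float y; };
struct complexB { float x; float y; };

// True when the two types agree on everything that matters for passing
// values across a C ABI: size, alignment and member offsets.
constexpr bool complex_layouts_match() {
    return std::is_standard_layout<complexA>::value
        && std::is_standard_layout<complexB>::value
        && std::is_trivially_copyable<complexA>::value
        && std::is_trivially_copyable<complexB>::value
        && sizeof(complexA) == sizeof(complexB)
        && alignof(complexA) == alignof(complexB)
        && offsetof(complexA, x) == offsetof(complexB, x)
        && offsetof(complexA, y) == offsetof(complexB, y);
}

static_assert(complex_layouts_match(), "complex types differ in layout");

// Copy the bytes of one type into the other. memcpy avoids the aliasing
// problems that a reinterpret_cast between unrelated types would raise.
complexB copy_as_b(const complexA& a) {
    complexB b;
    std::memcpy(&b, &a, sizeof b);
    return b;
}
```

If the static_assert fails for a given pair of headers and compilers, values of those types cannot safely cross the library boundary.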

In the case of hipStream_t (and similar), on the NVIDIA platform it is a typedef for cudaStream_t, which is itself a typedef for struct CUstream_st*. On the AMD platform, hipStream_t is a typedef for struct ihipStream_t*. Technically, it would be a violation of both the strict aliasing rule and the one definition rule (ODR) for an application built with hipStream_t defined as ihipStream_t* to call into a library built with hipStream_t defined as CUstream_st*. In practice, however, they're both just pointers to structs and would have the same size and layout. I would expect that the shared library boundary would provide a barrier to prevent wrong inferences by the compiler based on the rules that were violated.
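The pointer argument can be sketched in a single file. The namespaces amd_like and nvidia_like and the helper pass_across are hypothetical. The two struct names are only forward declared here and no vendor header is included. This checks that the two handle types have the same size and alignment and that a handle's bits survive being handed from one type to the other. It does not model the shared library boundary itself.

```cpp
#include <cassert>
#include <cstring>

// Each namespace mimics how one platform's headers might spell the handle.
namespace amd_like {
struct ihipStream_t;                 // incomplete: never defined here
typedef ihipStream_t* hipStream_t;
}

namespace nvidia_like {
struct CUstream_st;                  // incomplete: never defined here
typedef CUstream_st* cudaStream_t;
typedef cudaStream_t hipStream_t;
}

// Both handles are plain pointers to structs.
static_assert(sizeof(amd_like::hipStream_t) == sizeof(nvidia_like::hipStream_t),
              "stream handles differ in size");
static_assert(alignof(amd_like::hipStream_t) == alignof(nvidia_like::hipStream_t),
              "stream handles differ in alignment");

// Hand the handle's bits to the other typedef without dereferencing it,
// which is roughly what happens when a caller and a library disagree on
// the pointee type but agree on the calling convention.
nvidia_like::hipStream_t pass_across(amd_like::hipStream_t s) {
    nvidia_like::hipStream_t out;
    std::memcpy(&out, &s, sizeof out);
    return out;
}
```

Neither side ever dereferences the handle, which is why the mismatch in pointee type is harmless in practice even though the standard does not bless it.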

This is all rather academic at this point, though.

Sincerely,
Cory Bloor
