
Re: Enabling ROCm on Everything



On 2023-03-27 05:44, M. Zhou wrote:
> I changed my mind. Now I agree with the fine-grained ROCm architecture split solution.
> When finalizing the pytorch-cuda packaging, I realized that it won't induce much
> burden to me if we will build python3-torch-rocm-{gfx900,gfx906,etc}. I have already
> prepared some code (for cuda variant) that is reusable for the rocm variants as well.

I'm wondering: do we even need GPU-arch-specific builds of pytorch?

I just did a test build of rocrand for (1) only gfx803 and (2) only
gfx906, and it looks like the two resulting libraries have an
identical ABI.
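
(In case anyone wants to reproduce the comparison: abidiff from
libabigail, packaged as abigail-tools, can diff the two shared
objects directly; the paths below are just placeholders for the two
local build trees.)

  # Compare the exported ABI of the two single-arch rocrand builds.
  # Paths are placeholders for wherever the two variants were built.
  abidiff build-gfx8/librocrand.so build-gfx9/librocrand.so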

Cory, re-reading your initial mail, it sounds like you are suggesting
that this is indeed the case (or am I reading too much into it?):

On 2023-03-21 06:17, Cordell Bloor wrote:
> One possible split would be on the GFX architecture major version. There would be binary packages for librocsparse0-gfx8, librocsparse0-gfx9, librocsparse0-gfx10, and librocsparse0-gfx11 with each providing librocsparse0.

If all GPU-arch-specific builds indeed share ABI, then pytorch etc.
don't need special builds. We could have pure meta-packages like
pytorch-rocm-gfx900 that depend on the correct implementation.
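
A rough sketch of what such a meta-package stanza could look like
(the package names in Depends are placeholders following Cory's
-gfx9 naming, not a worked-out proposal):

  Package: python3-torch-rocm-gfx900
  Architecture: all
  Depends: python3-torch-rocm, librocsparse0-gfx9, librocrand1-gfx9
  Description: PyTorch with ROCm backend (metapackage for gfx900)
   Pulls in a single python3-torch-rocm build together with the ROCm
   math libraries built for the gfx9 architecture.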

We could even go so far as to use the alternatives system to bridge
interface and implementation and thus make implementations
runtime-switchable, analogous to what Mo does for the BLAS
implementations. But I don't see a use case for that (yet).
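
For reference, mirroring the libblas.so.3 setup, a maintainer script
would register each arch-specific build as an alternative roughly
like this (link name, paths and priority are made up for
illustration):

  # Register one arch-specific build as an alternative for the
  # generic soname; paths and priority are placeholders.
  update-alternatives --install \
      /usr/lib/x86_64-linux-gnu/librocsparse.so.0 \
      librocsparse.so.0-x86_64-linux-gnu \
      /usr/lib/x86_64-linux-gnu/rocm-gfx9/librocsparse.so.0 10

Switching would then just be "update-alternatives --config
librocsparse.so.0-x86_64-linux-gnu".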

