
Re: Enabling ROCm on Everything



On Tue, 2023-03-28 at 00:00 +0200, Christian Kastner wrote:
> On 2023-03-27 05:44, M. Zhou wrote:
> > I changed my mind. Now I agree with the fine-grained ROCm architecture split solution.
> > When finalizing the pytorch-cuda packaging, I realized that it won't induce much
> > burden to me if we will build python3-torch-rocm-{gfx900,gfx906,etc}. I have already
> > prepared some code (for cuda variant) that is reusable for the rocm variants as well.
> 
> I'm wondering: do we even need GPU-arch-specific builds of pytorch?
> 
> I just did a test build of rocrand for (1) only gfx806 and for (2) only
> gfx903, and it looks like both libraries built have identical ABI.
> 
> Cory, re-reading your initial mail, it reads like you are suggesting
> that this is indeed the case (or am I reading too much into it?):

In fact, the GPU target architecture is not reflected in the ABI.
For example, if you compile the same C code for haswell and for icelake
targets, you get the same ABI, but different machine code behind that ABI.

This should be the same for both CUDA and ROCm.
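
As a minimal illustration (the file and function names below are made up),
compiling the same translation unit for two different CPU targets yields the
same exported symbol, i.e. the same ABI, while the generated instructions
differ:

    /* saxpy.c -- hypothetical example. The exported symbol is identical
     * no matter which micro-architecture we target. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    /* Build the same source for two targets:
     *   gcc -O2 -march=haswell        -c saxpy.c -o saxpy-haswell.o
     *   gcc -O2 -march=icelake-client -c saxpy.c -o saxpy-icelake.o
     * `nm` reports the same symbol table for both objects, while
     * `objdump -d` shows different machine code inside.
     */

As far as I can tell, hipcc's --offload-arch behaves analogously: it changes
only the embedded device code objects, not the host-side symbols.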

> On 2023-03-21 06:17, Cordell Bloor wrote:
> > One possible split would be on the GFX architecture major version. There would be binary packages for librocsparse0-gfx8, librocsparse0-gfx9, librocsparse0-gfx10, and librocsparse0-gfx11 with each providing librocsparse0.
> If all GPU-arch-specific builds indeed share ABI, then pytorch etc.
> don't need special builds. We could have pure meta-packages like
> pytorch-rocm-gfx900 that depend on the correct implementation.
> 
> We could even go so far as to use the alternatives system to bridge
> interface and implementation and thus make implementations
> runtime-switchable, analogous to what Mo does for the BLAS
> implementations. But I don't see a use case for that (yet).

The update-alternatives solution cannot be used in this case. BLAS
and LAPACK can use it because, whatever the default option is, the
library is expected to work well on the user's CPU. BLAS libraries
have implemented dynamic dispatchers that select the fastest
instructions the CPU is capable of running, so the user adjusts
the alternatives only to tweak performance and behavior.
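
The kind of dispatch those libraries do looks roughly like the following
sketch (simplified, not taken from any particular BLAS implementation):

    /* dispatch.c -- simplified sketch of CPU-feature dispatch, similar in
     * spirit to what OpenBLAS/BLIS do at a much larger scale. */
    #include <stdio.h>

    static void dot_generic(void) { puts("generic kernel"); }
    static void dot_avx2(void)    { puts("AVX2 kernel"); }

    int main(void)
    {
        /* GCC/Clang builtin that probes the running CPU (x86 only). */
        if (__builtin_cpu_supports("avx2"))
            dot_avx2();
        else
            dot_generic();
        return 0;
    }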

On the ROCm side, the alternatives approach would mean the user has to
install a pile of junk (most of it not compatible with the GPU actually
present), and then manually pick the one version that happens to work.
This design does not make sense.

The choice should already be made when the user types the apt install
command. A meta-package would also introduce unnecessary overhead to
determine which gfx architecture to pull in. Similar attempts on the
CPU side, such as bumping the ISA baseline to AVX, AVX2, etc., have
already demonstrated the difficulty and the practical issues, let alone
the difficulty of implementing such detection on our own for GPUs.

My recommendation is that either packaging granularity is fine
(the coarse-grained -gfx9, -gfx10, or the fine-grained -gfx906,
-gfx1030, etc.). We keep a tight dependency chain: a -gfx906 package
only depends on, and is only depended on by, other -gfx906 packages,
and it Conflicts+Replaces its -gfx<other> counterparts.

This is achievable with a very small amount of code. And the user
has made the GPU architecture choice at the `apt install` stage.
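
For illustration only (package names, architecture list and descriptions are
hypothetical), the mutual exclusion could be expressed in debian/control with
the usual Provides/Conflicts/Replaces pattern on the shared name:

    Package: librocsparse0-gfx906
    Architecture: amd64
    Depends: ${shlibs:Depends}, ${misc:Depends}
    Provides: librocsparse0
    Conflicts: librocsparse0
    Replaces: librocsparse0
    Description: rocSPARSE library built for the gfx906 GPU architecture

    Package: python3-torch-rocm-gfx906
    Architecture: amd64
    Depends: librocsparse0-gfx906, ${misc:Depends}
    Description: PyTorch with ROCm backend, built for gfx906

apt would then refuse to co-install librocsparse0-gfx906 alongside
librocsparse0-gfx900, and pulling in python3-torch-rocm-gfx906 drags in
exactly the matching -gfx906 stack.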

