
Re: Enabling ROCm on Everything




On 2023-03-21 12:41, Christian Kastner wrote:
One difficulty we will need to figure out one way or another is how to
actually bring the user to the right package. What do we do when the
user wants to `apt install pytorch-rocm`?
Maybe it should be `apt install pytorch-rocm-gfx<N>`? The user already needs to know their hardware to choose between pytorch-cuda, pytorch-rocm and pytorch-oneapi. It is more burdensome to ask the user to be specific about their GPU architecture than just the vendor, but that seems like a matter of degree rather than a fundamental difference.
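As a sketch of how a user might discover the right suffix (this assumes the rocminfo utility from the ROCm runtime is already installed):

    # print the ISA name(s) of the installed GPUs, e.g. gfx1030
    rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u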
Another difficulty we might need to consider is: what if the system in
question contains multiple GPU architectures (e.g. 6800 XT and 7900 XT)?

I think the sad truth is that it's not technically feasible for Debian to handle every possible hardware configuration. The solution I propose handles all single-GPU systems and many systems with a combination of GPUs, but it wouldn't handle the specific case that you mentioned.

I suppose if the -gfx10 and -gfx11 packages installed to someplace like /usr/lib/<host-target>/<device-target>/libfoo.so, then you could use environment variables like LD_LIBRARY_PATH and ROCR_VISIBLE_DEVICES to use the GPUs separately. You would not be able to have both devices visible in the same process because the HIP runtime will throw an error if you do not have kernels for all visible devices.
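For example (a sketch only; the install paths below are hypothetical and just follow the layout suggested above):

    # one process per GPU, each seeing only its own device and loading
    # the libraries built for that device's ISA
    ROCR_VISIBLE_DEVICES=0 \
        LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/gfx1030 ./app  # 6800 XT
    ROCR_VISIBLE_DEVICES=1 \
        LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/gfx1100 ./app  # 7900 XT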

Users with more esoteric needs should probably be referred to a more customizable package management tool. That sort of thing is a good use case for Spack [1]. It builds packages from source and is thus much slower than installing with apt, but it can handle much more complex customization. `spack install <package> amdgpu_target==gfx1030,gfx1100` will build the libraries you need for that configuration.

On 2023-03-21 13:58, M. Zhou wrote:
There is exactly the same issue for pytorch-cuda. The upstream-distributed
binaries put all supported CUDA architectures into a single fat binary,
which causes linker errors (file too large). They have some workarounds
like splitting the shared object into multiple ones, but the overall
binary size is still growing.

However, as long as the CUDA compute architectures are backward-compatible,
we can just build several selected architectures that will work in most cases.
[...]
If the same backward compatibility applies to the gfx architectures,
then I'd suggest building only several selected archs by default.

In general, there is no compatibility between the GFX ISAs. If you were to drop an ISA from the fat binary, it wouldn't mean reduced performance on the hardware matching that ISA; it would mean completely dropping support for that hardware. While CUDA can embed PTX bytecode that the driver JIT-compiles for devices the binary has no machine code for, HIP compiles directly to machine code for each ISA. There is no hardware abstraction layer to hide the differences between processors.
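To illustrate (the source file name is arbitrary), every supported ISA must be requested explicitly when building with hipcc:

    # build a fat binary containing machine code for exactly two ISAs;
    # on any other GPU, the HIP runtime fails with hipErrorNoBinaryForGpu
    hipcc --offload-arch=gfx1030 --offload-arch=gfx1100 -o saxpy saxpy.hip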

One possible split would be on the GFX architecture major version. There
would be binary packages for librocsparse0-gfx8, librocsparse0-gfx9,
librocsparse0-gfx10, and librocsparse0-gfx11 with each providing
librocsparse0. The GFX9 grouping would be pretty large with six
architectures, but that's still within acceptable limits. If need be, it
could be split into gfx9-gcn (gfx900, gfx904, gfx906) and gfx9-cdna
(gfx908, gfx90a, gfx940).
Theoretically this is a clean and elegant solution. But I forecast that
we don't have enough people to work on and maintain such a sophisticated
dependency tree.

BTW, it will also result in very frequent trips to the NEW queue, which
will drastically slow down the development process.

It would result in a trip through the NEW queue each time a new binary package is added, which would occur whenever we add a package for a new GFX major version. However, that could only happen after (1) a new generation of hardware is released, and (2) a new major version of LLVM is packaged.

If we look at the history of new architecture major versions: GFX9 was introduced with Vega in 2017, GFX10 with RDNA1 in 2019, and GFX11 with RDNA3 in 2022. I'm not sure what the 'normal' frequency is for packages going through NEW, but every couple of years doesn't seem that bad.

Also, I think we'd introduce this sort of packaging change at the same time as updating to ROCm 6.0. The ABI changes in that release will necessitate a trip through the NEW queue anyway.
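For concreteness, here is a rough sketch of what one stanza of the split proposed above might look like in debian/control. The exact package relationships are my assumption (the usual mutual-exclusion idiom of Provides/Conflicts/Replaces on the common name) and would need review:

    Package: librocsparse0-gfx10
    Architecture: amd64
    Depends: ${shlibs:Depends}, ${misc:Depends}
    Provides: librocsparse0 (= ${binary:Version})
    Conflicts: librocsparse0
    Replaces: librocsparse0
    Description: RDNA-generation (gfx10) build of rocSPARSE

Each -gfx<N> variant would carry the same stanza with its own ISA group, making the variants interchangeable from the perspective of reverse dependencies.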

A single fat binary looks like it causes the smallest overhead for
humans. I really don't care about the overhead to machines, even if
there will be some performance loss. Whatever solution induces the
least burden on humans is the best choice for long-term
maintenance.
As far as I know, a single fat shared library supporting all architectures is not technically possible. A single binary package containing multiple shared libraries might be possible, but its total installed size would be enormous.
I can provide some technical suggestions on the implementation of the
package split. But before that, I'd suggest we think twice about whether
it induces more cost to humans, for instance:

1. will this significantly increase the working hours needed for the next update?
2. will another contributor be able to grasp the whole thing in a short time?

This proposal would significantly increase the time required to update the libraries. If nothing else, expanding the architecture support would make the builds take much longer. Whether it would be difficult for another contributor to grasp, I'm not sure.

On 2023-03-21 14:07, M. Zhou wrote:
OK. Although I think most users (including myself) will still use anaconda,
we can only see the popcon data after the upload.
As far as I know, there is no binary distribution offering AMD GPU hardware support as wide-ranging as what Debian could provide. I think it would be quite a draw if Debian provided these packages.

Sincerely,
Cory Bloor

[1]: https://spack.io/

