Re: Enabling ROCm on Everything
On 2023-03-28 01:52, M. Zhou wrote:
> On Tue, 2023-03-28 at 00:00 +0200, Christian Kastner wrote:
>> We could even go so far as to use the alternatives system to bridge
>> interface and implementation and thus make implementations
>> runtime-switchable, analogous to what Mo does for the BLAS
>> implementations. But I don't see a use case for that (yet).
>
> The update-alternatives solution cannot be used in this case. BLAS
> and LAPACK can use it because whatever the default option is, the
> library is expected to work well on the user's CPU. BLAS libraries
> have implemented dynamic dispatchers to correctly use the fastest
> intrinsics that the CPU is capable of running. The user can adjust
> the alternatives in order to tweak performance and behavior.
That's why I meant I don't see a use case yet :)
> The choice should already be made when the user types the apt install
> command. A metapackage also introduces unnecessary overhead in
> determining which gfx architecture to pull. Similar attempts on the
> CPU side, like bumping the ISA baseline to AVX, AVX2, etc., have
> already demonstrated the difficulty and practical issues. Let alone
> the difficulty of implementing that on our own for GPUs.
By metapackage, I was referring to pytorch. You mentioned earlier:
On 2023-03-27 05:44, M. Zhou wrote:
> For me, building these multiple ROCm pytorch variants can be simpler than building
> pytorch-cuda. The pytorch-rocm variants can be built using the Debian infrastructure because
> the dependencies are free, while the pytorch-cuda build has to be offloaded to my
> own machine or any unofficial builder due to the non-free dependencies.
If the ROCm libraries built for various GPU architectures are all ABI
compatible, a build for each architecture shouldn't be necessary to
achieve python3-torch-rocm-{gfx900,gfx906,etc}, if we even want it. Or
perhaps I misunderstood you?
> My recommendation is that any packaging granularity should be
> fine (either the coarse-grained -gfx9, -gfx10, or the fine-grained
> -gfx906, -gfx1030, etc.). We keep a tight dependency chain such that
> a -gfx906 package will only depend on, and be depended on by, other
> -gfx906 packages, and will conflict with and replace its own
> -gfx<other> variants.
Yeah, that makes sense. And all provide a virtual package without the
-gfx<N> suffix, as Cory suggested in his initial email.
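To make sure we mean the same thing, here is a rough sketch of the relationship fields such a scheme would produce. It is only an illustration under assumptions: the helper variant_relations, the architecture list, and the dependency name librocblas0 are hypothetical and not taken from any existing package.

```python
# Hypothetical sketch: names and the architecture list are illustrative only.
def variant_relations(base, arch, all_archs, deps=()):
    """Return control-field values for the `arch` variant of `base`."""
    # Every other variant of the same package, in the order given.
    others = [a for a in all_archs if a != arch]
    siblings = ", ".join(f"{base}-{a}" for a in others)
    return {
        "Package": f"{base}-{arch}",
        # A -gfx906 package depends only on -gfx906 builds of its dependencies.
        "Depends": ", ".join(f"{d}-{arch}" for d in deps),
        # Variants of the same package exclude and replace each other.
        "Conflicts": siblings,
        "Replaces": siblings,
        # All variants provide the virtual package without the -gfx<N> suffix.
        "Provides": base,
    }


ARCHS = ["gfx900", "gfx906", "gfx1030"]  # assumed example set
fields = variant_relations(
    "python3-torch-rocm", "gfx906", ARCHS, deps=["librocblas0"]
)
for key, value in fields.items():
    print(f"{key}: {value}")
```

With this layout the architecture is fixed at apt install time, and anything that does not care about the architecture can depend on the unsuffixed virtual package.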
Best,
Christian