Re: Enabling ROCm on Everything

To: debian-ai@lists.debian.org
Subject: Re: Enabling ROCm on Everything
From: Étienne Mollier <emollier@emlwks999.eu>
Date: Tue, 21 Mar 2023 21:34:51 +0100
Message-id: <ZBoU6xVMzWER6PjF@fusion>
In-reply-to: <145cc757de9430e44224f3ced3cc97871a20cbe0.camel@riseup.net>
References: <099ceb80-9d22-6a82-d0b7-723fb69889af@slerp.xyz> <09555c5c-b64e-f9d1-0834-4f560c24e9cc@debian.org> <145cc757de9430e44224f3ced3cc97871a20cbe0.camel@riseup.net>

Hi,

M. Zhou, on 2023-03-21:
> On Tue, 2023-03-21 at 19:41 +0100, Christian Kastner wrote:
> > I agree that a split of some sort will probably be necessary in the
> > short-to-midterm, and your proposed solutions looks reasonable to me.
> > 
> > One difficulty we will need to figure out one way or another is how to
> > actually bring the user to the right package. What do we do when the
> > user wants to `apt install pytorch-rocm`?
> 
> If possible, I suggest we stick to only one single binary package that
> supports multiple selected architectures. The pytorch-rocm popcon
> will not likely to be very large to deserve such a special treatment.
> Neither does pytorch-cuda.

I'm somewhat concerned that the sheer size of the library might
propagate to bigger .deb binary packages, which in turn would
cause issues with the infrastructure.  I can't recall when I
read it, but I do recall people mentioning problems once a .deb
exceeds a gigabyte or two in size.  Splitting libraries might
thus end up being needed if such a scenario were to occur.
Looking at the Xorg userland video drivers, I even thought it
might be possible to mimic the layout of the
xserver-xorg-video-* packages: they are pulled in by default by
xserver-xorg-video-all, which brings in the drivers for every
GPU out there; people interested in just one GPU can remove the
-all package and all the other userland Xorg drivers.

On the other hand, the package compression algorithm seems to do
a good job of deduplicating the common segments of
architecture-specific code.  In the case of librocsparse, I see
a compression factor of almost eighteen:

	$ du -sh librocsparse0_5.3.0+dfsg-3_amd64.deb 
	114M	librocsparse0_5.3.0+dfsg-3_amd64.deb

	$ du -sh librocsparse0_5.3.0+dfsg-3_amd64/usr/lib/x86_64-linux-gnu/librocsparse.so.0.1 
	2.0G	librocsparse0_5.3.0+dfsg-3_amd64/usr/lib/x86_64-linux-gnu/librocsparse.so.0.1

So perhaps this is a non-problem (at least regarding rocsparse,
but other components may prove to be more difficult if they are
much larger).
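
For anyone who wants to repeat the arithmetic on other packages,
here is a minimal sketch.  The helper name compression_ratio is
made up for illustration, and the inputs are the rounded sizes
that du printed above (2.0G taken as 2048M), so the result is
only approximate:

```shell
#!/bin/sh
# Hypothetical helper: ratio of unpacked size to packed size,
# printed with one decimal place.  Both arguments must be given
# in the same unit (MiB here).
compression_ratio() {
    awk -v unpacked="$1" -v packed="$2" \
        'BEGIN { printf "%.1f\n", unpacked / packed }'
}

# Rounded du figures from above: 2048M unpacked, 114M packed.
compression_ratio 2048 114
```

Feeding it exact byte counts instead of the rounded du figures
would give a more precise number.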

I'm not sure what to think.  Long term, upstream will need to
split the libraries as architectures add up, otherwise the model
will not scale due to the issues pointed out by Cory.  Short
term, the monolithic library is not ideal but acceptable, and
splitting would introduce a number of issues pointed out by
Mo Zhou.

I was hoping to produce a more useful message, but so be it…
-- 
Étienne Mollier <emollier@emlwks999.eu>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.

Follow-Ups:
- Re: Enabling ROCm on Everything
  - From: Cordell Bloor <cgmb-deb@slerp.xyz>

References:
- Enabling ROCm on Everything
  - From: Cordell Bloor <cgmb-deb@slerp.xyz>
- Re: Enabling ROCm on Everything
  - From: Christian Kastner <ckk@debian.org>
- Re: Enabling ROCm on Everything
  - From: "M. Zhou" <lumin@debian.org>
