Re: Enabling ROCm on Everything



Hi,

M. Zhou, on 2023-03-21:
> On Tue, 2023-03-21 at 19:41 +0100, Christian Kastner wrote:
> > I agree that a split of some sort will probably be necessary in the
> > short-to-midterm, and your proposed solutions look reasonable to me.
> > 
> > One difficulty we will need to figure out one way or another is how to
> > actually bring the user to the right package. What do we do when the
> > user wants to `apt install pytorch-rocm`?
> 
> If possible, I suggest we stick to only one single binary package that
> supports multiple selected architectures. The pytorch-rocm popcon
> is not likely to be large enough to deserve such special treatment.
> Neither does pytorch-cuda.

I'm somewhat concerned that the sheer size of the library might
translate into bigger .deb binary packages, which in turn could
cause issues with the infrastructure.  I can't recall when I
read it, but I do recall people mentioning problems once a .deb
exceeds a gigabyte or two in size.  Splitting the libraries
might thus become necessary if such a scenario were to occur.
Looking at the Xorg userland video drivers, I even thought it
might be possible to mimic the layout of the xserver-xorg-video-*
packages: xserver-xorg-video-all pulls them all in by default,
providing drivers for every GPU out there, and people interested
in just one GPU can remove the -all package along with the other
userland Xorg drivers.
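A minimal sketch of what such a layout could look like, assuming (hypothetically) that upstream could ship the device code for each GPU target separately, which is not the case today.  The package names (librocsparse0-gfx900 and so on) and the target list are illustrative only; the script merely prints debian/control binary stanzas for one package per target plus an -all metapackage that pulls in all of them, in the spirit of xserver-xorg-video-all.

```python
# Hypothetical package split: one binary package per GPU target plus an
# "-all" metapackage, in the spirit of xserver-xorg-video-all.
# Package names and the target list are illustrative assumptions.

GPU_TARGETS = ["gfx900", "gfx906", "gfx908", "gfx90a", "gfx1030"]


def control_stanzas(base: str, targets: list[str]) -> str:
    """Return debian/control binary stanzas for a per-target split of `base`."""
    stanzas = []
    for target in targets:
        stanzas.append(
            f"Package: {base}-{target}\n"
            "Architecture: amd64\n"
            "Depends: ${misc:Depends}, ${shlibs:Depends}\n"
            f"Description: {base} device code for {target} GPUs\n"
        )
    # The metapackage depends on every per-target package; users who only
    # need one GPU can remove it and keep just the package they need.
    all_deps = ", ".join(f"{base}-{target}" for target in targets)
    stanzas.append(
        f"Package: {base}-all\n"
        "Architecture: amd64\n"
        f"Depends: ${{misc:Depends}}, {all_deps}\n"
        f"Description: {base} device code for all supported GPUs\n"
    )
    # debian/control separates stanzas with a blank line.
    return "\n".join(stanzas)


if __name__ == "__main__":
    print(control_stanzas("librocsparse0", GPU_TARGETS))
```

The metapackage keeps `apt install` simple for most users, at the cost of one extra binary package per supported target in the archive.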

On the other hand, the package compression algorithm seems to do
a good job of deduplicating the common segments of
architecture-specific code.  In the case of librocsparse, I see a
compression factor of almost eighteen:

	$ du -sh librocsparse0_5.3.0+dfsg-3_amd64.deb 
	114M	librocsparse0_5.3.0+dfsg-3_amd64.deb

	$ du -sh librocsparse0_5.3.0+dfsg-3_amd64/usr/lib/x86_64-linux-gnu/librocsparse.so.0.1 
	2.0G	librocsparse0_5.3.0+dfsg-3_amd64/usr/lib/x86_64-linux-gnu/librocsparse.so.0.1

So perhaps this is a non-problem (at least regarding rocsparse,
but other components may prove to be more difficult if they are
much larger).
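The deduplication effect can be reproduced with a toy model.  The sketch below assumes the per-architecture copies are byte-identical (real code objects for different GPUs are only similar), builds a fake multi-architecture binary by repeating one random block per architecture, and compresses it with xz through Python's lzma module (xz being the usual compressor for .deb data members).  It also illustrates one possible limit for much larger components: LZMA2 can only refer back to data within its dictionary window (8 MiB for xz's default -6 preset), so identical segments spaced further apart than the window are not deduplicated.  Whether that limit actually bites for the larger ROCm libraries is an assumption, not something measured here.

```python
import lzma
import os


def xz_ratio(data: bytes, dict_size: int) -> float:
    """Uncompressed size divided by xz-compressed size, using an LZMA2
    filter with the given dictionary (match window) size in bytes."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
    return len(data) / len(packed)


def toy_fat_binary(block_size: int, n_arches: int) -> bytes:
    """Toy stand-in for a multi-architecture library: one incompressible
    random block repeated once per GPU architecture (an assumption; real
    per-architecture code objects differ from each other)."""
    block = os.urandom(block_size)
    return block * n_arches


if __name__ == "__main__":
    blob = toy_fat_binary(256 * 1024, 8)  # 2 MiB in total
    # Window larger than one copy: copies 2..8 become cheap back-references.
    print(f"8 MiB window:   {xz_ratio(blob, 8 * 1024 * 1024):.1f}x")
    # Window smaller than one copy: the repeats are out of reach.
    print(f"128 KiB window: {xz_ratio(blob, 128 * 1024):.1f}x")
```

With the large window the ratio approaches the number of copies; with the small one the data is essentially stored as-is.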

I'm not sure what to think.  In the long term, upstream will need
to split the libraries as more architectures are added; otherwise
the model will not scale, due to the issues pointed out by Cory.
In the short term, the monolithic library is not ideal but fair
enough, and splitting it would introduce a number of issues
pointed out by Mo Zhou.

I was hoping to produce a more useful message, but so be it…
-- 
Étienne Mollier <emollier@emlwks999.eu>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.
