Re: pytorch and CUDA
On Fri, 2023-02-24 at 16:02 +0200, Andrius Merkys wrote:
> Hi,
>
> On 2023-02-20 16:08, M. Zhou wrote:
> > That branch uses the same source as src:pytorch.
> > I really dislike duplicating the same source multiple times.
>
> OK, but I probably should use something other than gbp, as gbp complains:
>
> $ gbp buildpackage --git-ignore-branch
> gbp:info: Creating
> /home/andrius/debian-packages/pytorch_1.13.1+dfsg.orig.tar.gz
> gbp:error: Cannot find pristine tar commit for archive
> 'pytorch_1.13.1+dfsg.orig.tar.gz'
That's because Aron forgot to push the +dfsg pristine-tar data. I've imported
that tarball from the archive and pushed it to the git repo.
To build the cuda variant locally, you will also need to rebuild the
following packages on your own and pass them to the build as extra packages:
~/sbuild-arch ppc64el \
--extra-package=../../nvidia-cudnn.pkg/ \
--extra-package=../../nvidia-nccl.pkg/ \
--extra-package=../../tensorpipe.pkg/ \
--extra-package=../../nvidia-cutlass.pkg/
All nvidia-* packages can be found under the nvidia-team.
tensorpipe needs to be rebuilt from its `cuda`
branch to enable cuda support.
src:gloo also needs to be rebuilt against cuda for cuda support,
but I chose to skip it by exporting USE_GLOO=OFF in d/rules
to reduce my workload.
Then everything is ready. I went through this path on ppc64el,
and it ended with a linker overflow error, possibly
due to the cuda fat binaries. Maybe I should get rid of some old
CUDA compute capabilities like 3.X-5.X.
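
A minimal sketch of how the architecture list could be trimmed, assuming the
build honours upstream PyTorch's TORCH_CUDA_ARCH_LIST environment variable
(whether d/rules passes it through is untested here). The helper name
filter_arch_list and the example list are hypothetical, not part of the
packaging:

```shell
# Hypothetical helper: keep only compute capabilities whose major
# version is at least the given minimum.
# $1: semicolon-separated list, e.g. "3.5;5.0;6.0"
# $2: minimum major version, e.g. 6
filter_arch_list() {
    list=$1
    min_major=$2
    out=""
    old_ifs=$IFS
    IFS=';'
    for arch in $list; do
        major=${arch%%.*}          # "6.0" -> "6"
        if [ "$major" -ge "$min_major" ]; then
            out="${out:+$out;}$arch"
        fi
    done
    IFS=$old_ifs
    printf '%s\n' "$out"
}

# The input list is an assumption; use whatever the installed CUDA
# toolkit actually supports.
TORCH_CUDA_ARCH_LIST=$(filter_arch_list "3.5;5.0;6.0;7.0;8.0" 6)
export TORCH_CUDA_ARCH_LIST
```

Fewer target architectures mean smaller fat binaries, which may or may not be
enough to avoid the overflow.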
My ppc64el builder has got 8 cores and 16GB of RAM (+16GB swap).
The cpu version of pytorch takes about 1 hour to build. The cuda version
takes roughly 6 hours to build.
I have no amd64 device within my easy reach that is capable
of building this brutal thing -- amd64 is untested.