
Re: pytorch and CUDA



On Fri, 2023-02-24 at 16:02 +0200, Andrius Merkys wrote:
> Hi,
> 
> On 2023-02-20 16:08, M. Zhou wrote:
> > That branch uses the same source as src:pytorch.
> > I really dislike duplicating the same source multiple times.
> 
> OK, but I probably should use something other than gbp, as gbp complains:
> 
> $ gbp buildpackage --git-ignore-branch
> gbp:info: Creating 
> /home/andrius/debian-packages/pytorch_1.13.1+dfsg.orig.tar.gz
> gbp:error: Cannot find pristine tar commit for archive 
> 'pytorch_1.13.1+dfsg.orig.tar.gz'

That's because Aron forgot to push the +dfsg pristine-tar. I've imported
that pristine-tar from the archive and pushed it to the git repo.
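
For anyone hitting the same gbp error, here is a rough sketch of how to
pick up the newly pushed data. It assumes the remote is called `origin`
and that the branch uses gbp's default name `pristine-tar`; the function
name is made up for illustration.

```shell
# Sketch only: "origin" and the "pristine-tar" branch name are assumptions
# (the latter is gbp's default); adjust them to your checkout.
refresh_pristine_tar() {
    # Create or fast-forward the local pristine-tar branch from the remote.
    git fetch origin pristine-tar:pristine-tar || return 1
    # Confirm that the delta for the tarball named in the gbp error exists.
    pristine-tar list | grep -F 'pytorch_1.13.1+dfsg.orig.tar.gz'
}
```

Run `refresh_pristine_tar` inside the packaging checkout, then retry
`gbp buildpackage --git-ignore-branch`.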

To build the cuda variant locally, you will also need to rebuild the
following packages on your own:

~/sbuild-arch ppc64el \
        --extra-package=../../nvidia-cudnn.pkg/ \
        --extra-package=../../nvidia-nccl.pkg/ \
        --extra-package=../../tensorpipe.pkg/ \
        --extra-package=../../nvidia-cutlass.pkg/

All nvidia-* packages can be found under the nvidia-team.
tensorpipe needs to be rebuilt from its `cuda` branch
to enable CUDA support.

src:gloo also needs to be rebuilt against cuda for cuda support,
but I chose to skip it by exporting USE_GLOO=OFF in d/rules
to reduce my workload.

Then everything is ready. I have gone through this path on ppc64el,
and it ended with a linker overflow error, possibly due to the
CUDA fat binaries. Maybe I should drop some of the old CUDA compute
capabilities, such as 3.x-5.x.
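
A minimal sketch of dropping the older compute capabilities, assuming
the build honours PyTorch's TORCH_CUDA_ARCH_LIST environment variable.
The starting list below is hypothetical (the real default depends on
the CUDA toolkit version), and whether this actually avoids the linker
overflow is untested.

```shell
# Hypothetical starting list; not taken from the actual package.
all_archs="3.5 5.0 6.0 6.1 7.0 7.5 8.0 8.6"

kept=""
for arch in $all_archs; do
    major=${arch%%.*}            # "7.5" -> "7"
    # Keep only compute capability 6.x and newer.
    if [ "$major" -ge 6 ]; then
        kept="${kept:+$kept;}$arch"
    fi
done

# Semicolon-separated list, as read by the PyTorch build.
export TORCH_CUDA_ARCH_LIST="$kept"
echo "$TORCH_CUDA_ARCH_LIST"
```

Fewer target architectures means smaller fat binaries, which is the
motivation here; the cut-off at 6.x simply mirrors the 3.x-5.x range
mentioned above.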

My ppc64el builder has got 8 cores and 16GB of RAM (+16GB swap).
The cpu version of pytorch takes about 1 hour to build. The cuda version
takes roughly 6 hours to build.

I have no amd64 device within my easy reach that is capable
of building this brutal thing -- amd64 is untested.

