
Re: pytorch and CUDA



Hi,

On 2023-02-26 07:33, M. Zhou wrote:
I've managed to build the cuda version of pytorch on my ppc64el machine locally.
Some installation paths remain to be fixed, but those are minor issues.
Will work on uploading the remaining bits to the archive/NEW.

The latest code on the pytorch-cuda branch builds without any change.
But you may want to change TORCH_CUDA_ARCH_LIST in d/rules.

For your reference, the following local packages are still needed:

1. nvidia-cudnn (master branch, instead of the version in unstable)
     https://salsa.debian.org/nvidia-team/nvidia-cudnn

2. nvidia-cutlass (master branch, pending for NEW)
     https://salsa.debian.org/nvidia-team/nvidia-cutlass

3. nvidia-cudnn-frontend (master branch, pending for NEW)
     https://salsa.debian.org/nvidia-team/nvidia-cudnn-frontend

4. tensorpipe (cuda branch)
     https://salsa.debian.org/deeplearning-team/tensorpipe/-/tree/cuda

5. gloo is skipped for now, but we need a cuda version of it in the future.

nvidia-nccl is also needed, but it has already cleared the NEW queue.

Thanks a lot for the detailed instructions. I attempted a build on amd64, and all of the above (1-4) built fine. However, the pytorch build fails at the dh_shlibdeps step with the following error:

dpkg-shlibdeps: error: no dependency information found for /lib/x86_64-linux-gnu/libcudnn.so.8 (used by debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_python.so.1.13)
Hint: check if the library actually comes from a package.
dh_shlibdeps: error: dpkg-shlibdeps -Tdebian/libtorch1.13.substvars debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libshm.so.1.13 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorchbind_test.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_cpu.so.1.13 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_cpu.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libc10.so.1.13 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_global_deps.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libbackend_with_compiler.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libshm.so.1.13.0 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_python.so.1.13 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libc10_cuda.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch.so.1.13.0 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libc10.so.1.13.0 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_python.so.1.13.0 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_python.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libcaffe2_nvrtc.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libjitbackend_test.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libc10.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_cuda_linalg.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_cuda.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libshm.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_global_deps.so.1.13 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libc10d_cuda_test.so debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch.so.1.13 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_cpu.so.1.13.0 debian/libtorch1.13/usr/lib/x86_64-linux-gnu/libtorch_global_deps.so.1.13.0 returned exit code 2

Do you know what might be happening here? The path /lib/x86_64-linux-gnu/libcudnn.so.8 seems unusual. I will take a look at the libraries shipped by nvidia-cudnn.
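
One way to follow the hint in the error message is to ask dpkg whether any installed package owns the library. Below is a minimal Python sketch of that check, not a definitive diagnosis. It assumes a merged-/usr system, where /lib is a symlink to /usr/lib (which might explain the path), so it also queries the /usr twin and the resolved path. The helper names candidate_paths, parse_dpkg_search, dpkg_search and owning_packages are hypothetical; the only real tool used is `dpkg -S`.

```python
import os
import subprocess


def candidate_paths(lib_path):
    """Paths worth querying: the reported path, its /usr twin, and the
    symlink-resolved path (assumption: /lib may alias /usr/lib)."""
    candidates = [lib_path]
    if lib_path.startswith("/lib/"):
        candidates.append("/usr" + lib_path)
    resolved = os.path.realpath(lib_path)
    if resolved not in candidates:
        candidates.append(resolved)
    return candidates


def parse_dpkg_search(output):
    """Extract package names from `dpkg -S` output lines such as
    'libc6:amd64: /lib/x86_64-linux-gnu/libc.so.6'."""
    packages = []
    for line in output.splitlines():
        # Skip diversion notices and anything that is not 'owners: path'.
        if line.startswith(("diversion ", "local diversion ")) or ": " not in line:
            continue
        owners = line.rsplit(": ", 1)[0]
        for pkg in owners.split(", "):
            if pkg not in packages:
                packages.append(pkg)
    return packages


def dpkg_search(path):
    """Run `dpkg -S path`; return owning packages, or [] if none is found."""
    result = subprocess.run(["dpkg", "-S", path], capture_output=True, text=True)
    return parse_dpkg_search(result.stdout) if result.returncode == 0 else []


def owning_packages(lib_path, query=dpkg_search):
    """Collect owners across all candidate paths; `query` can be swapped
    out so the logic can be exercised without dpkg."""
    owners = []
    for path in candidate_paths(lib_path):
        for pkg in query(path):
            if pkg not in owners:
                owners.append(pkg)
    return owners
```

Calling owning_packages("/lib/x86_64-linux-gnu/libcudnn.so.8") on the build machine should list the owning package, if there is one. An empty list would be consistent with the hint above: the file is not recorded in the dpkg database, so dpkg-shlibdeps has no shlibs or symbols information to consult for it.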

Best,
Andrius

