Working towards pytorch-rocm

To: ROCm Team <debian-ai@lists.debian.org>
Subject: Working towards pytorch-rocm
From: Cordell Bloor <cgmb@slerp.xyz>
Date: Mon, 3 Mar 2025 02:51:27 -0700
Message-id: <9a53a25a-e364-4f97-93d7-91150291bb06@slerp.xyz>

Hi folks,

I wanted to say that I'd have pytorch-rocm built today, but after a solid evening of fighting, I still haven't been able to do it. The problem is mainly that PyTorch is a moving target and while we had everything needed to package PyTorch 2.4.0, the updates made to PyTorch 2.6.0 significantly increase the baseline required.

There are now a number of places where PyTorch assumes ROCm 6.0 or greater, and while it's not that hard to patch in compatibility with older versions of ROCm, there are quite a few of them. The dependency on hipBLASLt has also become deeper, making it much more difficult to remove [1]. There is now also a hard dependency on composable_kernel [2].
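
As a rough illustration of the version baseline described above, here is a minimal sketch of a check for whether a given ROCm version meets the 6.0 requirement. The function name version_at_least is hypothetical, the version strings are passed in by hand rather than detected, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# 5.7 is the rocm-hipamd version mentioned in this thread; 6.0 is the
# baseline that PyTorch 2.6.0 appears to assume.
if version_at_least "5.7" "6.0"; then
    echo "ROCm 5.7 meets the 6.0 baseline"
else
    echo "ROCm 5.7 is older than 6.0, so compatibility patches are needed"
fi
```

On a Debian system, `dpkg --compare-versions` would be another way to do the same comparison.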

It wasn't too hard to revert the composable_kernel additions, but I'm failing on the hipBLASLt removal. It seems silly that hipBLASLt is a required dependency when it's an enormous library and it only supports a tiny subset of AMD GPUs. Still, if we finish packaging hipBLASLt, then we won't need to worry about patching it out. It may be worth consulting the pytorch-rocm package in Fedora for patches, as they face some of the same constraints [3].

I haven't been hacking directly on the PyTorch Debian package as that is even more complex than just getting the upstream PyTorch repo building. I want to take things one step at a time. This is how I've been hacking on pytorch:


git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch

apt install python3-full python3-dev ninja-build cmake hipcc \
    libhipblas-dev librocblas-dev libhipsolver-dev librocsolver-dev \
    libhipfft-dev librocfft-dev libhipsparse-dev librocsparse-dev \
    librocthrust-dev librocprim-dev libhipcub-dev librccl-dev \
    libmagma-rocm-dev

python3 -m venv venv3
source venv3/bin/activate
pip install -r requirements.txt
pip install mkl-static mkl-include
export USE_CUDA=0
export USE_ROCM=1
export USE_XPU=0
export USE_KINETO=0 # requires roctracer otherwise
export USE_CK_FLASH_ATTENTION=0
export _GLIBCXX_USE_CXX11_ABI=1
export PYTORCH_ROCM_ARCH=gfx906 # my gpu
export ROCM_PATH=/usr
export HIP_DEVICE_LIB_PATH="/usr/lib/llvm-17/lib/clang/17/amdgcn/bitcode"
export HIP_CLANG_PATH=/usr/bin # workaround Bug #1099404
export CXXFLAGS='-Wno-error' # gcc errors otherwise
python tools/amd_build/build_amd.py
# <edit files here>
python setup.py develop
# ninja -C build # to continue build after making changes
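
Since the build takes a long time and depends on several exported variables, a pre-flight check can catch a missing setting before the build starts. The following is a minimal sketch, not part of the steps above: the function name check_build_env is hypothetical, and the choice of which variables to treat as required is an assumption based on the exports listed.

```shell
#!/bin/sh
# Hypothetical pre-flight check. The variable names are the ones exported
# in the steps above; treating these four as required is an assumption.
check_build_env() {
    status=0
    for var in USE_ROCM PYTORCH_ROCM_ARCH ROCM_PATH HIP_DEVICE_LIB_PATH; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing: $var"
            status=1
        fi
    done
    # The device library path must point at an existing directory.
    if [ -n "${HIP_DEVICE_LIB_PATH:-}" ] && [ ! -d "$HIP_DEVICE_LIB_PATH" ]; then
        echo "not a directory: $HIP_DEVICE_LIB_PATH"
        status=1
    fi
    return "$status"
}

if check_build_env; then
    echo "environment looks complete"
else
    echo "fix the items above before building"
fi
```

Running it in the same shell session as the exports, just before build_amd.py, would report anything that was skipped.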

I got libgloo_hip.a and libc10_hip.so built, but I think the path to getting aten/ building involves putting more work into getting the ROCm stack updated. With that said, I might still try to get a patched version of PyTorch 2.5.0 building anyway, just because I think that might still be useful even if it cannot be included in Debian.
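
To see how far a partial build got, a small script can report which libraries exist so far. This is only a sketch under assumptions: the helper name report_artifacts is hypothetical, build/lib is assumed to be where the build places its libraries, and libtorch_hip.so is an assumed name for the library that would contain the aten/ code.

```shell
#!/bin/sh
# Hypothetical helper: report which of the named files exist in a directory.
report_artifacts() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            echo "built:   $name"
        else
            echo "missing: $name"
        fi
    done
}

# build/lib and libtorch_hip.so are assumptions, not taken from the steps above.
report_artifacts build/lib libgloo_hip.a libc10_hip.so libtorch_hip.so
```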


On 2025-03-02 15:41, Cordell Bloor wrote:

Today I'll file a proper bug for that issue and maybe do one final upload of rocm-hipamd 5.7 to unstable.

My justification for doing that would have been to make it easier to proceed with pytorch-rocm. As it's not going to help with that, I'm not going to bother. Might as well just fix these bugs in the latest version.


Sincerely,
Cory Bloor

[1]: https://github.com/pytorch/pytorch/pull/120551

[2]: https://github.com/pytorch/pytorch/commit/3f3b692a00737c54a3e2948db5db493d40119854

[3]: https://src.fedoraproject.org/rpms/python-torch/tree/rawhide
