Re: Finishing the ROCm 6 update : pytorch-rocm-6 build semi-success
- To: Christian BAYLE <bayle@debian.org>, debian-ai@lists.debian.org
- Subject: Re: Finishing the ROCm 6 update : pytorch-rocm-6 build semi-success
- From: "M. Zhou" <lumin@debian.org>
- Date: Fri, 28 Feb 2025 20:03:54 -0500
- Message-id: <e14963a37b80c5bc1ec751fe9e848041c47b3597.camel@debian.org>
- In-reply-to: <f85dc039-4151-41e4-80fb-8a4940d3981b@debian.org>
- References: <0b69b800-db20-4032-9a87-d00fe342a447@slerp.xyz> <sa6seo6z93n.fsf@hjemme.reinholdtsen.name> <461b9785-466b-4268-b3f7-71c57d0eee82@debian.org> <66b517153e5d80a31ec54191447e8ff19d7ae7fe.camel@debian.org> <f85dc039-4151-41e4-80fb-8a4940d3981b@debian.org>
On Sat, 2025-03-01 at 00:46 +0100, Christian BAYLE wrote:
>
> I managed to build again
>
> Here is the merge request where I describe my issue
>
> https://salsa.debian.org/deeplearning-team/pytorch/-/merge_requests/6
Thanks a ton! Looks good to me. There is still some work remaining,
but your changes will be needed eventually, so I've merged them.
>
> Is the
> export PYTORCH_ROCM_ARCH = gfx1102 ?
> instead of gfx1100 for my board,
> should we build for a list ?
> make a specific package per kind of board ?
According to Cory, the list should be
"gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102"
https://salsa.debian.org/science-team/magma/-/blob/master/debian/rules?ref_type=heads#L45
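As a rough sketch (not taken from the packaging), the value could be
assembled from the list above like this. The helper name
rocm_arch_value is hypothetical, and the sketch assumes
PYTORCH_ROCM_ARCH takes a semicolon-separated list, as the quoted
value suggests.

```python
import os

# Architectures quoted above (Cory's list, as used in the magma packaging).
ROCM_ARCHS = [
    "gfx803", "gfx900", "gfx906", "gfx908", "gfx90a",
    "gfx1010", "gfx1030", "gfx1100", "gfx1101", "gfx1102",
]


def rocm_arch_value(archs):
    """Join architecture names with ';' after a basic sanity check.

    Hypothetical helper for illustration only.
    """
    for arch in archs:
        if not arch.startswith("gfx"):
            raise ValueError(f"unexpected architecture name: {arch!r}")
    return ";".join(archs)


if __name__ == "__main__":
    # Export the value for any build step started from this process.
    os.environ["PYTORCH_ROCM_ARCH"] = rocm_arch_value(ROCM_ARCHS)
    print(os.environ["PYTORCH_ROCM_ARCH"])
```

Building for the whole list in one package avoids having a separate
package per kind of board, at the cost of a longer build.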
>
> Did I make an obvious mistake in my filling of the rocm missing part ?
Nothing is missing. But the work needed to really enable ROCm goes
beyond what we currently have. You hit a known problem: cmake fails
to detect HIP and falls back to compiling the CPU-only version. So
the result has no CUDA/HIP support no matter which architecture you
specify, because the GPU code was never compiled.
The LoadHIP/FindHIP cmake files need patching. Relevant log excerpt:
--------------------------------------------------------------------
Building PyTorch for GPU arch: gfx1100
CMake Warning at cmake/public/LoadHIP.cmake:73 (find_package):
No "FindHIP.cmake" found in CMAKE_MODULE_PATH.
Call Stack (most recent call first):
cmake/public/LoadHIP.cmake:80 (find_package_and_print_version)
cmake/Dependencies.cmake:1014 (include)
CMakeLists.txt:861 (include)
CMake Warning (dev) at cmake/public/LoadHIP.cmake:73 (find_package):
FindHIP.cmake must either be part of this project itself, in this case
adjust CMAKE_MODULE_PATH so that it points to the correct location inside
its source tree.
Or it must be installed by a package which has already been found via
find_package(). In this case make sure that package has indeed been found
and adjust CMAKE_MODULE_PATH to contain the location where that package has
installed FindHIP.cmake. This must be a location provided by that package.
This error in general means that the buildsystem of this project is relying
on a Find-module without ensuring that it is actually available.
Call Stack (most recent call first):
cmake/public/LoadHIP.cmake:80 (find_package_and_print_version)
cmake/Dependencies.cmake:1014 (include)
CMakeLists.txt:861 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
HIP VERSION:
CMake Warning at cmake/Dependencies.cmake:1093 (message):
Not using CUDA/ROCM, so disabling USE_NCCL. Suppress this warning with
-DUSE_NCCL=OFF.
Call Stack (most recent call first):
CMakeLists.txt:861 (include)
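
A build that silently falls back to CPU-only like this can be caught
by scanning the log for the warnings shown above. The following is a
minimal sketch: the function name hip_fallback_reasons is
hypothetical, and the patterns are copied from this particular log,
so other PyTorch versions may word the messages differently.

```python
import re

# Patterns taken from the log excerpt above (assumption: the wording
# is stable enough to match on; adjust for other PyTorch versions).
CHECKS = [
    (re.compile(r'No "FindHIP\.cmake" found in CMAKE_MODULE_PATH'),
     "FindHIP.cmake not found"),
    (re.compile(r"^\s*HIP VERSION:\s*$"),
     "HIP version string is empty"),
    (re.compile(r"Not using CUDA/ROCM"),
     "CUDA/ROCm disabled"),
]


def hip_fallback_reasons(log_text):
    """Return the distinct signs of a CPU-only fallback found in a log.

    Reasons are listed in the order they first appear in the log.
    """
    reasons = []
    for line in log_text.splitlines():
        for pattern, label in CHECKS:
            if pattern.search(line) and label not in reasons:
                reasons.append(label)
    return reasons


if __name__ == "__main__":
    sample = (
        "Building PyTorch for GPU arch: gfx1100\n"
        'No "FindHIP.cmake" found in CMAKE_MODULE_PATH.\n'
        "HIP VERSION:\n"
        "Not using CUDA/ROCM, so disabling USE_NCCL.\n"
    )
    for reason in hip_fallback_reasons(sample):
        print("fallback sign:", reason)
```

An empty result does not prove that HIP was enabled; it only means
none of these specific warnings appeared.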