
Re: Finishing the ROCm 6 update : pytorch-rocm-6 build semi-success



On Sat, 2025-03-01 at 00:46 +0100, Christian BAYLE wrote:
> 
> I managed to build again
> 
> Here is the merge request where I describe my issue
> 
> https://salsa.debian.org/deeplearning-team/pytorch/-/merge_requests/6

Thanks a ton! Looks good to me. There is still some work remaining,
but your changes will be needed eventually. I've merged them.

> 
> Is the
> export PYTORCH_ROCM_ARCH = gfx1102 ?
> instead of gfx1100 for my board,
> should we build for a list ?
> make a specific package per kind of board ?

According to Cory, the list should be
"gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102"
https://salsa.debian.org/science-team/magma/-/blob/master/debian/rules?ref_type=heads#L45
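
As a rough illustration of how that list answers the gfx1100 vs gfx1102
question, here is a minimal Python sketch. It assumes PYTORCH_ROCM_ARCH
takes the same semicolon-separated form as the list above; the helper
names are hypothetical and are not part of PyTorch or the packaging.

```python
# Architecture list as quoted above (from the magma debian/rules).
ROCM_ARCHS = [
    "gfx803", "gfx900", "gfx906", "gfx908", "gfx90a",
    "gfx1010", "gfx1030", "gfx1100", "gfx1101", "gfx1102",
]


def rocm_arch_value(archs):
    """Join architectures into one semicolon-separated string.

    Assumption: this is the form expected in PYTORCH_ROCM_ARCH.
    """
    return ";".join(archs)


def covers(arch_value, board_arch):
    """Return True if board_arch is one of the architectures in arch_value."""
    return board_arch in arch_value.split(";")


value = rocm_arch_value(ROCM_ARCHS)
print("export PYTORCH_ROCM_ARCH=" + repr(value))
print("gfx1102 covered:", covers(value, "gfx1102"))
```

With a single list like this, one package would cover both gfx1100 and
gfx1102 boards, so no per-board package would be needed.
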
> 
> Did I make an obvious mistake in my filling of the rocm missing part ?

Nothing is missing. However, really enabling ROCm takes more work than
what we currently have. You hit a known problem: cmake fails to detect
HIP and falls back to compiling the CPU-only version. The result has no
CUDA/HIP support whatever architecture you specify, because the GPU
code was never compiled.

The LoadHIP/FindHIP cmake files need patching. Here is the relevant
part of the build log:


--------------------------------------------------------------------
Building PyTorch for GPU arch: gfx1100
CMake Warning at cmake/public/LoadHIP.cmake:73 (find_package):
  No "FindHIP.cmake" found in CMAKE_MODULE_PATH.
Call Stack (most recent call first):
  cmake/public/LoadHIP.cmake:80 (find_package_and_print_version)
  cmake/Dependencies.cmake:1014 (include)
  CMakeLists.txt:861 (include)


CMake Warning (dev) at cmake/public/LoadHIP.cmake:73 (find_package):
  FindHIP.cmake must either be part of this project itself, in this case
  adjust CMAKE_MODULE_PATH so that it points to the correct location inside
  its source tree.

  Or it must be installed by a package which has already been found via
  find_package().  In this case make sure that package has indeed been found
  and adjust CMAKE_MODULE_PATH to contain the location where that package has
  installed FindHIP.cmake.  This must be a location provided by that package.
  This error in general means that the buildsystem of this project is relying
  on a Find-module without ensuring that it is actually available.

Call Stack (most recent call first):
  cmake/public/LoadHIP.cmake:80 (find_package_and_print_version)
  cmake/Dependencies.cmake:1014 (include)
  CMakeLists.txt:861 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

HIP VERSION: 
CMake Warning at cmake/Dependencies.cmake:1093 (message):
  Not using CUDA/ROCM, so disabling USE_NCCL.  Suppress this warning with
  -DUSE_NCCL=OFF.
Call Stack (most recent call first):
  CMakeLists.txt:861 (include)
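
The three symptoms in the log above (FindHIP.cmake not found, an empty
"HIP VERSION:" line, and the "Not using CUDA/ROCM" warning) could be
checked mechanically, so that a silent fallback to the CPU build does
not go unnoticed. A minimal Python sketch follows; the function name is
hypothetical, the matched strings are taken only from this log, and
other PyTorch or CMake versions may word them differently.

```python
import re


def hip_fallback_symptoms(log_text):
    """Return the CPU-fallback symptoms found in a CMake configure log."""
    symptoms = []
    # CMake could not locate the FindHIP module at all.
    if 'No "FindHIP.cmake" found in CMAKE_MODULE_PATH' in log_text:
        symptoms.append("FindHIP.cmake not found")
    # Nothing after "HIP VERSION:" means no HIP version was detected.
    if re.search(r"^HIP VERSION:[ \t]*$", log_text, flags=re.MULTILINE):
        symptoms.append("empty HIP VERSION")
    # The build explicitly reports that neither CUDA nor ROCm is in use.
    if "Not using CUDA/ROCM" in log_text:
        symptoms.append("CUDA/ROCM disabled")
    return symptoms


if __name__ == "__main__":
    import sys

    # Usage: python check_hip.py < build.log
    found = hip_fallback_symptoms(sys.stdin.read())
    for symptom in found:
        print("symptom:", symptom)
    sys.exit(1 if found else 0)
```

A non-zero exit status here only means that the log looks like the one
above. It does not say which part of LoadHIP/FindHIP needs patching.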


