[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: CMake musings for ROCm 4.5.2 (Was Re: ROCm packaging session memo)



I'm currently in a minor role in committee to help determine and fix general install
related issues with ROCm. Some things I have strongly expressed so far:

- we should use gnuinstalldirs whenever possible
- we should respect the FHS for all components
- we should not make distro packaging harder

I will try to sift through your email and see what I can do from my side.

For reference, right now my involvement in ROCm is very high level, focused mostly on licensing, installation, and packaging related to runtime (rocr/compiler/rocclr/ocl/hip/hipamd). I'm really not well informed with the topics related to math libs and the misc roc* libs (like rocBLAS, rocSOLVER, etc) that Cory is more involved in.

I think itemized lists help a lot, because I can share them as-is.


On January 28, 2022 7:23:05 p.m. EST, Maxime Chambonnet <maxzor@maxzor.eu> wrote:
Hello,

On 1/27/22 21:15, Étienne Mollier wrote:
Maxime hit issues on the CMakeLists.txt of rocblas, something having to do with
GNUInstallDirs, to get it to have symlinks properly behave in FHS context.

As evoked, I come back to you on the topics of cmake & GNUInstallDirs,
and also on the code related to finding the device-libs install (without
discussing where it should be installed again).

The first topic is current cmake, GNUInstallDirs and the rocm-cmake
repo/package.
Not sure if the image will make it as an attachment, so I uploaded it
at Debian team-project repo [1]. You can see on the left a proposal,
which uses minimal "modern" cmake. This is for ROCclr, I tweaked its
cmake code to install a stub of a standalone package for demonstration,
and you can see the cmake diff on the background at the left. It makes
use of install(DIRECTORY)- I have been told that in a short-term-release
cmake will have "File sets" to ease this [2]. My cmake uses quite sheer
GNUInstallDirs, and all falls automagically in the right buckets on
Debian. I have not tested for other distributions. Gentoo devs told me
that they are perfectly fine with vanilla Cmake-GNUInstallDirs.
I bought the CMake book two weeks ago, so I'm still terrible at cmake,
and many things will go over my head: don't look too much at the
install(EXPORT) and things :)

On the right, you can see rocBLAS as it is now.
First, in library/CMakeLists.txt, GNUInstallDirs is included and its
variables CMAKE_INSTALL_{BINDIR, LIBDIR, INCLUDEDIR} (there exists
others [3]) are referred-to, which seems nice. But in the children
CMakeLists in library/src, which handles the actual install, the
aforementioned GNUInstallDirs are crippled and dropped for a mix of
CMAKE_SOURCE_DIR and hard-coded paths. Installing a package in
/usr/<package> is considered poor practice in Debian-based distribs.
I can imagine this nesting under /usr/rocrand + symlinking to /usr
classical sub-dirs, an operation being a remnant of the day that
/opt/rocm was the main assumption?
As I see it today, rocm-cmake is a gross over-complexity. I must be
short-sighted and you might have good reasons, (on top of the CPackDeb
infrastructure which has root in this repo) such as installations on
Windows, or maybe other Linux distros, for its existence?
I can imagine that much cmake could be cleaned if this was the case!


For the second topic of this mail, trying to overview which code tries
to find the device library installation:
- CMake, such as hipamd/hip-config.cmake.in with a hard-coded ./lib
  assumption [4].
- HIPCC, currently a Perl script, which has strong assumptions.
- AMDGPU upstream LLVM backend itself. It relies on the envvar
  HIP_DEVICE_PATH [5], clang can be passed the --hip-device-lib-path
  flag (as the above hip-config does), and ultimately it falls back to
  hard-coded [6] paths (where /usr/share, if agreed upon, would need to
  be added).

As far as cmake variables go, I think that they should be documented
outside of code to help reduce the overlap [7]. Just for the device lib
variables, there are at least:
- DEVICE_LIB_PATH
- HIP_DEVICE_LIB_PATH
- DEVICE_BITCODE
- AMD_DEVICE_LIBS_PREFIX
There are other cmake variables that are needed in this specific 4.5.2
ROCm, for device lib in /usr/share to be found, such as HIP_LIB_PATH.
This is quite the spaghetti dish de la mama. Here is what made Rocrand
finally compile here with device lib [8], currently having issues again
with rocSOLVER and rpath + isystem flags.
My personal, maybe short-sighted there too, recommendation would be,
since device library is not packaged with llvm, to leave it to
distributions to input the device lib path for now, with one clean
cmake (-DHIP_DEVICE_LIB_PATH?) variable, that gets passed to clang with
--hip-device-lib-path, and cut all the other assumptions out.

In the last jitsi meeting, we were joking about how Securitas could
make a visit to improve Debian workflows: a senior Kitware consultant
could maybe have an useful holistic(tm) CMake overhauling mission at
AMD ROCm team? ROCm is an awesome piece of software, it is sad seing it
being held back by its build system.

I would have enjoyed making this mail less rough, but I lost the
three/thirds of the day working around this classical git eol issue [9]
which abides bad with the debian packaging tooling...

Best regards, Maxime

P.S. Linking again the first thread were this was started being discussed
here [10].

[1]
https://salsa.debian.org/rocm-team/community/team-project/-/blob/86469d17988570a9cc674c242d70563d4600d16b/resources/cmake_musings_20220128_final.png
[2] https://gitlab.kitware.com/cmake/cmake/-/issues/23131
[3] https://cmake.org/cmake/help/latest/module/GNUInstallDirs.html
[4]
hhttps://github.com/ROCm-Developer-Tools/hipamd/blob/c681345d78600325ac7db92156ee7829ac50b695/hip-config.cmake.in#L240
[5]
https://github.com/llvm/llvm-project/blob/eb933225f491e2063ba47e18a1153829cb32676c/clang/lib/Driver/ToolChains/AMDGPU.cpp#L360
[6]
https://github.com/llvm/llvm-project/blob/eb933225f491e2063ba47e18a1153829cb32676c/clang/lib/Driver/ToolChains/AMDGPU.cpp#L411
[7] https://github.com/RadeonOpenCompute/ROCm/issues/1655
[8]
https://salsa.debian.org/rocm-team/rocrand/-/blob/3bd09d6052379a5f26bf069169f0e5b0dc095440/debian/rules#L13
[9] https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/issues/140
[10] https://lists.debian.org/debian-ai/2022/01/msg00031.html

Reply to: