
Fwd: ROCm installation [layout]



P.S. There was an issue on deb-devel and deb-mentors, whose attachment limit
is 100kB, so I am sending this again to avoid confusion. You can find the
image at:
https://lists.debian.org/debian-ai/2022/01/png1edBgW0YQi.png

ROCm is AMD's compute software stack, a competitor to NVIDIA CUDA.

I have asked several times on debian-mentors IRC already, and on the debian-ai
mailing list [1], but I hope that treating this topic systematically, rather
than incrementally, and on wide-audience mailing lists rather than by chat,
could benefit the packaging team, and maybe the stack as well.

ROCm as a stack is still maturing and unifying, and it still feels like a
work in progress with regard to the build system. The CMake glue is not yet
frozen, and AMD is still improving it.

"Native" Debian packages are starting to cover a significant portion of the
stack [2], and it would be great to figure out the installation topic once and
for all.

The goal of this mail is to get feedback from the Debian community on how such
a stack should be installed today.
The short-term objective is then to pass this feedback on to AMD as a somewhat
"official recommendation for ROCm installation on Debian". Probably through
AMD's GitHub? AMD asked the packaging team for such feedback not too long
ago [3].
If a "spec" is approved, it will not propagate immediately to the current
4.5.2 packages from the team [4], as I doubt anyone in the packaging team has
the energy to fight and patch the whole build system as it is now.
I see it first, in the short term, as a way to align internally at Debian on
how to install ROCm today. Second, as a mid-term benefit, we may see the
recommendations adopted and reflected in future AMD CMake code, which
would then make packaging even easier.


The installation options and paths generally looked for by CMakeLists/configs
are currently:
- various CMake project-specific flags for the install paths of the components:
  HIP_CLANG_PATH, HIP_DEVICE_LIB_PATH, HIP_PATH, ROCM_PATH, ... See [5], which
  derives from Cordell Bloor's all-in-one install script [6]. I have a hard
  time finding the exhaustive list of these flags [7], and AMD still seems to
  be iterating on it.
- /opt/rocm as a default fallback
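
The flag-then-fallback lookup described above can be sketched as follows. This
is only an illustration, not AMD's actual CMake logic: the helper name
resolve_install_path and the idea of passing the flags as a dictionary are
assumptions made for the example.

```python
# Hypothetical helper: mimics "use the project-specific flag if given,
# otherwise fall back to /opt/rocm". Not taken from any ROCm build file.
def resolve_install_path(flags, name="ROCM_PATH", default="/opt/rocm"):
    """Return the value of flag `name` if set and non-empty, else `default`."""
    return flags.get(name) or default


if __name__ == "__main__":
    # A Debian-style build that passes an explicit prefix.
    print(resolve_install_path({"ROCM_PATH": "/usr"}))
    # No flag given: the upstream default applies.
    print(resolve_install_path({}))
```

The point of the sketch is that every component needs its flag set explicitly
for a non-/opt/rocm install, which is why an exhaustive list of flags matters.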

I see at least three choices, and sub-decisions to be made:
- Multi-arch or not
  The NVIDIA toolkit supports aarch64 and a few other architectures.
  Cross-compiling ROCm from Debian could be interesting in the near future.
- Nested or not
  Other stacks and relatively large projects, such as PostgreSQL or LLVM, are
  nested (there is a central /usr/lib/{llvm-13, postgresql} directory,
  often with a ./bin subdirectory, ...)
- Where to install machine-readable GPU code
  There are at least three types of device-side (aka GPU) binary files:
    .bc for bitcode,
    .hsaco for HSA code object and
    .co for code object.
  Bitcode files are the machine readable form of the LLVM intermediate
  representation. HSA (Heterogeneous System Architecture) and other code object
  files are AMD containers for GPU machine code. PostgreSQL does use llvm
  bitcode files: since the install path is nested, they are in
  /usr/lib/postgresql/14/lib/bitcode.
  Since this code is independent of the CPU architecture, it has been
  suggested to me that it should reside in /usr/share.
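
As a small illustration of the three device-side file types listed above, the
following sketch maps a file name to its kind by extension. The function name
and the example file names are hypothetical; only the three extensions come
from the text.

```python
from pathlib import PurePosixPath

# Extensions as listed above; the descriptions repeat the text.
DEVICE_CODE_KINDS = {
    ".bc": "LLVM bitcode",
    ".hsaco": "HSA code object",
    ".co": "code object",
}


def device_code_kind(filename):
    """Return the kind of GPU binary for `filename`, or None for host files."""
    return DEVICE_CODE_KINDS.get(PurePosixPath(filename).suffix)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ("kernels.hsaco", "devicelib.bc", "libfoo.so"):
        print(name, "->", device_code_kind(name))
```

A packaging helper could use such a classification to route device-side files
to whichever directory is eventually agreed on.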

What I tried to keep in mind is that:
- shared libraries should be easily discoverable in the paths listed in
  /etc/ld.so.conf
- there are only so many paths that CMake's find_package searches in config
  mode [8].
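
To make the second constraint concrete, here is a sketch that enumerates some
of the directories find_package tries in config mode under a given prefix on
Unix. It is a simplified subset of the procedure documented in [8]: the real
search also matches "<name>*" case-insensitively, tries lib* variants such as
lib64, and considers further patterns. The function name, the "/usr" prefix
and the package name "hip" are assumptions for the example.

```python
def config_mode_candidates(prefix, name, arch=None):
    """List a simplified subset of config-mode search directories.

    `arch` plays the role of CMAKE_LIBRARY_ARCHITECTURE (the multi-arch
    triplet); when it is None, the lib/<arch> entries are skipped.
    """
    prefix = prefix.rstrip("/")
    bases = ([f"lib/{arch}"] if arch else []) + ["lib", "share"]
    candidates = []
    # <prefix>/(lib/<arch>|lib|share)/cmake/<name>/
    candidates += [f"{prefix}/{b}/cmake/{name}" for b in bases]
    # <prefix>/(lib/<arch>|lib|share)/<name>/
    candidates += [f"{prefix}/{b}/{name}" for b in bases]
    # <prefix>/(lib/<arch>|lib|share)/<name>/cmake/
    candidates += [f"{prefix}/{b}/{name}/cmake" for b in bases]
    return candidates


if __name__ == "__main__":
    for path in config_mode_candidates("/usr", "hip", arch="x86_64-linux-gnu"):
        print(path)
```

Any layout whose config files land outside such directories would need an
explicit prefix or <name>_DIR hint in every consumer.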

I attached as an image a direct comparison between some arbitrary combinations
of these decisions. The directories are bundled in the attached archive too.
- install_layout_proposal_v1 goes
  multi-arch, flattened, and with GPU code in /usr/share
- install_layout_proposal_v2 goes
  "ante-multi-arch", nested, and with GPU code in /usr/lib
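
The three decisions can be combined mechanically, which the sketch below tries
to show. The resulting paths are illustrative guesses, not the contents of the
attached archive: the directory name "rocm", the x86_64-linux-gnu triplet and
the exact nesting are assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Layout:
    multiarch: bool          # libraries under /usr/lib/<triplet>
    nested: bool             # libraries under a central per-stack directory
    gpu_code_in_share: bool  # device code under /usr/share instead of /usr/lib


def install_dirs(layout, triplet="x86_64-linux-gnu", stack="rocm"):
    """Return hypothetical library and GPU-code directories for a layout."""
    libdir = PurePosixPath("/usr/lib")
    if layout.multiarch:
        libdir = libdir / triplet
    if layout.nested:
        libdir = libdir / stack
    gpu_root = "/usr/share" if layout.gpu_code_in_share else "/usr/lib"
    return {"libdir": str(libdir), "gpudir": str(PurePosixPath(gpu_root) / stack)}


if __name__ == "__main__":
    v1 = Layout(multiarch=True, nested=False, gpu_code_in_share=True)
    v2 = Layout(multiarch=False, nested=True, gpu_code_in_share=False)
    print("v1:", install_dirs(v1))
    print("v2:", install_dirs(v2))
```

Treating the decisions as independent switches makes it easy to enumerate the
other combinations that the two proposals do not cover.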

You are welcome to correct the layout in these demo directories, and ship them
back to the mailing list!

Taken together, the Filesystem Hierarchy Standard, the Debian policy and the
Multi-arch policy leave room, to my knowledge, for both of the above proposals
to exist. If not formally, at least in practice both install layouts do exist,
for stacks of different scales.

Please forgive and correct any mistakes I have made, as I am still learning.
Best regards, Maxime

P.S. Thank you Andrey Rakhmatullin for the patience on IRC :)

[1] https://lists.debian.org/debian-ai/2021/12/msg00043.html
[2] https://salsa.debian.org/rocm-team/community/team-project#packaging-progress
[3] https://lists.debian.org/debian-ai/2021/11/msg00053.html
[4] https://salsa.debian.org/rocm-team
[5] https://salsa.debian.org/rocm-team/rocrand/-/blob/master/debian/rules
[6] https://gist.github.com/cgmb/7cd9a481c42ce132b5d6420380becef3
[7] https://github.com/RadeonOpenCompute/ROCm/issues/1655
[8] https://cmake.org/cmake/help/latest/command/find_package.html#config-mode-search-procedure

Attachment: install_layout_proposals.tar
Description: Unix tar archive