ROCm 6.2 will need Comgr patches for extended ISA compatibility
Hi all,
I'm the maintainer for the ROCm stack for Solus OS, and was
previously involved in improvements in ISA compatibility [1].
Solus's ROCm stack includes PyTorch and supports almost all
consumer GPUs with architecture gfx9* and gfx1* out-of-the-box,
thanks to a set of patches for extended ISA compatibility
mentioned in [1]
that originated from Debian (thank you Cory!).
While updating Solus's ROCm stack from v6.0.2 to v6.2.4, I noticed
that Compressed Code Object Bundle (CCOB) is being turned on for
composable_kernel and MIOpen, causing regression in the extended
ISA compatibility. My understanding is that when loading a CCOB,
unlike when loading a regular bundle, Comgr doesn't scan through
the provided archs (which gives us a chance to say which archs are
okay to use despite not being equivalent); instead, it just asks
ClangOffloadBundler to extract the code object corresponds to the
archs on-device, and ClangOffloadBundler replies with an empty
object if the requested archs are not found in the COOB.
To fix this, I've written 2 patches [2] [3]. Patch 1 adds a
`GetBundleIDsInFile` static method to the `OffloadBundler` API in
Clang. (There is already a method `ListBundleIDsInFile` that
simply prints the available bundle IDs instead of collecting them
in a `std::set`, which is a little confusing why the authors
didn't write `GetBundleIDsInFile` in the first place.) Patch 2
utilizes `GetBundleIDsInFile`
to query the available archs from the CCOB **before** extracting
it, allowing us to choose compatible (if not equivalent) archs to
extract. And of course, you still need the ISA coercion patch [4]
originally authored by Cory to make everything work together.
I'm planning to upstream patch 1 to LLVM because I think
`GetBundleIDsInFile` is a useful API to have.
A workaround posted in [1] is to turn on
`HIP_USE_RUNTIME_UNBUNDLER` so the unbundling process goes through
clr instead of Comgr. However, as of ROCm v6.2.4, trying to load a
CCOB using clr causes SIGSEV. Judging from comments in clr, it
seems like all code object loading functionality will eventually
be moved to Comgr anyway, so I think it would be better to just
patch Comgr instead of relying entirely on clr.
While this issue could be avoided in the first place by turning
off `--offload-compress`, at Solus we decided that the benefit we
get from turning it on is worth it. Most importantly, we can get
close to a 3x binary size reduction [5] and the reduced
possibility of getting a `relocation R_X86_64_PC32 out of range`
error due to binaries getting too large. The latter error usually
takes days to debug when occurred.
I hope this will be helpful when Debian (and other distros) decide
to upgrade to ROCm v6.2+ and want to retain the extended ISA
compatibility, which is a big reason why users may prefer using
distro-provided ROCm in the first place.
Sincerely,
Gavin Zhao
[1]: https://lists.debian.org/debian-ai/2024/02/msg00164.html
[2]:
https://github.com/GZGavinZhao/rocm-llvm-project/commit/6d296f879b0fed830c54b2a9d26240da86c8bb3a
[3]:
https://github.com/GZGavinZhao/rocm-llvm-project/commit/2d8c459a4d4c0567a7a275b4b54560d88e5c6919
[4]:
https://github.com/GZGavinZhao/rocm-llvm-project/commit/a439e4f37ce71de48d4a979594276e3be0e6278f
[5]: https://github.com/NixOS/nixpkgs/pull/305920
Reply to: