[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

ROCm 6.2 will need Comgr patches for extended ISA compatibility



Hi all,

I'm the maintainer for the ROCm stack for Solus OS, and was previously involved in improvements in ISA compatibility [1]. Solus's ROCm stack includes PyTorch and supports almost all consumer GPUs with architecture gfx9* and gfx1* out-of-the-box, thanks to a set of patches
for extended ISA compatibility mentioned in [1] that originated from Debian (thank you Cory!).

While updating Solus's ROCm stack from v6.0.2 to v6.2.4, I noticed that Compressed Code Object Bundle (CCOB) is being turned on for composable_kernel and MIOpen, causing regression in the extended ISA compatibility. My understanding is that when loading a CCOB, unlike when loading a regular bundle, Comgr doesn't scan through the provided archs (which gives us a chance to say which archs are okay to use despite not being equivalent); instead, it just asks ClangOffloadBundler to extract the code object corresponds to the archs on-device, and ClangOffloadBundler replies with an empty object if the requested archs are not found in the COOB.

To fix this, I've written 2 patches [2] [3]. Patch 1 adds a `GetBundleIDsInFile` static method to the `OffloadBundler` API in Clang. (There is already a method `ListBundleIDsInFile` that simply prints the available bundle IDs instead of collecting them in a `std::set`, which is a little confusing why the authors didn't write `GetBundleIDsInFile` in the first place.) Patch 2 utilizes `
GetBundleIDsInFile` to query the available archs from the CCOB **before** extracting it, allowing us to choose compatible (if not equivalent) archs to extract. And of course, you still need the ISA coercion patch [4] originally authored by Cory to make everything work together.

I'm planning to upstream patch 1 to LLVM because I think `GetBundleIDsInFile` is a useful API to have.

A workaround posted in [1] is to turn on `HIP_USE_RUNTIME_UNBUNDLER` so the unbundling process goes through clr instead of Comgr. However, as of ROCm v6.2.4, trying to load a CCOB using clr causes SIGSEV. Judging from comments in clr, it seems like all code object loading functionality will eventually be moved to Comgr anyway, so I think it would be better to just patch Comgr instead of relying entirely on clr.

While this issue could be avoided in the first place by turning off `--offload-compress`, at Solus we decided that the benefit we get from turning it on is worth it. Most importantly, we can get close to a 3x binary size reduction [5] and the reduced possibility of getting a `relocation R_X86_64_PC32 out of range` error due to binaries getting too large. The latter error usually takes days to debug when occurred.

I hope this will be helpful when Debian (and other distros) decide to upgrade to ROCm v6.2+ and want to retain the extended ISA compatibility, which is a big reason why users may prefer using distro-provided ROCm in the first place.

Sincerely,
Gavin Zhao


[1]: https://lists.debian.org/debian-ai/2024/02/msg00164.html
[2]: https://github.com/GZGavinZhao/rocm-llvm-project/commit/6d296f879b0fed830c54b2a9d26240da86c8bb3a
[3]: https://github.com/GZGavinZhao/rocm-llvm-project/commit/2d8c459a4d4c0567a7a275b4b54560d88e5c6919
[4]: https://github.com/GZGavinZhao/rocm-llvm-project/commit/a439e4f37ce71de48d4a979594276e3be0e6278f
[5]: https://github.com/NixOS/nixpkgs/pull/305920

Reply to: