
Re: LLVM offload compress



Hi Xuanteng,

I don't have all the answers yet, but I'll take a shot at answering regardless. It's possible I'm mistaken about some of the details.

On 2024-09-19 09:44, Xuanteng Huang wrote:
> What I’m curious about is the moment the decompression happens, and
> the overhead to the end-to-end latency. Does it mean that the compressed
> GPU kernels should be decompressed first before their launch to GPU?

It's my understanding that it's the offload bundle that is compressed. The offload bundle is unbundled by libcomgr (in ROCm 5.7 and earlier) or libamdhip64 (in ROCm 6.0 and later). I would therefore reason that the code is decompressed before it is sent to the GPU.
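To make the "compressed offload bundle" idea concrete, here is a minimal sketch of how one might distinguish a compressed bundle from an uncompressed one by its header. Per my reading of LLVM's ClangOffloadBundler documentation, compressed bundles begin with the magic bytes `CCOB` and uncompressed bundled objects with the string `__CLANG_OFFLOAD_BUNDLE__`; treat the exact layout as an assumption, not a specification.

```python
# Sketch only: magic values are taken from my reading of the LLVM
# ClangOffloadBundler docs and may differ across LLVM/ROCm versions.
COMPRESSED_MAGIC = b"CCOB"
UNCOMPRESSED_MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"

def classify_bundle(data: bytes) -> str:
    """Guess whether a byte blob looks like a (compressed) offload bundle."""
    if data.startswith(COMPRESSED_MAGIC):
        return "compressed"
    if data.startswith(UNCOMPRESSED_MAGIC):
        return "uncompressed"
    return "unknown"
```

In other words, the runtime (libcomgr or libamdhip64, depending on the ROCm version) can tell from the header alone whether it needs to decompress before unbundling.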

I'm actually not sure when the upload of the kernels to the GPU is done. I suppose it must happen either at shared library load time or lazily when a kernel is launched. I suspected the latter, and after a bit of research, I think I've more or less confirmed it: it seems there was a move to lazily uploading kernels back in 2019 [1].

Each translation unit gets its own offload bundle (by default), so my guess is that the first time that a translation unit launches a kernel, the bundle for that translation unit is decompressed and all the compatible code from the bundle is uploaded to the GPU.

> Does it happen every time or for the first time?

I believe the decompression needs to happen once per program execution.

> Or the decompression happens at the time when the package gets installed?

No.

Sincerely,
Cory Bloor

[1]: https://github.com/ROCm/HIP/issues/1304#issuecomment-519691962
