
Re: ROCm Enabled gloo



Hi Spaarsh,

On Sunday, June 29th, 2025 at 12:57 AM, Spaarsh Thakkar <spaarshthakkar11010@gmail.com> wrote:
I spent a little time on the package and have made some progress. The package already has rules[3] and control files for building with ROCm (and likewise for CUDA), but the ROCm ones had been kept in a separate file named control.rocm[4]. I moved the package names from control.rocm into the main control file, and the package now builds the libgloo-rocm* binaries. The corresponding *.install[5][6] files are already in place too. I have not had the opportunity to test the new package yet, though.

I believe "d/control.rocm" and "d/control" are intended to remain separate. Both files list packages that ship libraries with identical names, so merging them may cause unintended overwrites during the build. I learned this while building Kokkos. There should be a script named "rocmbuild.sh" that is meant to be run before building for ROCm. It replaces the contents of the main control and copyright files (and so on) with those of control.rocm, copyright.rocm, and the other *.rocm variants. Hope this helps!
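
To illustrate the kind of swap I mean, here is a minimal Python sketch. It is not the actual rocmbuild.sh, which I have not reproduced here; the function name swap_in_variant and the rule "copy every debian/<name>.rocm over debian/<name>" are my own assumptions about what such a script roughly does.

```python
import shutil
from pathlib import Path


def swap_in_variant(debian_dir, variant="rocm"):
    """Copy every <name>.<variant> file in debian_dir over <name>.

    Hypothetical helper for illustration only; the real rocmbuild.sh
    may select files differently. Returns the sorted list of base
    file names that were overwritten.
    """
    debian = Path(debian_dir)
    replaced = []
    for src in sorted(debian.glob(f"*.{variant}")):
        if not src.is_file():
            continue
        dest = src.with_suffix("")  # e.g. control.rocm -> control
        shutil.copyfile(src, dest)  # the variant file itself is left in place
        replaced.append(dest.name)
    return replaced
```

Calling swap_in_variant("debian") from the top of the source tree would overwrite debian/control with debian/control.rocm, and likewise for any other *.rocm file. Because this overwrites tracked files, it is best run on a scratch checkout.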

Sincerely,
Utkarsh Raj
On Sunday, June 29th, 2025 at 12:57 AM, Spaarsh Thakkar <spaarshthakkar11010@gmail.com> wrote:
Greetings to the community!

As part of my GSoC'25[1] work under the mentorship of Cordell Bloor (cc'd), I plan to enable ROCm for gloo[2]. I would like to know whether anyone else is working on this as well. If so, I hope the following information is useful.

I spent a little time on the package and have made some progress. The package already has rules[3] and control files for building with ROCm (and likewise for CUDA), but the ROCm ones had been kept in a separate file named control.rocm[4]. I moved the package names from control.rocm into the main control file, and the package now builds the libgloo-rocm* binaries. The corresponding *.install[5][6] files are already in place too. I have not had the opportunity to test the new package yet, though.

Note that ROCm-enabled gloo needs two dependencies: hipcc and librccl-dev[7][8]. The latter is currently available only in unstable and trixie (testing)[9], which explains why this wasn't done earlier despite the necessary rules and control files being in place.

I have already pushed the changes to my gloo fork[10], but I haven't opened an MR yet.

Regards,
Spaarsh Thakkar

[1]: https://lists.debian.org/debian-ai/2025/05/msg00042.html
[2]: https://salsa.debian.org/deeplearning-team/gloo
[3]: https://salsa.debian.org/deeplearning-team/gloo/-/blob/master/debian/rules?ref_type=heads#L33-49
[4]: https://salsa.debian.org/deeplearning-team/gloo/-/blob/master/debian/control.rocm
[5]: https://salsa.debian.org/deeplearning-team/gloo/-/blob/master/debian/libgloo-rocm-0.install
[6]: https://salsa.debian.org/deeplearning-team/gloo/-/blob/master/debian/libgloo-rocm-dev.install
[7]: https://packages.debian.org/sid/librccl-dev
[8]: https://packages.debian.org/trixie/librccl-dev
[9]: https://packages.debian.org/search?keywords=librccl-dev
[10]: https://salsa.debian.org/Spaarsh/gloo

