Re: Upgrading to ROCm 6.1
Hi Jeremy,
On 2024-06-11 00:07, Jeremy Newton wrote:
> I've posted to this list in the past, but I figured that I would
> subscribe.
Great!
> I noticed that Debian is still sitting on 5.7 (with I think a few
> 6.0), so I would like to try to help out to get it fast forwarded to
> 6.1.
For context, we probably lag because we are currently focusing on (1)
packaging stuff that we are still missing, and (2) our infrastructure.
As to (1), our near-term goal is to have a ROCm-enabled pytorch. Our
GSoC contributor Xuanteng has made good progress on MIOpen, and I expect
that we will see an upload soon, so that we can move on to pytorch.
As to (2), our goal is to have CI coverage for everything we do. We have
set up our own infra [4] with modified versions of Debian's official CI
software in which we test all our packages when their dependencies
change, and test all our dependents when we change a package; all of
this on a number of GPU architectures [5].
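
To make the triggering rule concrete, here is a minimal Python sketch. It
is not taken from the actual CI setup: the dependency map below is made
up for illustration, the function names are hypothetical, and it assumes
that only direct dependents are re-tested.

```python
from collections import defaultdict


def reverse_dependencies(depends):
    """Invert a {package: [dependencies]} map into {dependency: {dependents}}."""
    rdeps = defaultdict(set)
    for pkg, deps in depends.items():
        for dep in deps:
            rdeps[dep].add(pkg)
    return rdeps


def packages_to_test(changed, depends):
    """Return the changed package plus its direct dependents, sorted by name."""
    rdeps = reverse_dependencies(depends)
    return sorted({changed} | rdeps.get(changed, set()))


# Hypothetical dependency map, for illustration only; these are not the
# real Debian package relationships.
DEPENDS = {
    "rocblas": ["hipcc", "rocm-device-libs"],
    "miopen": ["rocblas", "hipcc"],
    "rocminfo": ["hsa-runtime"],
}

if __name__ == "__main__":
    # A change to hipcc queues tests for hipcc and everything built on it.
    print(packages_to_test("hipcc", DEPENDS))
```

In a real setup, each package returned would then be scheduled once per
GPU architecture.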
> For anyone unaware, I maintain some Fedora ROCm packages. I'd be happy
> to help out with a few components for Debian, as I've worked on them
> closely with Fedora:
> - rocm-device-libs
> - rocm-comgr
> - hipcc
> - hsa-runtime
> - hsakmt
> - rocm smi
> - rocminfo
I'm sure that there are lots of things we could use help with, but I
guess it depends on what you are comfortable with?
If you're unfamiliar with Debian packaging, the initial ramp-up might
take more time than the updating to 6.1 itself. Obviously, I could be
wrong; but it's been more than a decade since I packaged my last rpm, so I
can't really say how little or how much synergy you could exploit.
On the other hand, package upgrades can be pretty straightforward, but
because ROCm is evolving so fast, it's often difficult for someone like
me -- who only occasionally looks at the actual code -- to quickly and
correctly assess the notable changes.
> I was going to see if I could open some merge requests on Salsa or send
> patches, but there was one big change in 6.1 from prior releases that
> I'm not sure how to approach.
>
> The change is that 6.1 will now work against upstream LLVM 17, but the
> sources of rocm-device-libs, rocm-comgr, and hipcc are now merged, see
> the new upstream [1]. The tree contains the whole llvm fork from rocm,
> but only the amd directory has the bits that debian needs if you guys
> are using upstream LLVM. As far as I know ROCm 6.2 will move to LLVM 18
> when it's released.
Yeah, that will require some thought.
There's also an ABI breakage [6] to deal with, though the process for
that is well established.
(Side note: Cory raised this on the list not too long ago [7].)
> I'm not very familiar with how Debian tends to work, so I can't really
> suggest a good path forward, but I'm happy to help out whatever way I can.
That depends on how deep you want to get into Debian packaging. If not
too deep, then there's still enough upstream stuff to consider where we
would appreciate any help.
Just one example: regarding MIOpen, we are currently discussing the
pre-compiled kernels as well as finddb stuff that are shipped with it.
Pre-compiled stuff is problematic under the DFSG [8], so we were
discussing (1) if/how we can re-compile these kernels ourselves, (2)
whether the finddb stuff counts as pre-compiled, and (3) how to adapt
the package accordingly.
You did not list MIOpen among the packages above, but I only picked it to
illustrate that the time-consuming parts do not necessarily have much to
do with packaging. It might even be that there's a super-easy
straightforward answer to our challenges. But to the outsiders here,
like myself, finding simpler answers often takes lots and lots of
digging.
Incidentally: when working on MIOpen, Xuanteng recently mentioned how
some of the challenges that we were facing had already been reported by
Arch, Gentoo and Fedora packagers. I opined that some form of
cross-distro exchange (list or forum) could be useful, so that we can
help and/or coordinate with each other. I suspect that some wheels are
being re-invented here.
I myself would certainly be curious to know what the Fedora world of
ROCm looks like, and what we could learn from that: how are things
packaged, tested, and so on.
Thanks for offering to help!
Best,
Christian
> [1] https://github.com/ROCm/llvm-project/tree/rocm-6.1.x/amd
> [2] https://github.com/ROCm/llvm-project/commit/96b2ba31ded4a892390dfba3767c413bd1a3a29d
> [3] https://src.fedoraproject.org/rpms/rocm-compilersupport/blob/rawhide/f/rocm-compilersupport.spec
[4]: https://ci.rocm.debian.net/
[5]: https://salsa.debian.org/rocm-team/rocm-team-infra/-/blob/master/host_vars/ci.rocm.debian.net/public.yml?ref_type=heads#L63-75
[6]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/ROCm-5.7-Release-Plan#obstacles
[7]: https://lists.debian.org/debian-ai/2024/03/msg00004.html
[8]: https://wiki.debian.org/DebianFreeSoftwareGuidelines