
Re: 5.7 Release Plan / Timeline?



Hi Christian,

On 2023-12-12 15:23, Christian Kastner wrote:
> One thing that we shouldn't underestimate is that some upgrades to 5.7
> might be complicated.

That's true, but I don't think it's that big of a deal if some of the libraries take a bit longer than others to move to ROCm 5.7. While it's nice to have all the libraries tracking upstream closely, updating the mathlibs to ROCm 5.7 is not my top priority. To me, the two most important things are (a) enabling RDNA 3 support and (b) enabling AMD GPU-accelerated PyTorch.

RDNA 3 support was added in ROCm 5.4, so to complete (a), the main requirement is to get rocm-hipamd updated to ROCm 5.4 or later. For the HIP compiler and runtime components, we might as well move to ROCm 5.7 and clang-17 because older versions aren't going to be any easier to migrate to. However, that's pretty much all that's needed for us to enable RDNA 3 support.

Most of the ROCm math and primitive libraries (e.g., rocBLAS, rocSPARSE, rocSOLVER, rocPRIM) are already on ROCm 5.5 or later in Trixie, so once the newer compiler and runtime have been packaged, all that's required is to add gfx1100, gfx1101 and gfx1102 to the AMDGPU_TARGETS in d/rules for each library. Though, I suppose we still need to update rocThrust and rccl since they're on ROCm 5.3.
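
As a rough illustration of that d/rules change, here is a small Python sketch that appends the three RDNA 3 targets to a semicolon-separated target list. The helper name and the starting list are hypothetical placeholders, not taken from any actual package; the real list lives in each library's d/rules and would simply be edited by hand.

```python
# Hypothetical helper for illustration only; it is not part of any ROCm
# or Debian packaging tool.
RDNA3_TARGETS = ["gfx1100", "gfx1101", "gfx1102"]


def extend_targets(existing, new_targets):
    """Append new_targets to a semicolon-separated list, skipping duplicates."""
    targets = [t for t in existing.split(";") if t]
    for target in new_targets:
        if target not in targets:
            targets.append(target)
    return ";".join(targets)


# Placeholder value (an assumption); substitute the list from d/rules.
current = "gfx900;gfx1030"
print(extend_targets(current, RDNA3_TARGETS))
```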

To complete (b), the main thing needed is to get a few more libraries packaged. I would be surprised if PyTorch were adopting ROCm features so quickly that the mathlibs from ROCm 5.5 were insufficient for the current PyTorch. With that said, I haven't actually tried building it, so I don't know for certain.

To me, updating everything to ROCm 5.7 is a nice-to-have, but it's not an end in itself. I'm not worried about mixing libraries from different ROCm releases if some libraries take longer than others. It's not like Debian is exactly matching upstream anyway. Debian is building with a different compiler, running on a different driver, and is carrying quite a few patches. I look to the ROCm test suite as the source of truth on the correctness of the libraries.

> For example, rocsparse itself was easy, but
> documentation build requires new Sphinx packages.

The (many) Sphinx packages used in the new docs are almost entirely for the theme. I think you could probably apply a patch that restores the conf.py from ROCm 5.5 to build using the vanilla readthedocs theme. You'd probably still need to add the sphinx-external-toc package, though. At least, that would probably be my approach. I must admit that I've been procrastinating on some of the ROCm 5.7 updates because I knew the docs would be tricky.
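
A minimal sketch of what such a patched conf.py might look like, assuming the sphinx_rtd_theme and sphinx-external-toc packages are available. This is not the actual ROCm 5.5 conf.py; the project name and the table-of-contents path are placeholders that would need to match the real docs tree.

```python
# Sketch of a stripped-down Sphinx conf.py. Values marked "assumed" are
# placeholders and are not taken from the upstream ROCm sources.
project = "rocSPARSE"  # assumed project name

# sphinx-external-toc reads the document tree from a YAML file.
extensions = ["sphinx_external_toc"]
external_toc_path = "_toc.yml"  # assumed location of the toc file

# Vanilla Read the Docs theme in place of the rocm-docs-core theme.
html_theme = "sphinx_rtd_theme"
```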

I'm not a huge fan of the way that rocm-docs-core is embedded in the docs build. Fortunately, the maintainers have discussed slowly transitioning it into a normal Sphinx theme and maybe splitting off other functionality into Sphinx extensions. In any case, if you're having trouble packaging the ROCm documentation due to design decisions made by AMD developers, please be sure to provide your feedback upstream as I believe they will be receptive to your needs.

Sincerely,
Cory Bloor

