ROCm 4.2.0 Update
Hello,
I've been doing some poking into various aspects of the ROCm stack and I
figured I'd share a bit more about what I've learned.
First, I have an answer to Norbert's question about the progress on
upstreaming into LLVM. It seems that the core features required for
building ROCm are now all in LLVM trunk. The main features specific to
the AMD fork are OpenMP offloading, __hip_atomic builtin functions, the
-parallel-jobs flag, and the option to delegate to another compiler for
further CPU optimizations. Most new development is upstreamed quickly.
Second, I wanted to mention that support for gfx1010 (e.g. RX 5700 XT)
and gfx1030 (e.g. RX 6800 XT) has been merged to the public rocBLAS
master branch. These architectures are not officially supported by ROCm
and there is still an enormous amount of testing to be done, but I can
at least report that both rocBLAS and rocSOLVER pass their respective
unit test suites on the RX 5700 XT when built for gfx1010 with the
official rocm-4.2.0 toolchain. I'd heard that finding supported hardware
was a major hindrance to the Debian packaging effort, so I wanted to
give you a heads-up that this is now possible. I should stress again
that these architectures are not yet officially supported, but I would
be happy to triage any problems you encounter in the linear algebra stack.
If you're unfamiliar with the mapping of GPU architectures to product
names, the LLVM AMDGPU User Guide is quite handy:
https://llvm.org/docs/AMDGPUUsage.html
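For quick reference, the two pairings mentioned above can be kept in a small lookup table. This is only an illustrative sketch: the names EXAMPLE_PRODUCTS and example_product are hypothetical, the table holds only the two targets named in this message, and the LLVM guide remains the authoritative list.

```python
from typing import Optional

# Only the two pairings named in this message. The LLVM AMDGPU User Guide
# has the full table. The dict and function names here are hypothetical.
EXAMPLE_PRODUCTS = {
    "gfx1010": "RX 5700 XT",
    "gfx1030": "RX 6800 XT",
}


def example_product(target: str) -> Optional[str]:
    """Return an example product for a gfx target, or None if not listed."""
    # Target IDs may carry feature suffixes such as ":xnack-".
    # Keep only the processor name before the first colon.
    processor = target.strip().lower().split(":", 1)[0]
    return EXAMPLE_PRODUCTS.get(processor)


if __name__ == "__main__":
    for name in ("gfx1010", "gfx1030", "gfx906"):
        print(name, "->", example_product(name) or "not listed here")
```

Anything not in the table comes back as None rather than a guess, so an unlisted target is never silently mapped to the wrong card.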
In any case, that's all I have for now.
Sincerely,
Cory Bloor