
ROCm 4.2.0 Update


I've been poking into various aspects of the ROCm stack, and I figured I'd share a bit more of what I've learned.

First, I have an answer to Norbert's question about the progress on upstreaming into LLVM. It seems that the core features required for building ROCm are now all in LLVM trunk. The main features that remain specific to the AMD fork are OpenMP offloading, the __hip_atomic builtin functions, the -parallel-jobs flag, and the option to delegate to another compiler for further CPU optimizations. Most new development is upstreamed quickly.

Second, I wanted to mention that support for gfx1010 (e.g. RX 5700 XT) and gfx1030 (e.g. RX 6800 XT) has been merged to the public rocBLAS master branch. These architectures are not officially supported by ROCm and there is still an enormous amount of testing to be done, but I can at least report that both rocBLAS and rocSOLVER pass their respective unit test suites on the RX 5700 XT when built for gfx1010 with the official rocm-4.2.0 toolchain. I'd heard that finding supported hardware was a major hindrance to the Debian packaging effort, so I wanted to give you a heads-up that this is now possible. I should stress again that these architectures are not yet officially supported, but I would be happy to triage any problems you encounter in the linear algebra stack.

If you're unfamiliar with the mapping of GPU architectures to product names, the LLVM AMDGPU User Guide is quite handy: https://llvm.org/docs/AMDGPUUsage.html
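
As a quick illustration of that mapping, here is a minimal Python sketch. It contains only the two examples mentioned above; the table and the function name are hypothetical, and the handling of a colon-separated feature suffix (e.g. ":xnack-") is an assumption based on the target ID syntax described in the LLVM guide. For anything else, consult the guide itself.

```python
# Illustrative lookup only. The two entries are the examples named in this
# message; EXAMPLE_PRODUCTS and example_product are hypothetical names.
EXAMPLE_PRODUCTS = {
    "gfx1010": "RX 5700 XT",
    "gfx1030": "RX 6800 XT",
}


def example_product(target: str) -> str:
    """Return an example product for an AMDGPU target name, or 'unknown'."""
    # Assumption: a target may carry colon-separated feature suffixes
    # (such as ":xnack-"); only the processor part is used for the lookup.
    processor = target.strip().lower().split(":", 1)[0]
    return EXAMPLE_PRODUCTS.get(processor, "unknown")


if __name__ == "__main__":
    for name in ("gfx1010", "gfx1030", "gfx906"):
        print(name, "->", example_product(name))
```

Targets that are not in the table come back as "unknown" rather than a guess, since the table is deliberately incomplete.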

In any case, that's all I have for now.

Cory Bloor
