
Re: science application + opencl/rocm + AMD GPU?



hi Mo

we have some experience with OpenCL on AMD GPUs, though it is pretty
much limited to Monte Carlo (MC) photon transport simulations. My lab
has developed a few GPU-accelerated photon simulators, in both CUDA and
OpenCL; see http://mcx.space. This work is funded by the NIH.

In the past, we have heavily optimized our codes for different GPUs
from different generations and vendors. If you are interested, please
check out two of our papers on the OpenCL codes (MCX-CL and MMCL)

https://doi.org/10.1117/1.JBO.23.1.010504
http://dx.doi.org/10.1117/1.JBO.24.11.115002

the first one is on a voxel-based MC code - in Fig. 2, you can see
the benchmark speed comparisons across different processors. We used
amdgpu-pro 16.30.3 on Ubuntu 14.04. Overall, the performance on
the AMD GPU was pretty decent - although I had expected more (if
you compare with the corresponding CUDA code on NVIDIA GPUs in
the inset).

a more comprehensive benchmark list can be found at
http://mcx.space/wiki/?Speed

the 2nd paper just came out last week. It describes a more accurate MC
algorithm that needs more memory operations. In Fig. 3, you can
see that the AMD GPUs (Vega II, Vega64) clearly lag behind both
the NVIDIA cards of the same tier and the voxel-based MC on the same
card (see Fig. 3b). We offer some thoughts on what happened in the
Discussion section. These results were consistent across the amdgpu
and rocm drivers.

Regarding ROCm, we have been trying to get it to work since early
last year, but the experience has been a bit complicated. Please see
this thread

https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/issues/43

initially, it did not work. After several updates, after v1.9, my
code started working, but it ran several-fold slower than with
amdgpu-pro (16.xx, 17.xx). Later, both the newer amdgpu-pro and rocm
ended up at the lower end of the speed range, because amdgpu-pro
started to share compilers with rocm; see

https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/issues/43#issuecomment-421805412

from what we saw in our latest paper, the slow speed of ROCm
+AMD GPU was a result of aggressive register allocation - my cl kernel
needs only 69 registers when built with the NVIDIA ocl compiler, but
is allocated over 200 vector registers by the ROCm driver. This greatly
limits the number of "wavefronts" that can run simultaneously.

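To illustrate why the register count matters, here is a rough
back-of-the-envelope sketch in Python. The hardware figures in it are
assumptions taken from GCN-class GPUs in general (256 vector registers
available per lane on each SIMD, at most 10 wavefronts per SIMD,
registers allocated in multiples of 4); they are not numbers from our
papers, and the function name is made up for this example. The 69
registers reported by the NVIDIA compiler are used only for
comparison, since register counts on the two architectures are not
strictly equivalent.

```python
def max_wavefronts_per_simd(vgprs_per_work_item,
                            vgpr_budget=256,      # assumed VGPRs per lane on one SIMD
                            hw_wave_limit=10,     # assumed hardware cap on resident wavefronts
                            alloc_granularity=4): # assumed allocation step
    """Estimate how many wavefronts can be resident on one SIMD.

    Illustrative estimate only; all defaults are assumptions for
    GCN-class hardware, not values reported by any driver.
    """
    # Round the request up to the next multiple of the allocation step.
    allocated = -(-vgprs_per_work_item // alloc_granularity) * alloc_granularity
    if allocated > vgpr_budget:
        return 0  # kernel would not fit without spilling
    # The register file and the hardware cap both bound the result.
    return min(hw_wave_limit, vgpr_budget // allocated)


if __name__ == "__main__":
    for regs in (24, 69, 128, 200):
        print(regs, "VGPRs ->", max_wavefronts_per_simd(regs), "wavefront(s) per SIMD")
```

Under these assumptions, a kernel using more than 128 vector
registers leaves room for only a single wavefront per SIMD, so there
is nothing else to switch to while that wavefront waits on memory.
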
Regarding the open-source ocl driver, yes, it works, at least for my
application. I recently packaged my simulators for Fedora, and
by listing the "opencl-filesystem" package as a dependency, my
simulators ran out of the box in a completely open-source environment,

https://src.fedoraproject.org/rpms/octave-mcxlab/blob/master/f/octave-mcxlab.spec#_13

however, to get good speed we still ask users to install the GPU
vendor's drivers to use the vendor-optimized opencl library. A
ROCm-integrated platform would be wonderful.

happy to share more if it is relevant.

Qianqian


PS: I recently subscribed to this list hoping to learn the packaging
procedures so I can make these tools available in Debian.
The packages are listed here

https://fedoraproject.org/wiki/User:Fangq#Maintained_packages


On 11/23/19 11:34 AM, Mo Zhou wrote:
Hi science team,

I'd like to request the team for sharing some experience on this topic:
"scientific application + opencl/rocm + AMD GPU".  Any experience will
be very helpful to me in terms of the being-investigated ROCm
integration[1] to Debian.

I'd like to ask you the following questions:

1. how's everything going without the amdgpu-pro[2] driver, especially
    the opencl/rocm programs? Most importantly, can OpenCL work without
    any non-free component?

2. how does the consumer-grade AMD gpu perform in terms of scientific
    computing or other opencl/rocm applications?

3. do you think AMD GPU can be a practical alternative to Nvidia/CUDA
    in at least a few applications?


[1] https://salsa.debian.org/rocm-team
[2] non-free, https://www.amd.com/en/support/gpu-pro-eula


