Greetings,
I am using Debian to do HPC with AMD GPUs (Radeon model).
The system setup is as follows:
- Debian Testing distribution
- firmware-amd-graphics package
- AMD GPU proprietary driver
- Clang and LLVM packages
When I use the GPUs to do computation, I get random errors like the following:
amdgpu_job_timedout .... sdma0 ring ...
amdgpu_job_timedout .... sdma1 ring ...
I have set the following parameters in amdgpu.conf:
pcie_gen2=0 audio=0 exp_hw_support=1
I am still getting random errors, even though the hardware is in good shape.
The error occurs on two separate machines.
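One way to check whether the options from amdgpu.conf are actually applied to the loaded module is to read them back from sysfs. The following is only a minimal sketch: it assumes the usual /sys/module/amdgpu/parameters layout, and the helper name read_module_params is made up for illustration.

```python
from pathlib import Path


def read_module_params(names, base="/sys/module/amdgpu/parameters"):
    """Return {name: value} for each parameter file under base.

    A parameter whose file is missing or unreadable maps to None.
    The default base path is an assumption about the sysfs layout.
    """
    values = {}
    for name in names:
        path = Path(base) / name
        try:
            values[name] = path.read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    # The three options set in amdgpu.conf.
    wanted = ["pcie_gen2", "audio", "exp_hw_support"]
    for name, value in read_module_params(wanted).items():
        print(f"{name} = {value}")
```

If a value printed here differs from what is set in amdgpu.conf, the file is probably not being picked up when the module loads.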
Thanks for any possible help.
Valerio Bellizzomi