
Re: RFS: rocthrust/5.3.3-4~exp1 -- ROCm parallel algorithms library - tests



Hello Étienne,

On 2023-07-09 14:43, Étienne Mollier wrote:
Hi Cory,

Cordell Bloor, on 2023-07-07:
I've added a librocthrust-tests package. This is quite similar to
librocprim-tests.
Hmm, I'm having no luck with this one. The package built fine,
including the build-time checks, since I had exposed the GPU.
But when I ran the autopkgtest suite, one of the tests caused a
GPU reset.

Au contraire, that is a great success. It is my understanding that it should not be possible for a normal program to cause a GPU reset. This is therefore not a bug in rocthrust, but rather an indication of a problem in some other component of the test system. It could be a hardware problem or a software problem. One possibility would be a bug in the amdgpu driver.

This is exactly the sort of thing that the autopkgtests exist to catch. I'm hoping that once we get this CI system enabled, we will be able to file some high-quality bug reports against the Linux kernel.

Excerpt from dmesg:

	[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=8914355, emitted seq=8914357
	[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 625332 thread Xorg:cs0 pid 625333

I guess I should probably retry outside a graphical session, to
avoid interfering with the test suite. It might be helpful to
double-check how things go on a card other than the RX 6800, or
on another similar model. Could someone check?

I haven't reproduced your exact setup, but the build tests all passed on my Radeon VII.

Sincerely,
Cory Bloor

