Hi Cory,

Cordell Bloor, on 2023-07-10:
> On 2023-07-09 14:43, Étienne Mollier wrote:
> > Cordell Bloor, on 2023-07-07:
> > > I've added a librocthrust-tests package. This is quite similar to
> > > librocprim-tests.
> > Hmm, I have no luck with this one.  The package built fine,
> > including the build time checks, given that I exposed the gpu.
> > But when I ran the autopkgtest suite, one of the tests caused a
> > gpu reset.
>
> Au contraire, that is a great success. It is my understanding that it should
> not be possible for a normal program to cause a GPU reset. This is therefore
> not a bug in rocthrust, but rather an indication of a problem in some other
> component of the test system. It could be a hardware problem or a software
> problem. One possibility would be a bug in the amdgpu driver.
>
> This is exactly the sort of thing that the autopkgtests exist to catch. I'm
> hoping that once we get this CI system enabled, we will be able to file some
> high-quality bug reports against the Linux kernel.

Good point, I'm wrapping up a bug report against the
distribution kernel.  It's not sent yet, as I'd like to run a
few more tests to complete my report.

> > Excerpt from dmesg:
> >
> > [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=8914355, emitted seq=8914357
> > [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 625332 thread Xorg:cs0 pid 625333
> >
> > I guess I probably should retry outside graphical context, to
> > avoid interfering with the test suite.  It might be helpful to
> > double check how things go on another card than RX 6800, or
> > another similar model.  Could someone check?
>
> I haven't reproduced your exact setup, but the build tests all passed on my
> Radeon VII.

Thanks for checking!  So far, I think I have isolated
test_thrust_set_difference, as it very much stresses the gpu,
but I haven't seen it finish in autopkgtest context yet.
Now I'm a bit bugged, because the build tests all passed on my
end before I ran the autopkgtest (and timing information
suggests all SetDifference-related tests lasted only a couple
of seconds), but the autopkgtest proper either collided with
the Xorg server (at least once, but I haven't retried that
configuration yet), or ran for dozens of minutes without giving
an impression of moving forward.  I don't exclude the
possibility that an implementation detail of the autopkgtest is
interfering with the run of that very test, but I'm not sure
what it could be yet.  Or there is something else I'm
completely missing.

Have a nice day,  :)
-- 
  .''`.  Étienne Mollier <emollier@debian.org>
 : :' :  gpg: 8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
 `. `'   sent from /dev/tty1, please excuse my verbosity
   `-