Re: Testing HIP with the amdgpu driver
Good news everyone,
On 2022-05-23 16:08, Cordell Bloor wrote:
> I suspect that the problem is related to error handling when
> overloading the GPU work queues. [...] That said, it's just a guess. I
> limited docker to a single core when running the tests and it didn't
> prevent this error.
Overwhelming the GPU does seem to be the cause of most of the problems I
experienced. Obviously, failing a stress test isn't a good thing.
However, I'm glad to report that the actual functional test failures are
not as bad as they first appeared.
I'd updated my kernel to that of Debian Sid, so I can't rule out the
newer kernel being part of my improved results. However, I experienced
the exact same graphics problems on the newer kernel when I used gbp
buildpackage and let dh_auto_test run the tests. It wasn't until I
started using ctest directly that I got normal behaviour.
Kernel:
Linux scorbunny 5.17.0-2-amd64 #1 SMP PREEMPT Debian 5.17.6-1 (2022-05-14) x86_64 GNU/Linux
Command:
# ctest --output-on-failure -E "(hipMultiThreadDevice-pyramid|hipMemoryAllocateCoherentDriver|hipMultiProcIpcMem|hipIpcMemAccessTest|hipStreamSync2)"
Results:
99% tests passed, 3 tests failed out of 403
Total Test time (real) = 490.94 sec
The following tests did not run:
96 - directed_tests/g++/hipMalloc_cxx_amd.tst (Skipped)
The following tests FAILED:
99 - directed_tests/hiprtc/hiprtcGetLoweredName.tst (SEGFAULT)
100 - directed_tests/hiprtc/saxpy.tst (SEGFAULT)
361 - directed_tests/runtimeApi/multiThread/hipMultiThreadStreams2.tst (Subprocess aborted)
Errors while running CTest
Note that I excluded 5 tests in that ctest command, so it's really more
like 400/408 tests passed. Still, the built-in amdgpu driver is a lot
closer to par with the amdgpu-dkms module than it first appeared!
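The 400/408 figure can be reproduced with a short sketch. It assumes, pessimistically, that all five excluded tests would fail, and it follows the count above in treating every test that ran without failing as passed. The variable names are only illustrative.

```python
# Counts taken from the ctest summary above.
tests_run = 403
tests_failed = 3

# Tests excluded with -E; assumed here to count as failures.
excluded = [
    "hipMultiThreadDevice-pyramid",
    "hipMemoryAllocateCoherentDriver",
    "hipMultiProcIpcMem",
    "hipIpcMemAccessTest",
    "hipStreamSync2",
]

# Every test that ran and did not fail is treated as passed.
passed = tests_run - tests_failed
# The adjusted total adds the excluded tests back in.
total = tests_run + len(excluded)

print(f"{passed}/{total} tests passed ({passed / total:.1%})")
```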
> I have not yet identified which test causes this log message or which
> test breaks my graphical session, but it shouldn't be that hard to
> find if I start running tests manually.
I suspect it's hipStreamSync2 doing that, because it fails particularly
hard. However, none of the tests break my graphical session when I run
them one at a time.
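For anyone who wants to repeat the one-at-a-time runs, here is a minimal sketch that prints one ctest invocation per excluded test. It is not the exact procedure used above. It assumes the commands are run from the build directory and that each short name matches a single test through ctest's -R regular expression. The helper name single_test_command is hypothetical.

```python
import shlex

# Names excluded from the batch run above.
EXCLUDED_TESTS = [
    "hipMultiThreadDevice-pyramid",
    "hipMemoryAllocateCoherentDriver",
    "hipMultiProcIpcMem",
    "hipIpcMemAccessTest",
    "hipStreamSync2",
]


def single_test_command(name):
    """Return a ctest command line selecting tests whose name matches `name`."""
    # -R takes a regular expression; these names contain no regex
    # metacharacters, so each one is passed through unchanged.
    return ["ctest", "--output-on-failure", "-R", name]


for test_name in EXCLUDED_TESTS:
    # Print rather than execute, so each command can be run by hand,
    # one at a time.
    print(shlex.join(single_test_command(test_name)))
```

Printing the commands instead of running them keeps the sketch harmless on a machine without a GPU, and makes it easy to stop after whichever test misbehaves.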
Sincerely,
Cory Bloor