Re: Bug#1056667: librocthrust-tests: test failures across all architectures



Hi Christian,

On 2023-11-24 08:53, Christian Kastner wrote:
> On gfx803 and gfx906, the tests pass, but do so in under one minute,
> with no test output. That cannot be right.

This behaviour is likely caused by the GPU only being partially available in the CI environment. We do not yet have working PCIe passthrough for gfx803 or gfx906, so /dev/kfd exists but the driver fails to initialize the GPU, and rocminfo therefore lists no devices. The rocthrust tests don't seem to handle this error condition appropriately.
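
As a rough illustration of how a test wrapper could catch this condition, here is a minimal sketch in Python. It is not what the rocthrust tests or the Debian packaging currently do. The function names are hypothetical, and the regular expression assumes that rocminfo prints one "Name: gfxNNN" line per GPU agent. The exit status 77 is the one autopkgtest treats as "skipped" for tests declaring "Restrictions: skippable".

```python
import os
import re
import subprocess

# Assumption: rocminfo prints a line such as "  Name:   gfx906" for each GPU agent.
# ISA lines ("Name: amdgcn-amd-amdhsa--gfx906...") do not match this pattern.
GPU_AGENT_RE = re.compile(r"^[ \t]*Name:[ \t]+(gfx[0-9a-f]+)[ \t]*$", re.MULTILINE)

SKIP = 77  # autopkgtest "skippable" exit status
OK = 0


def count_gpu_agents(rocminfo_output: str) -> int:
    """Count GPU agents listed in rocminfo output (hypothetical helper)."""
    return len(GPU_AGENT_RE.findall(rocminfo_output))


def exit_code_for(kfd_present: bool, rocminfo_output: str) -> int:
    """Return SKIP unless /dev/kfd exists AND at least one GPU agent is listed."""
    if not kfd_present or count_gpu_agents(rocminfo_output) == 0:
        return SKIP
    return OK


def main() -> int:
    kfd_present = os.path.exists("/dev/kfd")
    try:
        result = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=False
        )
        output = result.stdout
    except FileNotFoundError:
        # rocminfo is not installed; treat this the same as "no devices".
        output = ""
    return exit_code_for(kfd_present, output)
```

A wrapper script would end with `sys.exit(main())` and only launch the real test binaries when the result is 0, so a half-initialized GPU is reported as a skip rather than a silent pass.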

> On all other architectures, about two minutes into the test, it seems an
> infinite loop is encountered. The test eventually hits the timeout limit
> and is then terminated by autopkgtest.

We originally saw this with gfx1030, where the test caused a driver crash. I agree that the results across multiple hardware architectures suggest there is also a rocthrust bug. We should check whether the problem persists after updating rocprim and rocthrust to ROCm 5.7.1.

Sincerely,
Cory Bloor

