
Re: ROCm Package Testing Results



Hi Christian,

On 7/6/23 02:02, Christian Kastner wrote:
> That's a truly impressive amount of coverage! If you have any tips or
> suggestions for how to run that much, do share sometime (here, or on
> the wiki). For example, I assume you swap out cards? Or do they each
> have their own host?

I swap out cards. I built a computer for this purpose on an Open Benchtable v2 [3][4][5] instead of using a standard case. It has no power button, so I use a flat-pack wrench to bridge the pins for the power switch, which is fun.

I wrote a script to run all the tests back-to-back. It's included in the gist. The machine is set up in my kitchen [6], so whenever I noticed that a card had completed the test suite, I'd shut the system off, swap in a new card, boot it back up, and start the test suite again. I use the integrated GPU for display, so there are fewer cables to swap.
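The actual script is in the gist and is not reproduced here. As a rough illustration of the idea only (not that script), a minimal Python sketch that runs a set of test programs back-to-back and keeps one log per suite might look like this. The function name `run_suites` and the log layout are hypothetical choices for the example.

```python
import subprocess
from pathlib import Path


def run_suites(commands, log_dir):
    """Run each test suite in turn and save its combined output to a log file.

    commands maps a suite name to an argv list (hypothetical placeholders).
    Returns a dict mapping each suite name to its exit code, or None if the
    test program could not be found. A failing suite does not stop the run,
    so one broken library does not prevent the others from being tested.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for name, argv in commands.items():
        try:
            with open(log_dir / f"{name}.log", "w") as log:
                # stderr is merged into stdout so errors land next to the
                # output that preceded them in the same log file.
                proc = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
            results[name] = proc.returncode
        except FileNotFoundError:
            # A missing test binary is recorded instead of aborting the run.
            results[name] = None
    return results
```

Usage would be something like calling `run_suites({"suite-a": ["./suite-a-tests"]}, "logs-gfx906")` once per installed card, where the suite names, binary paths, and per-architecture log directory are placeholders to substitute with the real test programs.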

I thought I might need a fan to cool the MI25, since it would normally depend on case fans for airflow. However, nearly all the ROCm tests are heavily CPU-bound, and on the cards that have their own fans, those fans rarely spin up. The box fan I set up was probably unnecessary.

> There should be no difference test-wise between the
> RX 6800 XT and your W6800, right?

The RX 6800 XT and the W6800 are both Navi 21, but the W6800 has twice as much VRAM (32 GiB vs 16 GiB). The test suites for ROCm libraries usually just call GTEST_SKIP() when the device does not have enough memory for a given test, so you might be able to run a few more tests on the W6800 than on the RX 6800 XT. Aside from that, I would not expect to see any differences in the test logs.
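To make the effect of memory-based skipping concrete, here is a toy Python model, assuming each test has a fixed memory requirement and is skipped (not failed) when that exceeds the device's VRAM. The per-test requirements below are hypothetical; the real suites decide this inside their GoogleTest cases based on what the device reports, and usable memory is somewhat less than the nominal total.

```python
GIB = 1 << 30

# VRAM sizes of the two Navi 21 cards discussed above.
DEVICE_VRAM = {"RX 6800 XT": 16 * GIB, "W6800": 32 * GIB}


def runnable_tests(requirements, device_bytes):
    """Return the sorted names of tests that would run rather than be skipped.

    requirements maps a test name to the device memory it needs, in bytes.
    A test that needs more than the device has is skipped, mirroring the
    GTEST_SKIP() pattern, so it does not count as a failure.
    """
    return sorted(name for name, need in requirements.items() if need <= device_bytes)


# Hypothetical memory requirements, for illustration only.
EXAMPLE_REQUIREMENTS = {"small_gemm": 1 * GIB, "large_gemm": 24 * GIB}

for card, vram in DEVICE_VRAM.items():
    print(card, runnable_tests(EXAMPLE_REQUIREMENTS, vram))
# RX 6800 XT ['small_gemm']
# W6800 ['large_gemm', 'small_gemm']
```

Only tests whose working set falls between 16 GiB and 32 GiB behave differently; everything else produces the same result on both cards.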

> I've only briefly glanced at the output, but it's clear that gfx803 and
> gfx1036 are currently somewhat broken. I think the rest look pretty
> good. There's definitely some sort of breakage in the hipBLAS test suite
> (possibly due to using a different reference BLAS than upstream).

That does indeed seem to be the case with hipBLAS. After switching the reference library from OpenBLAS to Netlib BLAS/LAPACK, I'm seeing all tests pass on gfx906.

Sincerely,
Cory Bloor

[3]: https://openbenchtable.com/
[4]: https://slerp.xyz/img/misc/testbench-closeup-1.jpg
[5]: https://slerp.xyz/img/misc/testbench-closeup-2.jpg
[6]: https://slerp.xyz/img/misc/testbench.jpg