Re: Towards testing all Navi 3x architectures
On 2024-09-21 14:46, Christian Kastner wrote:
> On 2024-08-12 07:40, Cordell Bloor wrote:
> Host 'endeavour' is now testing gfx1100 in QEMU with passthrough in
> production, and I expect gfx1101 and gfx1102 to follow up, soon.
Endeavour has been testing gfx1100 and gfx1101 in prod since Sunday.
There was one unsettling issue: on Monday midday, I noticed that tests
for these two architectures stopped updating on ci.rocm.debian.net, and
when I came home from work, the GPU fans were running at max speed.
The host remained responsive and had nothing suspicious in dmesg.
Foolishly, I quickly shut the host off instead of investigating the guests.
It can't have been load, because tests on both GPUs resumed after the next boot,
and the issue did not occur again. I don't think airflow was an issue,
either; there are 3x 2000rpm case fans and the W7800 and W7700 are
blower cards. The case fans admittedly pass through an AiO radiator but
given our loads, I cannot imagine this being a factor.
gfx1102 also completed all its tests on ci-test.rocm.debian.net, but
because of the issue above, and the W7500 being only passively cooled,
I'm not going to move it to endeavour just yet.
Best,
Christian