
(RESOLVED?) Re: ci.rocm.debian.net: gfx1100, gfx1101 broken with (possibly) 6.12.11



On 2025-02-07 10:50, Christian Kastner wrote:
> On 2025-02-05 20:59, Christian Kastner wrote:
>> I just noticed that gfx1100, gfx1101 started failing/tmpfailing
>> recently

> It was the host. Downgrading the kernel to the previous 6.11.10 from
> bookworm-backports resolved the issue.
> 
> I have reverted endeavour back to the bookworm's 6.1 for now.
> 
> I can't dedicate any time to this, so for now I'm hoping that this is
> some general kvm/vfio issue in 6.12 that will be independently
> discovered and resolved, given enough eyeballs.
> 
> Theoretically this could also be a QEMU<->kernel interplay and there is
> a newer QEMU in bookworm-backports, but I'll keep further experiments to
> some other time.

Huh. QEMU upstream pinged me with a potential fix in 6.13 so I wanted to
give it a try.

I first tried to reproduce the issue with 6.12.38 from
bookworm-backports... but the issue is now gone. gfx1100, gfx1101, and
gfx1102 all work fine again in QEMU.

Even the AGP issue that we hit last September [3] seems to be resolved;
the workaround is no longer needed.

This could be: newer kernel, newer firmware, newer QEMU, or some
combination thereof (I have all of those from bookworm-backports).

I'll let gfx1100 run in QEMU for a while and, if everything is OK, do
the same with gfx1101 and then gfx1102.

Incidentally: endeavour needs a new home. If you're within Europe [4]
and have space for a 4U host with a theoretical peak power draw of ~900W
and are willing to host it, please reach out.

Best,
Christian

[3]: https://lists.debian.org/debian-ai/2024/09/msg00084.html
[4]: Europe because otherwise import duties will be very high.

