(RESOLVED?) Re: ci.rocm.debian.net: gfx1100, gfx1101 broken with (possibly) 6.12.11
On 2025-02-07 10:50, Christian Kastner wrote:
> On 2025-02-05 20:59, Christian Kastner wrote:
>> I just noticed that gfx1100, gfx1101 started failing/tmpfailing
>> recently
> It was the host. Downgrading the kernel to the previous 6.11.10 from
> bookworm-backports resolved the issue.
>
> I have reverted endeavour back to bookworm's 6.1 for now.
>
> I can't dedicate any time to this, so for now I'm hoping that this is
> some general kvm/vfio issue in 6.12 that will be independently
> discovered and resolved, given enough eyeballs.
>
> Theoretically this could also be a QEMU<->kernel interplay and there is
> a newer QEMU in bookworm-backports, but I'll keep further experiments to
> some other time.

Huh. QEMU upstream pinged me with a potential fix in 6.13 so I wanted to
give it a try.

I first tried to reproduce the issue with 6.12.38 from
bookworm-backports... but the issue is now gone. gfx1100, gfx1101, and
gfx1102 all work fine again in QEMU.

Even the AGP issue that we hit last September [3] seems to be resolved;
the workaround is no longer needed.

This could be due to a newer kernel, newer firmware, a newer QEMU, or
some combination thereof (I have all of those from bookworm-backports).

I'll let gfx1100 run in QEMU for a while and, if everything is OK,
follow with gfx1101 and then gfx1102.

Incidentally: endeavour needs a new home. If you're within Europe [4],
have space for a 4U host with a theoretical peak power draw of ~900 W,
and are willing to host it, please reach out.

Best,
Christian

[3]: https://lists.debian.org/debian-ai/2024/09/msg00084.html
[4]: Europe because otherwise import duties will be very high.