
Re: (RESOLVED?) Re: ci.rocm.debian.net: gfx1100, gfx1101 broken with (possibly) 6.12.11





On 8/24/25 6:53 AM, Christian Kastner wrote:
On 2025-02-07 10:50, Christian Kastner wrote:
On 2025-02-05 20:59, Christian Kastner wrote:
I just noticed that gfx1100, gfx1101 started failing/tmpfailing
recently

It was the host. Downgrading the kernel to the previous 6.11.10 from
bookworm-backports resolved the issue.

I have reverted endeavour back to bookworm's 6.1 kernel for now.

I can't dedicate any time to this, so for now I'm hoping that this is
some general kvm/vfio issue in 6.12 that will be independently
discovered and resolved, given enough eyeballs.

Theoretically this could also be a QEMU<->kernel interplay issue, and there
is a newer QEMU in bookworm-backports, but I'll leave further experiments
for some other time.

Huh. QEMU upstream pinged me with a potential fix in 6.13 so I wanted to
give it a try.

I first tried to reproduce the issue with 6.12.38 from
bookworm-backports... but the issue is now gone. gfx1100, gfx1101, and
gfx1102 all work fine again in QEMU.

Even the AGP issue that we hit last September [3] seems to be resolved;
the workaround is no longer needed.

This could be: newer kernel, newer firmware, newer QEMU, or some
combination thereof (I have all of those from bookworm-backports).

I'll let gfx1100 run in QEMU for a while, and if everything is OK, then
gfx1101 and then gfx1102.

Incidentally: endeavour needs a new home. If you're within Europe [4]
and have space for a 4U host with a theoretical peak power draw of ~900W
and are willing to host it, please reach out.

Best,
Christian

[3]: https://lists.debian.org/debian-ai/2024/09/msg00084.html
[4]: Europe because otherwise import duties will be very high.


I'm curious which specific commit from kernel 6.13 they suggested would help (if known)?

