
Re: Towards testing all Navi 3x architectures



Hi all,

On 2024-08-12 07:40, Cordell Bloor wrote:
> What models of Navi 3x GPUs can we get working with passthrough?

Host 'endeavour' is now testing gfx1100 in production, in QEMU with
passthrough, and I expect gfx1101 and gfx1102 to follow soon.


Navi 3x is affected by a bug [1]. The good news is that for now, we seem
to be able to work around it by re-enabling AGP within the guest.

On ci-test, I've already run the full battery of tests on gfx1100 with
no issues. For gfx1101 and gfx1102, hand-picked tests also went well. I
have now scheduled the full battery for gfx1101 and will follow up with
gfx1102. If all goes well, they will join gfx1100 on endeavour [2].

As for the workaround, I initially thought that it would require
customizing the image build, which I wanted to avoid. Luckily, I found a
run-time solution: in the test setup phase, I modify the kernel command
line in the GRUB configuration and let autopkgtest reboot the VM before
continuing with the tests.
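For anyone who wants to reproduce the run-time workaround, here is a
minimal sketch in Python of the GRUB edit. It is not the actual ci setup
script. It assumes that AGP is re-enabled through the amdgpu module
parameter `amdgpu.agp=1`; the exact option is not named above, so treat
that as an assumption. The helper name `add_kernel_param` is likewise
hypothetical. It only rewrites the text of /etc/default/grub; the result
still has to be written back, followed by `update-grub` and a reboot of
the VM.

```python
import re

# Assumption: the kernel parameter used to re-enable AGP in the guest.
DEFAULT_PARAM = "amdgpu.agp=1"

# Matches e.g.  GRUB_CMDLINE_LINUX_DEFAULT="quiet"  on a line of its own.
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"$', re.MULTILINE)


def add_kernel_param(grub_default: str, param: str = DEFAULT_PARAM) -> str:
    """Return the text of /etc/default/grub with `param` appended to
    GRUB_CMDLINE_LINUX_DEFAULT (hypothetical helper, for illustration)."""
    match = _CMDLINE.search(grub_default)
    if match is None:
        # No such line yet: append one that holds only our parameter.
        return grub_default.rstrip("\n") + f'\nGRUB_CMDLINE_LINUX_DEFAULT="{param}"\n'
    args = match.group(2).split()
    if param in args:
        # Already present: leave the file untouched, so the setup step is idempotent.
        return grub_default
    new_line = f'{match.group(1)}"{" ".join(args + [param])}"'
    return grub_default[: match.start()] + new_line + grub_default[match.end():]


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
    print(add_kernel_param(sample), end="")
```

Keeping the edit idempotent matters here, because after autopkgtest
reboots the VM the setup phase must not keep appending the parameter.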

This workaround also worked for Ubuntu 24.04.

Bad news #1: The current thinking is that AGP itself is irrelevant.
Rather, enabling it has some side effect that masks or fixes the actual
problem. The root cause has therefore not been identified yet, and the
bug could resurface at any time.

Bad news #2: The workaround did *not* work with the upstream
amdgpu-dkms module. If that module carries changes ahead of mainline
amdgpu, then a future mainline change might break our workaround.

Despite these last two points, the results of my tests gave me enough
confidence to implement this in production.

Best,
Christian

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/3644
[2]: Host endeavour was formerly known as 'explorer'. The latter was a
     temporary setup for getting started with Navi 3x. It has since
     received a fresh install and a new name.

