I experienced this same bug, and in my troubleshooting worked around it with the following steps:
1. Added vfio_pci.ids=[pair of Radeon 6800 ids here] to my kernel cmdline
2. Added
softdep drm pre: vfio-pci
to a new file, /etc/modprobe.d/vfio.conf
3. Regenerated the initramfs
Since amdgpu depends on drm, this makes vfio-pci load before amdgpu at boot, so that vfio-pci gets bound to the GPU instead of amdgpu, as specified by the cmdline argument.
Since the bug occurs when the host attempts to reclaim the GPU from a VM with the amdgpu driver, setting things up so that the host never loads amdgpu on the card and always keeps it on vfio-pci prevents all the hanging and errors. This allows the card to be freely booted with other VMs after the VM using it shuts down.
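To check after a reboot that the card really ended up on vfio-pci, something like the following should work. It is only a sketch: it assumes the usual sysfs layout, where /sys/bus/pci/devices/<address>/driver is a symlink to the bound driver. The helper name bound_driver and the PCI addresses 0000:03:00.0 and 0000:03:00.1 are placeholders for illustration; substitute the addresses lspci reports for your GPU and its audio function.

```python
import os

def bound_driver(pci_address, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    Assumes the standard sysfs layout: the device directory contains a
    'driver' symlink pointing at the driver currently bound to it.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_address, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or no such device
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    # Placeholder addresses (hypothetical); replace with your card's functions.
    for addr in ("0000:03:00.0", "0000:03:00.1"):
        print(addr, bound_driver(addr))
```

If the workaround took effect, both functions of the card should report vfio-pci rather than amdgpu (or snd_hda_intel for the audio function).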
This works as a solution on my setup, where the host uses the CPU's iGPU for its display output and uses the discrete GPU exclusively for passthrough to VMs. It wouldn't work on a system where there's only one GPU and single-GPU passthrough is being done.