Bug#1114912: linux-image-amd64: KVM GPU passthrough causes kernel crash and system hang on Debian 13 after VM shutdown
On Tue, 21 Oct 2025 17:07:00 +0200 Uwe Kleine-König
<u.kleine-koenig@baylibre.com> wrote:
> Hello,
>
> On Wed, Oct 08, 2025 at 04:07:51PM +0200, Uwe Kleine-König wrote:
> > On Wed, Sep 24, 2025 at 04:33:33PM +0000, dec first wrote:
> > > Please let me know if any additional information or further
> > > testing is required.
> >
> > I think what happens when you start kvm (or whatever virtual machine
> > manager you're using) is that the amdgpu driver is unbound and after
> > Windows shut down the driver is bound again to the hardware.
> >
> > I guess unbinding fails to release all resources. Can you try without
> > the virtual windows to just do:
> >
> > echo 0000:03:00.0 > /sys/bus/pci/drivers/amdgpu/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers/amdgpu/bind
> >
> > Does this result in error messages, or a working driver?
>
> To get to a better solution than hacking the vfio_pci driver to bind to
> the graphics card before amdgpu does, answering the above question would
> be helpful.
>
> Best regards
> Uwe
Hi Uwe,
Sorry, I was busy this month. I read your email and wrote a draft but
forgot to send it.
The attachment to my previous email contains detailed records, but
the content is somewhat messy, so I will pick out the parts you
need. Please also forgive me for not being able to run new tests, as my
computer is set up and running applications.
Below are the logs from unbinding and binding amdgpu (no GPU blacklist,
no VM running, and no GPU passthrough):
- Unbinding GPU:
```
echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/unbind
```
```
[Tue Sep 23 20:37:52 2025] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[Tue Sep 23 20:37:52 2025] pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[Tue Sep 23 20:37:52 2025] [drm] amdgpu: ttm finalized
[Tue Sep 23 20:37:52 2025] vga_switcheroo: disabled
```
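For anyone comparing runs: the -19 in the messages above corresponds to the Linux errno ENODEV ("No such device"). A minimal sketch for counting the leak messages in a saved kernel log could look like the following. The function name `count_bo_leaks` and the idea of piping `dmesg -T` output into it are illustrative assumptions, not part of the original report.

```shell
# Hypothetical helper: count "leaking bo va" messages in kernel log text
# read from standard input.
count_bo_leaks() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the function usable under
    # "set -e" while still printing 0.
    grep -c 'leaking bo va' || true
}
```

Usage: `dmesg -T | count_bo_leaks`. Fed only the unbind excerpt above, it would print 8.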
- Binding the GPU again after unbinding:
```
echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/bind
```
```
[Tue Sep 23 20:39:50 2025] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1EAE:0x7901 0xC8).
[Tue Sep 23 20:39:50 2025] [drm] register mmio base: 0x80000000
[Tue Sep 23 20:39:50 2025] [drm] register mmio size: 1048576
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 0 <soc21_common>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 1 <gmc_v11_0>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 2 <ih_v6_0>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 3 <psp>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 4 <smu>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 5 <dm>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 6 <gfx_v11_0>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 7 <sdma_v6_0>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 8 <vcn_v4_0>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 9 <jpeg_v4_0>
[Tue Sep 23 20:39:58 2025] [drm] add ip block number 10 <mes_v11_0>
[Tue Sep 23 20:39:58 2025] [drm] BIOS signature incorrect 0 0
[Tue Sep 23 20:39:58 2025] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
[Tue Sep 23 20:39:58 2025] amdgpu: ATOM BIOS: 113-31XFSHBS1.L04
[Tue Sep 23 20:39:58 2025] amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
[Tue Sep 23 20:39:58 2025] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[Tue Sep 23 20:39:58 2025] amdgpu 0000:03:00.0: amdgpu: MODE1 reset
[Tue Sep 23 20:39:58 2025] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
[Tue Sep 23 20:39:58 2025] amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: MEM ECC is not presented.
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is not presented.
[Tue Sep 23 20:39:59 2025] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: VRAM: 24560M 0x0000008000000000 - 0x00000085FEFFFFFF (24560M used)
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[Tue Sep 23 20:39:59 2025] [drm] Detected VRAM RAM=24560M, BAR=32768M
[Tue Sep 23 20:39:59 2025] [drm] RAM width 384bits GDDR6
[Tue Sep 23 20:39:59 2025] [drm] amdgpu: 24560M of VRAM memory ready
[Tue Sep 23 20:39:59 2025] [drm] amdgpu: 31784M of GTT memory ready.
[Tue Sep 23 20:39:59 2025] [drm] GART: num cpu pages 131072, num gpu pages 131072
[Tue Sep 23 20:39:59 2025] [drm] PCIE GART of 512M enabled (table at 0x00000085FEB00000).
[Tue Sep 23 20:39:59 2025] [drm] Loading DMUB firmware via PSP: version=0x07002D00
[Tue Sep 23 20:39:59 2025] [drm] Found VCN firmware Version ENC: 1.23 DEC: 9 VEP: 0 Revision: 16
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x85fc000000 for PSP TMR
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x004e8000 (78.128.0)
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[Tue Sep 23 20:39:59 2025] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[Tue Sep 23 20:39:59 2025] [drm] Display Core v3.2.301 initialized on DCN 3.2
[Tue Sep 23 20:39:59 2025] [drm] DP-HDMI FRL PCON supported
[Tue Sep 23 20:39:59 2025] [drm] DMUB hardware initialized: version=0x07002D00
[Tue Sep 23 20:39:59 2025] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[Tue Sep 23 20:40:00 2025] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[Tue Sep 23 20:40:00 2025] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[Tue Sep 23 20:40:00 2025] amdgpu: Virtual CRAT table created for GPU
[Tue Sep 23 20:40:00 2025] amdgpu: Topology: Add dGPU node [0x744c:0x1002]
[Tue Sep 23 20:40:00 2025] kfd kfd: amdgpu: added device 1002:744c
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: SE 6, SH per SE 2, CU per SH 8, active_cu_number 96
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[Tue Sep 23 20:40:00 2025] [drm] ring gfx_32768.1.1 was added
[Tue Sep 23 20:40:00 2025] [drm] ring compute_32768.2.2 was added
[Tue Sep 23 20:40:00 2025] [drm] ring sdma_32768.3.3 was added
[Tue Sep 23 20:40:00 2025] [drm] ring gfx_32768.1.1 ib test pass
[Tue Sep 23 20:40:00 2025] [drm] ring compute_32768.2.2 ib test pass
[Tue Sep 23 20:40:00 2025] [drm] ring sdma_32768.3.3 ib test pass
[Tue Sep 23 20:40:00 2025] vga_switcheroo: enabled
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: amdgpu: Using ATPX for runtime pm
[Tue Sep 23 20:40:00 2025] [drm] Initialized amdgpu 3.61.0 for 0000:03:00.0 on minor 1
[Tue Sep 23 20:40:00 2025] amdgpu 0000:03:00.0: [drm] fb1: amdgpudrmfb frame buffer device
```
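For anyone who wants to repeat the unbind/bind test shown above, the two `echo` commands could be wrapped in a small helper like the sketch below. The function name `rebind_device`, the `DRIVER_DIR` variable, and the pause between the two writes are illustrative assumptions. The default path is the amdgpu sysfs directory used in this thread, and writing to it requires root.

```shell
# Sketch under stated assumptions: DRIVER_DIR and rebind_device are
# hypothetical names; only the sysfs unbind/bind files come from the thread.
DRIVER_DIR="${DRIVER_DIR:-/sys/bus/pci/drivers/amdgpu}"

rebind_device() {
    dev="$1"          # PCI address, e.g. 0000:03:00.0
    pause="${2:-5}"   # seconds to wait between unbind and bind (assumed value)

    # Detach the device from the driver.
    printf '%s\n' "$dev" > "$DRIVER_DIR/unbind" || return 1
    sleep "$pause"
    # Attach the device to the same driver again.
    printf '%s\n' "$dev" > "$DRIVER_DIR/bind" || return 1
}
```

Usage, as root: `rebind_device 0000:03:00.0`, then inspect `dmesg -T` for messages from both steps.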
If you need more information to resolve the issue, I will try my best to
assist :)
Best regards,
Naunte