
Bug#1114912: linux-image-amd64: KVM GPU passthrough causes kernel crash and system hang on Debian 13 after VM shutdown



Package: linux-image-amd64
Version: 6.16.5-1
Severity: important
X-Debbugs-Cc: dec01-2021@hotmail.com

Dear Maintainer,

I configured passthrough of an AMD Radeon RX 7900 XTX GPU to a KVM virtual machine running Windows 10 on Debian 13.1.0.

After shutting down the Windows 10 VM, the Linux host kernel logs errors involving vfio-pci and the passed-through AMD GPU (0000:03:00.0), including a kernel oops in vfio_pci_core_runtime_suspend (see the logs below).

As a result, virt-manager freezes and becomes unresponsive, and virsh commands also hang. The GPU fails to reset, and rebooting the host often hangs, so a hardware reset or power-off is required.

Previously, on Debian 12.12.0 with the same hardware and GPU passthrough setup, no such issues occurred; the GPU reset properly every time.

I expected the VM shutdown to complete cleanly, without freezing the host management tools or crashing the kernel, and the GPU to be reset so that it could be reused without rebooting the host.

I have also tested the kernels from Debian testing (linux-image-6.16.3+deb14-amd64) and unstable (linux-image-6.16.5+deb14-amd64), and the same errors occur.
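Because the GPU "fails to reset", it may help to record what the host kernel reports about each passed-through function before and after the VM shuts down. The script below is a minimal sketch that assumes the standard sysfs layout: `/sys/bus/pci/devices/<BDF>/driver` is a symlink only while a driver is bound, and `reset_method` (present on recent kernels) lists the reset mechanisms the kernel may use. The helper name `pci_reset_info` is hypothetical.

```python
import os
from pathlib import Path


def pci_reset_info(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Report the bound driver and advertised reset methods of one PCI function.

    pci_reset_info is a hypothetical helper; it only reads standard sysfs files.
    """
    dev = Path(sysfs_root) / bdf
    if not dev.exists():
        raise FileNotFoundError(f"no such PCI device: {dev}")

    # "driver" is a symlink to .../drivers/<name> only while a driver is bound.
    link = dev / "driver"
    driver = os.path.basename(os.readlink(link)) if link.is_symlink() else None

    # "reset_method" lists reset mechanisms such as "flr bus". It is missing on
    # older kernels and when the kernel knows no way to reset the device.
    reset_file = dev / "reset_method"
    methods = reset_file.read_text().split() if reset_file.exists() else []

    return {"bdf": bdf, "driver": driver, "reset_methods": methods}


if __name__ == "__main__":
    # Both functions of the passed-through card, as listed in the PCI section.
    for bdf in ("0000:03:00.0", "0000:03:00.1"):
        try:
            print(pci_reset_info(bdf))
        except OSError as exc:
            print(f"{bdf}: {exc}")
```

Running it once with the VM stopped, once while it runs, and once after shutdown would show whether the driver binding and the advertised reset methods change around the failure.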

-- PC Setup:
GPU: AMD RX 7900XTX
Motherboard: Asus TUF GAMING Z890-PLUS WIFI
CPU: Intel Ultra 7 265K
Host OS: Debian 13.1 with kernel 6.12.43
Virtualization: KVM/QEMU

-- System Information:
Debian Release: 13.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.43+deb13-amd64 (SMP w/20 CPU threads; PREEMPT)
Kernel taint flags: TAINT_DIE
Locale: LANG=en_SG.UTF-8, LC_CTYPE=en_SG.UTF-8 (charmap=UTF-8), LANGUAGE=en_SG:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
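The taint flag above, TAINT_DIE, is bit 7 (value 128, letter "D") of `/proc/sys/kernel/tainted`, and the kernel sets it once it has oopsed. The oops header in the warning log below still reads "Not tainted", which is consistent with the taint being caused by that oops itself. A small decoder, sketched here from the bit table in the kernel's tainted-kernels documentation (bits 0 to 18; the function name `decode_taint` is hypothetical), makes the raw value easier to include in a report:

```python
# Taint bit numbers and names as documented in the kernel's
# Documentation/admin-guide/tainted-kernels.rst (bits 0-18).
TAINT_FLAGS = {
    0: "TAINT_PROPRIETARY_MODULE",
    1: "TAINT_FORCED_MODULE",
    2: "TAINT_CPU_OUT_OF_SPEC",
    3: "TAINT_FORCED_RMMOD",
    4: "TAINT_MACHINE_CHECK",
    5: "TAINT_BAD_PAGE",
    6: "TAINT_USER",
    7: "TAINT_DIE",
    8: "TAINT_OVERRIDDEN_ACPI_TABLE",
    9: "TAINT_WARN",
    10: "TAINT_CRAP",
    11: "TAINT_FIRMWARE_WORKAROUND",
    12: "TAINT_OOT_MODULE",
    13: "TAINT_UNSIGNED_MODULE",
    14: "TAINT_SOFTLOCKUP",
    15: "TAINT_LIVEPATCH",
    16: "TAINT_AUX",
    17: "TAINT_RANDSTRUCT",
    18: "TAINT_TEST",
}


def decode_taint(value: int) -> list[str]:
    """Return the names of all taint bits set in a /proc/sys/kernel/tainted value."""
    names = [name for bit, name in TAINT_FLAGS.items() if (value >> bit) & 1]
    # Report bits this table does not know about instead of silently dropping them.
    known = sum(1 << bit for bit in TAINT_FLAGS)
    if value & ~known:
        names.append(f"unknown bits {value & ~known:#x}")
    return names


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/tainted") as f:
            print(decode_taint(int(f.read())))
    except OSError as exc:
        print(exc)
```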

Versions of packages linux-image-amd64 depends on:
ii  linux-image-6.16.5+deb14-amd64  6.16.5-1

linux-image-amd64 recommends no packages.

linux-image-amd64 suggests no packages.

-- no debconf information



-- PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:7d1b] (rev 01)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
00:02.0 VGA compatible controller [0300]: Intel Corporation Arrow Lake-S [Intel Graphics] [8086:7d67] (rev 06)
	DeviceName: Onboard IGD
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: i915
	Kernel modules: i915, xe
00:04.0 Signal processing controller [1180]: Intel Corporation Device [8086:ad03] (rev 01)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: proc_thermal_pci
	Kernel modules: processor_thermal_device_pci
00:06.0 PCI bridge [0604]: Intel Corporation Device [8086:ae4d] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: pcieport
00:08.0 System peripheral [0880]: Intel Corporation Device [8086:ae4c] (rev 10)
	DeviceName: Intel GNA Device
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
00:0a.0 Signal processing controller [1180]: Intel Corporation Device [8086:ad0d] (rev 01)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: intel_vsec
	Kernel modules: intel_vsec
00:0b.0 Processing accelerators [1200]: Intel Corporation Arrow Lake NPU [8086:ad1d] (rev 01)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: intel_vpu
	Kernel modules: intel_vpu
00:14.0 RAM memory [0500]: Intel Corporation Device [8086:ae7f] (rev 10)
	DeviceName: USB Controller
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: intel_pmc_ssram_telemetry
	Kernel modules: intel_pmc_ssram_telemetry
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:ae0d] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
00:1f.5 Serial bus controller [0c80]: Intel Corporation Device [8086:ae23] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 10)
	Kernel driver in use: pcieport
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 10)
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
	Kernel driver in use: pcieport
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M] [1002:744c] (rev c8)
	Subsystem: XFX Limited RX-79XMERCB9 [SPEEDSTER MERC 310 RX 7900 XTX] [1eae:7901]
	Kernel modules: amdgpu
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
80:14.0 USB controller [0c03]: Intel Corporation Device [8086:7f6e] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
80:14.5 Non-VGA unclassified device [0000]: Intel Corporation Device [8086:7f2f] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
80:15.0 Serial bus controller [0c80]: Intel Corporation Device [8086:7f4c] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci
80:15.2 Serial bus controller [0c80]: Intel Corporation Device [8086:7f4e] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci
80:15.3 Serial bus controller [0c80]: Intel Corporation Device [8086:7f4f] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci
80:16.0 Communication controller [0780]: Intel Corporation Device [8086:7f68] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: mei_me
	Kernel modules: mei_me
80:17.0 SATA controller [0106]: Intel Corporation Device [8086:7f62] (rev 10)
	DeviceName: SATA Controller
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: ahci
	Kernel modules: ahci
80:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:7f40] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: pcieport
80:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:7f38] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: pcieport
80:1c.2 PCI bridge [0604]: Intel Corporation Device [8086:7f3a] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: pcieport
80:1c.3 PCI bridge [0604]: Intel Corporation Device [8086:7f3b] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: pcieport
80:1d.0 PCI bridge [0604]: Intel Corporation Device [8086:7f30] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: pcieport
80:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:7f04] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
80:1f.3 Audio device [0403]: Intel Corporation Device [8086:7f50] (rev 10)
	DeviceName: Intel HD Audio
	Subsystem: ASUSTeK Computer Inc. Device [1043:886d]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel, snd_sof_pci_intel_mtl
80:1f.4 SMBus [0c05]: Intel Corporation Device [8086:7f23] (rev 10)
	DeviceName: SMBus Controller
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
80:1f.5 Serial bus controller [0c80]: Intel Corporation Device [8086:7f24] (rev 10)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88ef]
83:00.0 Network controller [0280]: MEDIATEK Corp. Device [14c3:7925]
	Subsystem: AzureWave Device [1a3b:6000]
	Kernel driver in use: mt7925e
	Kernel modules: mt7925e
84:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I226-V [8086:125c] (rev 06)
	DeviceName: Intel I226-V LAN
	Subsystem: ASUSTeK Computer Inc. Device [1043:8867]
	Kernel driver in use: igc
	Kernel modules: igc
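In the listing above, 03:00.0 (the RX 7900 XTX itself) is the only device that lists a kernel module (amdgpu) but no "Kernel driver in use" line, i.e. at the time the report was generated no driver was bound to the GPU function, while its audio function 03:00.1 was still bound to vfio-pci. The sketch below finds such devices in `lspci -nnk`-style text; it assumes device headers start at column 0 and detail lines are indented, and `unbound_devices` is a hypothetical name.

```python
def unbound_devices(lspci_text: str) -> list[str]:
    """Return slots that list kernel modules but have no driver in use."""
    devices = {}  # slot -> {"driver": str | None, "modules": str | None}
    current = None
    for line in lspci_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Header line, e.g. "03:00.0 VGA compatible controller [0300]: ..."
            current = line.split()[0]
            devices[current] = {"driver": None, "modules": None}
        elif current is not None:
            # Indented detail line, e.g. "\tKernel modules: amdgpu"
            key, _, value = line.strip().partition(": ")
            if key == "Kernel driver in use":
                devices[current]["driver"] = value
            elif key == "Kernel modules":
                devices[current]["modules"] = value
    return [slot for slot, d in devices.items() if d["modules"] and not d["driver"]]
```

Feeding it the output of `lspci -nnk` before starting the VM and again after the failed shutdown would show whether the GPU is left without any driver.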


-- Snippets from my KVM config:
  <os>
    <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
    <boot dev='hd'/>
  </os>

  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='1234567890ab'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
  </features>

  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' clusters='1' cores='4' threads='2'/>
    <feature policy='disable' name='hypervisor'/>
  </cpu>

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <rom file='/usr/share/qemu/7900xtx.rom'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
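Since the GPU hostdev loads a ROM image from `/usr/share/qemu/7900xtx.rom`, one cheap sanity check is that the file is a PCI expansion ROM for the card lspci reports as [1002:744c]. The sketch below assumes the standard expansion ROM layout: bytes 0x55 0xAA at offset 0, a little-endian pointer to the "PCIR" data structure at offset 0x18, and vendor/device IDs immediately after the "PCIR" signature. It only inspects the first image in the file; `check_option_rom` is a hypothetical name.

```python
import struct


def check_option_rom(data: bytes) -> tuple[int, int]:
    """Validate a PCI expansion ROM header and return (vendor_id, device_id)."""
    if len(data) < 0x1A or data[:2] != b"\x55\xaa":
        raise ValueError("missing 0x55 0xAA expansion ROM signature")
    # Offset 0x18 holds a little-endian 16-bit pointer to the PCI Data Structure.
    (pcir,) = struct.unpack_from("<H", data, 0x18)
    if len(data) < pcir + 8 or data[pcir:pcir + 4] != b"PCIR":
        raise ValueError("PCI Data Structure ('PCIR') not found")
    # Vendor ID and Device ID follow the 4-byte "PCIR" signature.
    vendor, device = struct.unpack_from("<HH", data, pcir + 4)
    return vendor, device


if __name__ == "__main__":
    path = "/usr/share/qemu/7900xtx.rom"  # the <rom file=...> used in the config above
    try:
        with open(path, "rb") as f:
            vendor, device = check_option_rom(f.read())
        verdict = "matches" if (vendor, device) == (0x1002, 0x744C) else "does NOT match"
        print(f"{path}: {vendor:04x}:{device:04x} {verdict} 1002:744c")
    except (OSError, ValueError) as exc:
        print(f"{path}: {exc}")
```

A mismatch or missing signature would not by itself explain the oops, but it rules out one common source of passthrough reset problems.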

-- Logs: dmesg -l err
[   55.221506] vfio-pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[   55.221509] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[   55.221519] vfio-pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[   55.221520] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[   55.221522] vfio-pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[   55.221522] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[   55.221527] vfio-pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[   55.221528] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[   55.221531] vfio-pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[   55.221531] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[   55.221533] vfio-pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[   55.221533] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[   55.221535] vfio-pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[   55.221536] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
[   55.221537] vfio-pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
[   55.221538] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
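The trailing (-19) in these lines is a negated errno; on Linux, errno 19 is ENODEV ("No such device"). The lines are attributed to 0000:03:00.0 with vfio-pci as its driver while the text comes from amdgpu, which may indicate that amdgpu was still releasing buffer objects for the GPU after the device had been handed over to vfio-pci. The sketch below collapses such repeated lines into counts and decodes the errno with Python's standard `errno` module; it assumes every relevant line ends in "(-N)", and `summarize` is a hypothetical name.

```python
import errno
import re
from collections import Counter

# Matches kernel log lines such as:
# [   55.221509] vfio-pci 0000:03:00.0: amdgpu: leaking bo va (-19)
LINE_RE = re.compile(r"^\[\s*[\d.]+\]\s+(?P<msg>.*?)\s*\((?P<err>-\d+)\)\s*$")


def summarize(log_text: str) -> Counter:
    """Count (message, errno name) pairs in dmesg text, ignoring timestamps."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            code = -int(m.group("err"))
            name = errno.errorcode.get(code, f"errno {code}")
            counts[(m.group("msg"), name)] += 1
    return counts
```

Passing it the text of `dmesg -l err` and printing `summarize(text).most_common()` gives one line per distinct error instead of the repeated pairs above.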


-- Logs: dmesg -l warn
[  123.450266] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
[  123.450275] CPU: 1 UID: 0 PID: 901 Comm: kworker/1:2 Not tainted 6.12.43+deb13-amd64 #1  Debian 6.12.43-1
[  123.450284] Hardware name: ASUS System Product Name/TUF GAMING Z890-PLUS WIFI, BIOS 2207 07/18/2025
[  123.450288] Workqueue: pm pm_runtime_work
[  123.450301] RIP: 0010:down_write+0x20/0x60
[  123.450311] Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb e8 1e bd ff ff 65 ff 05 7f 37 13 6a 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 26 65 48 8b 05 61 37 13 6a 48 89 43 08 65 ff 0d
[  123.450318] RSP: 0018:ffffb6eec0ce3d38 EFLAGS: 00010246
[  123.450325] RAX: 0000000000000000 RBX: 0000000000000520 RCX: 0000000000000017
[  123.450331] RDX: 0000000000000001 RSI: ffff8f660241f0c8 RDI: 0000000000000520
[  123.450335] RBP: 0000000000000520 R08: 0000000000000000 R09: ffff8f6601812ac0
[  123.450339] R10: 0000000000000003 R11: 0000000000000000 R12: ffffffffc0eb5380
[  123.450343] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8f6610de4d40
[  123.450347] FS:  0000000000000000(0000) GS:ffff8f753fc80000(0000) knlGS:0000000000000000
[  123.450352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.450357] CR2: 0000000000000520 CR3: 000000075c622004 CR4: 0000000000f72ef0
[  123.450363] PKRU: 55555554
[  123.450367] Call Trace:
[  123.450373]  <TASK>
[  123.450384]  vfio_pci_core_runtime_suspend+0x1e/0x70 [vfio_pci_core]
[  123.450407]  pci_pm_runtime_suspend+0x67/0x1a0
[  123.450418]  ? __pfx_pci_pm_runtime_suspend+0x10/0x10
[  123.450427]  __rpm_callback+0x41/0x170
[  123.450435]  ? __pfx_pci_pm_runtime_suspend+0x10/0x10
[  123.450444]  rpm_callback+0x55/0x60
[  123.450451]  ? __pfx_pci_pm_runtime_suspend+0x10/0x10
[  123.450459]  rpm_suspend+0xe6/0x5f0
[  123.450465]  ? __wake_up+0x44/0x60
[  123.450474]  pm_runtime_work+0x84/0xb0
[  123.450482]  process_one_work+0x174/0x330
[  123.450493]  worker_thread+0x251/0x390
[  123.450503]  ? __pfx_worker_thread+0x10/0x10
[  123.450511]  kthread+0xcf/0x100
[  123.450517]  ? __pfx_kthread+0x10/0x10
[  123.450523]  ret_from_fork+0x31/0x50
[  123.450531]  ? __pfx_kthread+0x10/0x10
[  123.450536]  ret_from_fork_asm+0x1a/0x30
[  123.450546]  </TASK>
[  123.450549] Modules linked in: bridge stp llc bonding tls nft_chain_nat nf_nat nft_limit xt_multiport xt_tcpudp xt_LOG nf_log_syslog xt_limit xt_recent xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables binfmt_misc xe drm_gpuvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component nls_ascii nls_cp437 vfat fat snd_sof_pci_intel_mtl snd_sof_intel_hda_generic amdgpu soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda_common intel_uncore_frequency intel_uncore_frequency_common intel_pmc_core snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_sof_pci snd_sof_xtensa_dsp coretemp snd_sof mt7925e kvm_intel mt7925_common snd_sof_utils snd_hda_ext_core mt792x_lib snd_soc_acpi_intel_match mt76_connac_lib snd_soc_acpi kvm snd_soc_core mt76 i915 snd_compress mac80211 snd_pcm_dmaengine crct10dif_pclmul btusb ghash_clmulni_intel soundwire_bus btrtl sha512_ssse3 btintel sha256_ssse3 snd_hda_intel
[  123.450675]  libarc4 sha1_ssse3 btbcm aesni_intel snd_intel_dspcfg amdxcp btmtk drm_exec snd_intel_sdw_acpi gf128mul cfg80211 snd_hda_codec crypto_simd gpu_sched cryptd bluetooth drm_buddy drm_suballoc_helper snd_hda_core drm_display_helper processor_thermal_device_pci processor_thermal_device intel_vpu snd_hwdep processor_thermal_wt_hint rapl snd_pcm intel_cstate cec processor_thermal_rfim eeepc_wmi intel_rapl_msr processor_thermal_rapl rc_core asus_wmi drm_ttm_helper intel_rapl_common snd_timer mei_gsc_proxy drm_shmem_helper ttm pmt_telemetry snd platform_profile mei_me processor_thermal_wt_req intel_uncore pmt_class i2c_algo_bit drm_kms_helper battery processor_thermal_power_floor mei spd5118 soundcore processor_thermal_mbox rfkill evdev joydev int340x_thermal_zone wmi_bmof intel_hid intel_vsec int3400_thermal sg acpi_thermal_rel acpi_tad sparse_keymap button acpi_pad vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio vhost_net tun drm vhost vhost_iotlb tap configfs efi_pstore nfnetlink ip_tables x_tables
[  123.450795]  autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 md_mod dm_snapshot dm_bufio dm_mod hid_generic usbhid hid sd_mod ahci libahci libata xhci_pci xhci_hcd iTCO_wdt intel_pmc_bxt scsi_mod iTCO_vendor_support watchdog usbcore igc i2c_i801 intel_lpss_pci crc32_pclmul i2c_smbus scsi_common crc32c_intel intel_lpss video usb_common idma64 wmi pinctrl_meteorpoint fan pinctrl_meteorlake efivarfs
[  123.450878] CR2: 0000000000000520
[  123.450883] ---[ end trace 0000000000000000 ]---
[  123.510058] kauditd_printk_skb: 2 callbacks suppressed
[  123.550933] RIP: 0010:down_write+0x20/0x60
[  123.550935] Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb e8 1e bd ff ff 65 ff 05 7f 37 13 6a 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 26 65 48 8b 05 61 37 13 6a 48 89 43 08 65 ff 0d
[  123.550936] RSP: 0018:ffffb6eec0ce3d38 EFLAGS: 00010246
[  123.550937] RAX: 0000000000000000 RBX: 0000000000000520 RCX: 0000000000000017
[  123.550938] RDX: 0000000000000001 RSI: ffff8f660241f0c8 RDI: 0000000000000520
[  123.550939] RBP: 0000000000000520 R08: 0000000000000000 R09: ffff8f6601812ac0
[  123.550939] R10: 0000000000000003 R11: 0000000000000000 R12: ffffffffc0eb5380
[  123.550940] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8f6610de4d40
[  123.550940] FS:  0000000000000000(0000) GS:ffff8f753fc80000(0000) knlGS:0000000000000000
[  123.550941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.550942] CR2: 0000000000000520 CR3: 00000001052c2006 CR4: 0000000000f72ef0
[  123.550943] PKRU: 55555554
[  141.071627] kauditd_printk_skb: 5 callbacks suppressed
[  147.082008] kauditd_printk_skb: 59 callbacks suppressed
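In the oops, "0002" is the x86 page-fault error code, CR2 holds the faulting address (0x520), and RDI, the first argument register in the x86-64 calling convention, is also 0x520. So down_write(), called from vfio_pci_core_runtime_suspend, was handed a lock pointer of 0x520, which looks like a small structure offset added to a NULL base pointer. The sketch below decodes the error-code bits following the X86_PF_* definitions in arch/x86/include/asm/trap_pf.h; the first-page check is only a heuristic, and both function names are hypothetical.

```python
# x86 page-fault error code bits (X86_PF_PROT, X86_PF_WRITE, X86_PF_USER,
# X86_PF_RSVD, X86_PF_INSTR, X86_PF_PK): bit -> (meaning if set, meaning if clear)
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write access", "read access"),
    2: ("user mode", "kernel mode"),
    3: ("reserved bit set", None),
    4: ("instruction fetch", None),
    5: ("protection-key violation", None),
}


def decode_pf_error(code: int) -> list[str]:
    """Translate a page-fault error code such as the 0002 in 'Oops: 0002'."""
    out = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


def looks_like_null_offset(cr2: int, limit: int = 0x1000) -> bool:
    """Heuristic: the first page is never mapped, so a fault there usually
    means a NULL base pointer plus a small structure-member offset."""
    return 0 <= cr2 < limit


if __name__ == "__main__":
    print(decode_pf_error(0x0002), looks_like_null_offset(0x520))
```

For the values in this log the decoder yields a kernel-mode write to a non-present page at a near-NULL address.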

