
Bug#1102175: Repeated lockups in AMDGPU DRM driver



Hello! I'm on Gentoo, a completely different ecosystem, and I'm seeing the same thing on _two_ different machines, so I thought you might appreciate my logs too. On one machine it has been happening roughly once every few days for a few months; on the other it just started happening once or twice a day. It happens both while the computer is in use (web browsing, coding, a YouTube video playing) and while it's idle (left overnight while I'm asleep). A more detailed log is here: https://gist.github.com/gamozolabs/97e0dc50009022d3fe0c0895cc4f6e60

Kernel/CPU: Linux tibia 6.12.21-gentoo-dist #1 SMP PREEMPT_DYNAMIC Sat Mar 29 13:12:36 -00 2025 x86_64 AMD Ryzen 7 9800X3D 8-Core Processor AuthenticAMD GNU/Linux

GPU: Radeon Pro W7800

pleb@tibia ~> sudo lspci -t -vv
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Root Complex
           +-00.2  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge IOMMU
           +-01.0  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
           +-01.1-[01-03]----00.0-[02-03]----00.0-[03]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon Pro W7800]
           |                                            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
           +-01.2-[04]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
           +-01.3-[05]--+-00.0  Intel Corporation Ethernet Controller E810-C for QSFP
           |            \-00.1  Intel Corporation Ethernet Controller E810-C for QSFP
           +-02.0  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
           +-02.1-[06-18]----00.0-[07-18]--+-00.0-[08]--
           |                               +-04.0-[09]--
           |                               +-05.0-[0a]--
           |                               +-06.0-[0b]----00.0  Realtek Semiconductor Co., Ltd. Device 8126
           |                               +-07.0-[0c]----00.0  MEDIATEK Corp. Device 0717
           |                               +-08.0-[0d-16]----00.0-[0e-16]--+-00.0-[0f]--
           |                               |                               +-04.0-[10]--
           |                               |                               +-05.0-[11]--
           |                               |                               +-06.0-[12]--
           |                               |                               +-07.0-[13]--
           |                               |                               +-08.0-[14]--
           |                               |                               +-0c.0-[15]----00.0  Advanced Micro Devices, Inc. [AMD] Device 43fd
           |                               |                               \-0d.0-[16]----00.0  Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller
           |                               +-0c.0-[17]----00.0  Advanced Micro Devices, Inc. [AMD] Device 43fd
           |                               \-0d.0-[18]----00.0  Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller
           +-02.2-[19-7c]----00.0-[1a-7c]--+-00.0-[1b-4a]--
           |                               +-01.0-[4b-7a]--
           |                               +-02.0-[7b]----00.0  ASMedia Technology Inc. Device 2426
           |                               \-03.0-[7c]----00.0  ASMedia Technology Inc. Device 2425
           +-03.0  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
           +-04.0  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
           +-08.0  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
           +-08.1-[7d]--+-00.0  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge PCIe Dummy Function
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP
           |            +-00.3  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
           |            +-00.4  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
           |            \-00.6  Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
           +-08.3-[7e]----00.0  Advanced Micro Devices, Inc. [AMD] Device 15b8
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 0
           +-18.1  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 1
           +-18.2  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 2
           +-18.3  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 3
           +-18.4  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 4
           +-18.5  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 5
           +-18.6  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 6
           \-18.7  Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 7
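In the log below, the bridge at 0000:19:00.0 (the 02.2-[19-7c] branch in the tree above, which leads to the ASMedia controllers) fails to come back from D3cold. A minimal sketch for checking which PCI devices have runtime PM enabled and which are currently suspended, assuming the standard sysfs layout where each device under /sys/bus/pci/devices has power/control ("auto" or "on") and power/runtime_status files. The function name is hypothetical; devices whose files can't be read are simply skipped.

```python
from pathlib import Path


def runtime_pm_status(root: str = "/sys/bus/pci/devices") -> dict:
    """Return {bdf: (control, runtime_status)} for every PCI device under root.

    Assumes the usual sysfs layout: <root>/<bdf>/power/control and
    <root>/<bdf>/power/runtime_status. Unreadable entries are skipped.
    """
    result = {}
    for dev in sorted(Path(root).iterdir()):
        power = dev / "power"
        try:
            control = (power / "control").read_text().strip()
            status = (power / "runtime_status").read_text().strip()
        except OSError:
            # Missing or unreadable power files: skip this device.
            continue
        result[dev.name] = (control, status)
    return result


if __name__ == "__main__":
    for bdf, (control, status) in runtime_pm_status().items():
        print(f"{bdf}  control={control:5} runtime_status={status}")
```

Running it shortly before and after a lockup (if the machine stays reachable) would show whether 0000:19:00.0 and the devices behind it were runtime-suspended at the time.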

Apr 29 20:28:38 tibia kernel: pcieport 0000:00:02.2: PME: Spurious native interrupt!
Apr 29 22:33:19 tibia systemd[1081]: Started modprobed-db scan and store new modules.
Apr 29 22:33:19 tibia modprobed-db[1071656]: No new modules detected
Apr 30 03:08:17 tibia kernel: pcieport 0000:19:00.0: Unable to change power state from D3cold to D0, device inaccessible
Apr 30 03:08:21 tibia kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 30 03:08:22 tibia kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 30 03:08:23 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:24 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:25 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:27 tibia kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 30 03:08:28 tibia kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 30 03:08:28 tibia kernel: [drm] Fence fallback timer expired on ring sdma1
Apr 30 03:08:29 tibia kernel: [drm] Fence fallback timer expired on ring sdma1
Apr 30 03:08:29 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:29 tibia kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 30 03:08:29 tibia kernel: [drm] Fence fallback timer expired on ring sdma1
Apr 30 03:08:30 tibia kernel: [drm] Fence fallback timer expired on ring sdma1
Apr 30 03:08:31 tibia kernel: [drm] Fence fallback timer expired on ring sdma1
Apr 30 03:08:31 tibia kernel: [drm] Fence fallback timer expired on ring sdma1
Apr 30 03:08:31 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:32 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:33 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:34 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:34 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:37 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:37 tibia kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 30 03:08:37 tibia kernel: [drm] Fence fallback timer expired on ring sdma1
Apr 30 03:08:37 tibia kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 30 03:08:37 tibia kernel: [drm] Fence fallback timer expired on ring comp_1.0.1
Apr 30 03:08:38 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:38 tibia kernel: [drm] Fence fallback timer expired on ring sdma1
Apr 30 03:08:38 tibia kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 30 03:08:40 tibia kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Apr 30 03:08:40 tibia kernel: watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [device poll:533454]
Apr 30 03:08:40 tibia kernel: CPU#9 Utilization every 4s during lockup:
Apr 30 03:08:42 tibia kernel: #1: 2% system, 0% softirq, 0% hardirq, 0% idle
Apr 30 03:08:43 tibia kernel: #2: 2% system, 0% softirq, 0% hardirq, 0% idle
Apr 30 03:08:43 tibia kernel: #3: 2% system, 0% softirq, 0% hardirq, 0% idle
Apr 30 03:08:44 tibia kernel: #4: 2% system, 0% softirq, 0% hardirq, 0% idle
Apr 30 03:08:44 tibia kernel: #5: 2% system, 0% softirq, 0% hardirq, 0% idle
Apr 30 03:08:44 tibia kernel: Modules linked in: veth nf_conntrack_netlink xt_nat iptable_raw xt_set ip_set rfcomm snd_seq_dummy snd_seq_midi snd_hrtimer snd_seq_midi_event snd_seq xt_CHECKSUM xt_MASQUERADE ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat bridge stp llc overlay bnep vfat fat irdma i40e ib_uverbs ib_core amd_atl intel_rapl_msr intel_rapl_common mt7925e mt7925_common mt792x_lib mt76_connac_lib amdgpu edac_mce_amd snd_hda_codec_hdmi mt76 kvm_amd snd_hda_intel spd5118 snd_intel_dspcfg snd_usb_audio mac80211 snd_intel_sdw_acpi kvm amdxcp snd_usbmidi_lib snd_hda_codec btusb gpu_sched snd_ump i2c_algo_bit btrtl drm_suballoc_helper snd_rawmidi drm_ttm_helper snd_hda_core snd_seq_device btintel ttm mc snd_hwdep btbcm btmtk drm_exec libarc4 i2c_piix4 ice drm_display_helper snd_pcm thunderbolt rapl pcspkr wmi_bmof k10temp i2c_smbus cec bluetooth snd_timer drm_buddy r8169 cfg80211 snd gnss soundcore libie realtek rfkill joydev gpio_amdpt gpio_generic acpi_pad ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT
Apr 30 03:08:44 tibia kernel: nf_reject_ipv4 xt_limit xt_addrtype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter ip_tables fuse loop nfnetlink crct10dif_pclmul crc32_pclmul crc32c_intel nvme polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sp5100_tco nvme_core nvme_auth video wmi lm92
Apr 30 03:08:44 tibia kernel: CPU: 9 UID: 1000 PID: 533454 Comm: device poll Not tainted 6.12.21-gentoo-dist #1
Apr 30 03:08:44 tibia kernel: Hardware name: ASRock X870E Taichi Lite/X870E Taichi Lite, BIOS 3.20 02/21/2025
Apr 30 03:08:44 tibia kernel: RIP: 0010:pci_mmcfg_read+0xa4/0xe0
Apr 30 03:08:44 tibia kernel: Code: fe 01 75 0b 4c 01 e0 8a 00 0f b6 c0 89 45 00 e8 f2 ee f9 fe 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0 8b 00 <89> 45 00 eb e0 4c 01 e0 66 8b 00 0f b7 c0 89 45 00 eb d2 e8 c4 ee
Apr 30 03:08:44 tibia kernel: RSP: 0018:ffffb933e32eb620 EFLAGS: 00000286
Apr 30 03:08:44 tibia kernel: RAX: 00000000ffffffff RBX: 0000000001900000 RCX: 0000000000000ffc
Apr 30 03:08:44 tibia kernel: RDX: 00000000000000ff RSI: 0000000000000019 RDI: 0000000000000000
Apr 30 03:08:44 tibia kernel: RBP: ffffb933e32eb65c R08: 0000000000000004 R09: ffffb933e32eb65c
Apr 30 03:08:44 tibia kernel: R10: 0000000000000019 R11: ffffffff9e231d60 R12: 0000000000000ffc
Apr 30 03:08:44 tibia kernel: R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
Apr 30 03:08:44 tibia kernel: FS: 00007fcda051e6c0(0000) GS:ffff8b60bdc80000(0000) knlGS:0000000000000000
Apr 30 03:08:44 tibia kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 30 03:08:44 tibia kernel: CR2: 00007faa4f42c000 CR3: 00000003cb1b6000 CR4: 0000000000f50ef0
Apr 30 03:08:44 tibia kernel: PKRU: 55555554
Apr 30 03:08:44 tibia kernel: Call Trace:
Apr 30 03:08:44 tibia kernel: <IRQ>
Apr 30 03:08:44 tibia kernel: ? watchdog_timer_fn.cold+0x233/0x311
Apr 30 03:08:44 tibia kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
Apr 30 03:08:44 tibia kernel: ? __hrtimer_run_queues+0x113/0x280
Apr 30 03:08:44 tibia kernel: ? hrtimer_interrupt+0xfa/0x210
Apr 30 03:08:44 tibia kernel: ? __sysvec_apic_timer_interrupt+0x52/0x100
Apr 30 03:08:44 tibia kernel: ? sysvec_apic_timer_interrupt+0x6c/0x90
Apr 30 03:08:44 tibia kernel: </IRQ>
Apr 30 03:08:44 tibia kernel: <TASK>
Apr 30 03:08:44 tibia kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Apr 30 03:08:44 tibia kernel: ? __pfx_pci_mmcfg_read+0x10/0x10
Apr 30 03:08:44 tibia kernel: ? pci_mmcfg_read+0xa4/0xe0
Apr 30 03:08:44 tibia kernel: pci_bus_read_config_dword+0x4a/0x80
Apr 30 03:08:44 tibia kernel: pci_find_next_ext_capability+0x89/0xf0
Apr 30 03:08:44 tibia kernel: ? _raw_spin_unlock_irqrestore+0x1d/0x40
Apr 30 03:08:44 tibia kernel: pci_restore_ltr_state+0x28/0x50
Apr 30 03:08:44 tibia kernel: pci_restore_state.part.0+0x29/0x370
Apr 30 03:08:44 tibia kernel: ? pci_bus_read_config_word+0x4a/0x90
Apr 30 03:08:44 tibia kernel: pci_pm_runtime_resume+0x45/0xf0
Apr 30 03:08:44 tibia kernel: ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 30 03:08:44 tibia kernel: __rpm_callback+0x41/0x170
Apr 30 03:08:44 tibia kernel: ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 30 03:08:44 tibia kernel: rpm_callback+0x55/0x60
Apr 30 03:08:44 tibia kernel: ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 30 03:08:44 tibia kernel: rpm_resume+0x4d3/0x700
Apr 30 03:08:44 tibia kernel: ? check_preempt_wakeup_fair+0x1f3/0x280
Apr 30 03:08:44 tibia kernel: rpm_resume+0x2d3/0x700
Apr 30 03:08:44 tibia kernel: rpm_resume+0x2d3/0x700
Apr 30 03:08:44 tibia kernel: ? kick_pool+0x60/0x160
Apr 30 03:08:44 tibia kernel: rpm_resume+0x2d3/0x700
Apr 30 03:08:44 tibia kernel: ? klist_put+0x1f/0xb0
Apr 30 03:08:44 tibia kernel: __pm_runtime_resume+0x4b/0x80
Apr 30 03:08:44 tibia kernel: usb_autoresume_device+0x1e/0x50
Apr 30 03:08:44 tibia kernel: usbdev_open+0x133/0x2b0
Apr 30 03:08:44 tibia kernel: ? __cgroup_bpf_check_dev_permission+0x10c/0x190
Apr 30 03:08:44 tibia kernel: chrdev_open+0xb2/0x230
Apr 30 03:08:44 tibia kernel: ? __pfx_chrdev_open+0x10/0x10
Apr 30 03:08:44 tibia kernel: do_dentry_open+0x14c/0x4a0
Apr 30 03:08:44 tibia kernel: vfs_open+0x2e/0xe0
Apr 30 03:08:44 tibia kernel: path_openat+0x82e/0x12d0
Apr 30 03:08:44 tibia kernel: do_filp_open+0xc4/0x170
Apr 30 03:08:44 tibia kernel: do_sys_openat2+0xae/0xe0
Apr 30 03:08:44 tibia kernel: __x64_sys_openat+0x55/0xa0
Apr 30 03:08:44 tibia kernel: do_syscall_64+0x82/0x190
Apr 30 03:08:44 tibia kernel: ? inode_update_timestamps+0x15c/0x190
Apr 30 03:08:44 tibia kernel: ? generic_update_time+0x4e/0x60
Apr 30 03:08:44 tibia kernel: ? touch_atime+0xb5/0x120
Apr 30 03:08:44 tibia kernel: ? iterate_dir+0x182/0x200
Apr 30 03:08:44 tibia kernel: ? __x64_sys_getdents64+0x108/0x130
Apr 30 03:08:44 tibia kernel: ? __pfx_filldir64+0x10/0x10
Apr 30 03:08:44 tibia kernel: ? syscall_exit_to_user_mode+0x10/0x200
Apr 30 03:08:44 tibia kernel: ? do_syscall_64+0x8e/0x190
Apr 30 03:08:44 tibia kernel: ? syscall_exit_to_user_mode+0x10/0x200
Apr 30 03:08:44 tibia kernel: ? do_syscall_64+0x8e/0x190
Apr 30 03:08:44 tibia kernel: ? __x64_sys_getdents64+0x108/0x130
Apr 30 03:08:44 tibia kernel: ? __pfx_filldir64+0x10/0x10
Apr 30 03:08:44 tibia kernel: ? syscall_exit_to_user_mode+0x10/0x200
Apr 30 03:08:44 tibia kernel: ? do_syscall_64+0x8e/0x190
Apr 30 03:08:44 tibia kernel: ? do_syscall_64+0x8e/0x190
Apr 30 03:08:44 tibia kernel: ? do_syscall_64+0x8e/0x190
Apr 30 03:08:44 tibia kernel: ? syscall_exit_to_user_mode+0x10/0x200
Apr 30 03:08:44 tibia kernel: ? do_syscall_64+0x8e/0x190
Apr 30 03:08:44 tibia kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
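To compare lockups across machines or boots, it can help to summarize which amdgpu rings hit the fence fallback timer before the soft lockup. A minimal sketch, assuming the log lines look exactly like the "[drm] Fence fallback timer expired on ring <name>" messages above; the function name is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# "... kernel: [drm] Fence fallback timer expired on ring sdma0"
FENCE_RE = re.compile(r"Fence fallback timer expired on ring (\S+)")


def tally_fence_timeouts(log_text: str) -> Counter:
    """Count fence-fallback timer messages per ring name."""
    counts = Counter()
    for line in log_text.splitlines():
        m = FENCE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Read journal text from stdin and print rings, most frequent first.
    for ring, n in tally_fence_timeouts(sys.stdin.read()).most_common():
        print(f"{ring:12} {n}")
```

For example, `journalctl -k -b -1 | python3 tally_fences.py` summarizes the previous boot's kernel log. On the excerpt above it counts 12 timeouts on gfx_0.0.0, 8 on sdma0, 8 on sdma1 and 1 on comp_1.0.1.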

Attachment: publickey - b@bfa.lk - 0x5233E6AE.asc
Description: application/pgp-keys

Attachment: signature.asc
Description: OpenPGP digital signature

