Crash probably related to amdgpu



Hello,

I had a look (a quick one, perhaps too quick) at the open bugs without finding a match, so I am reporting my little trouble.
Since my last apt full-upgrade yesterday, my GNOME Wayland session crashes fairly quickly.
My dmesg then shows:
[ 4765.695352] ------------[ cut here ]------------
[ 4765.695354] WARNING: CPU: 2 PID: 721753 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:615 amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 4765.695512] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq rpcsec_gss_krb5 auth_rpcgss nf_tables nfnetlink nfsv4 dns_resolver nfs lockd grace fscache netfs qrtr cmac algif_hash algif_skcipher af_alg bnep sunrpc binfmt_misc nls_ascii nls_cp437 intel_rapl_msr vfat intel_rapl_common fat edac_mce_amd mt7921e btusb mt7921_common btrtl btbcm kvm_amd mt76_connac_lib btintel btmtk mt76 bluetooth kvm mac80211 sha3_generic jitterentropy_rng irqbypass uvcvideo drbg videobuf2_vmalloc libarc4 ghash_clmulni_intel uvc videobuf2_memops snd_hda_codec_hdmi ansi_cprng videobuf2_v4l2 sha512_ssse3 snd_hda_intel ecdh_generic sha512_generic snd_usb_audio videodev snd_intel_dspcfg ecc cfg80211 snd_intel_sdw_acpi snd_usbmidi_lib snd_hda_codec snd_rawmidi videobuf2_common snd_seq_device aesni_intel snd_hda_core crypto_simd mc cryptd snd_pci_acp6x snd_hwdep snd_pci_acp5x snd_pcm rfkill snd_rn_pci_acp3x rapl wmi_bmof snd_timer snd_acp_config snd_soc_acpi snd pcspkr sp5100_tco k10temp ccp watchdog snd_pci_acp3x soundcore joydev sg
[ 4765.695578]  evdev msr parport_pc ppdev lp parport fuse loop efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_cmedia amdgpu hid_generic amdxcp drm_buddy gpu_sched i2c_algo_bit drm_suballoc_helper usbhid uas drm_display_helper hid usb_storage sd_mod cec rc_core dm_mod drm_ttm_helper ttm ahci drm_kms_helper libahci nvme xhci_pci xhci_hcd nvme_core libata drm t10_pi usbcore scsi_mod igc crc32_pclmul crc64_rocksoft crc32c_intel crc64 crc_t10dif crct10dif_generic crct10dif_pclmul i2c_piix4 crct10dif_common usb_common scsi_common video wmi gpio_amdpt gpio_generic button
[ 4765.695636] CPU: 2 PID: 721753 Comm: kworker/u64:2 Tainted: G        W          6.5.0-1-amd64 #1  Debian 6.5.3-1
[ 4765.695639] Hardware name: BESSTAR TECH LIMITED B550/B550, BIOS 5.17 03/31/2022
[ 4765.695640] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[ 4765.695646] RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 4765.695796] Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 cf 5d 1d c4 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 be 5d 1d c4 b8 ea ff ff ff e9 b4 5d 1d c4
[ 4765.695798] RSP: 0018:ffffbc5f85a17c80 EFLAGS: 00010246
[ 4765.695800] RAX: ffff9642e26b1370 RBX: ffff96420e880000 RCX: 0000000000000000
[ 4765.695801] RDX: 0000000000000000 RSI: ffff96420e8a78a8 RDI: ffff96420e880000
[ 4765.695802] RBP: ffff96420e880000 R08: ffffeb8d0e5d0000 R09: ffffeb8d0e5cc001
[ 4765.695803] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000001050
[ 4765.695804] R13: ffff96420e8c1218 R14: ffff964358662000 R15: 0000000000000000
[ 4765.695806] FS:  0000000000000000(0000) GS:ffff9650de280000(0000) knlGS:0000000000000000
[ 4765.695807] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4765.695808] CR2: 00007f6a18805760 CR3: 000000010c2ae000 CR4: 0000000000750ee0
[ 4765.695810] PKRU: 55555554
[ 4765.695810] Call Trace:
[ 4765.695813]  <TASK>
[ 4765.695815]  ? amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 4765.695963]  ? __warn+0x81/0x130
[ 4765.695970]  ? amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 4765.696108]  ? report_bug+0x191/0x1c0
[ 4765.696112]  ? handle_bug+0x3c/0x80
[ 4765.696116]  ? exc_invalid_op+0x17/0x70
[ 4765.696118]  ? asm_exc_invalid_op+0x1a/0x20
[ 4765.696123]  ? amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 4765.696250]  gfx_v9_0_hw_fini+0x35/0x710 [amdgpu]
[ 4765.696380]  amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu]
[ 4765.696497]  ? amdgpu_device_ip_suspend_phase1+0x6f/0xe0 [amdgpu]
[ 4765.696614]  amdgpu_device_ip_suspend+0x36/0x70 [amdgpu]
[ 4765.696731]  amdgpu_device_pre_asic_reset+0xd3/0x2a0 [amdgpu]
[ 4765.696849]  amdgpu_device_gpu_recover+0x4c6/0xd70 [amdgpu]
[ 4765.696968]  amdgpu_job_timedout+0x186/0x270 [amdgpu]
[ 4765.697112]  ? srso_alias_return_thunk+0x5/0x7f
[ 4765.697118]  drm_sched_job_timedout+0x7a/0x110 [gpu_sched]
[ 4765.697124]  process_one_work+0x1e1/0x3f0
[ 4765.697128]  worker_thread+0x51/0x390
[ 4765.697130]  ? _raw_spin_lock_irqsave+0x27/0x60
[ 4765.697133]  ? __pfx_worker_thread+0x10/0x10
[ 4765.697134]  kthread+0xf7/0x130
[ 4765.697137]  ? __pfx_kthread+0x10/0x10
[ 4765.697140]  ret_from_fork+0x34/0x50
[ 4765.697143]  ? __pfx_kthread+0x10/0x10
[ 4765.697146]  ret_from_fork_asm+0x1b/0x30
[ 4765.697152]  </TASK>
[ 4765.697153] ---[ end trace 0000000000000000 ]---
[ 4765.697161] ------------[ cut here ]------------
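
When comparing a trace like the one above against open bug reports, the most searchable parts are the source location and the function named in the WARNING header. The following is a minimal sketch, assuming the dmesg output has been saved as text; the names `WARNING_RE` and `find_warnings` are hypothetical helpers introduced here for illustration, not part of any kernel tooling.

```python
import re

# Matches the header of a kernel WARNING line such as:
# "WARNING: CPU: 2 PID: 721753 at drivers/.../amdgpu_irq.c:615 amdgpu_irq_put+0x46/0x70 [amdgpu]"
# (hypothetical helper pattern, written against the log format shown in this report)
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[^\s+]+)\+"
)


def find_warnings(dmesg_text):
    """Return one dict per WARNING header found in the given dmesg text."""
    found = []
    for log_line in dmesg_text.splitlines():
        match = WARNING_RE.search(log_line)
        if match:
            info = match.groupdict()
            # Convert the numeric fields so they can be compared or sorted.
            for key in ("cpu", "pid", "line"):
                info[key] = int(info[key])
            found.append(info)
    return found


if __name__ == "__main__":
    sample = (
        "[ 4765.695354] WARNING: CPU: 2 PID: 721753 at "
        "drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:615 "
        "amdgpu_irq_put+0x46/0x70 [amdgpu]"
    )
    for warning in find_warnings(sample):
        print(warning["file"], warning["line"], warning["symbol"])
```

The extracted file name and symbol can then be used as search terms in a bug tracker.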

The screen flickers: black <-> back to the desktop <-> black <-> garbage from RAM <-> black
The machine appears frozen, but pressing CTRL-ALT-Fx and then typing blind to log in and retrieve the log does work.

If I let the session run for a few minutes and then launch (for example) a fairly heavy browser (Chrome in my case), things go better, but if I resume from suspend I get the same problem.

It is more stable under Xorg, but I get the same message,
and my dmesg also shows:
[ 3881.415898] [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
[ 3881.432742] amdgpu 0000:07:00.0: amdgpu: 0000000083f7ea8e pin failed
[ 3881.432747] [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
[ 3881.449576] amdgpu 0000:07:00.0: amdgpu: 000000002757bd96 pin failed
[ 3881.449582] [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
[ 3881.465611] amdgpu 0000:07:00.0: amdgpu: 0000000083f7ea8e pin failed
[ 3881.465615] [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
(continuously)
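
The kernel reports failures as negative errno values, and on Linux 12 is ENOMEM, so "error -12" here means a memory allocation or placement failed. As a minimal sketch, assuming the dmesg output has been saved as text, the repeated lines can be counted and decoded like this; `PIN_ERROR_RE` and `summarize_pin_errors` are hypothetical names introduced for illustration.

```python
import errno
import re

# Matches the message printed by dm_plane_helper_prepare_fb in the log above
# and captures the numeric error code (hypothetical helper pattern).
PIN_ERROR_RE = re.compile(r"Failed to pin framebuffer with error (-?\d+)")


def summarize_pin_errors(dmesg_text):
    """Count 'Failed to pin framebuffer' lines, keyed by symbolic errno name."""
    counts = {}
    for match in PIN_ERROR_RE.finditer(dmesg_text):
        # The kernel logs the negated errno, so take the absolute value.
        code = abs(int(match.group(1)))
        name = errno.errorcode.get(code, "unknown({})".format(code))
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    sample = (
        "[ 3881.415898] [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* "
        "Failed to pin framebuffer with error -12\n"
        "[ 3881.432742] amdgpu 0000:07:00.0: amdgpu: 0000000083f7ea8e pin failed\n"
        "[ 3881.432747] [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* "
        "Failed to pin framebuffer with error -12\n"
    )
    print(summarize_pin_errors(sample))
```

This only summarizes what the log already says; it does not identify why the allocation fails.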

Some information about the environment:
oktail@b550:~$ uname -a
Linux b550 6.5.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.3-1 (2023-09-13) x86_64 GNU/Linux
oktail@b550:~$ cat /etc/debian_version
trixie/sid
oktail@b550:~$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset USB 3.1 XHCI Controller
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset SATA Controller
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset Switch Upstream Port
02:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
02:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev 01)
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. OM3PDP3 NVMe SSD (rev 01)
06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 5017 (rev 03)
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c8)
07:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
07:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
07:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
07:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
07:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 01)
07:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
08:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 81)
08:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 81)

cpuinfo
processor : 15
vendor_id : AuthenticAMD
cpu family : 25
model : 80
model name : AMD Ryzen 7 5700G with Radeon Graphics
stepping : 0
microcode : 0xa50000c
cpu MHz : 400.000
cache size : 512 KB
physical id : 0
siblings : 16
core id : 7
cpu cores : 8
apicid : 15
initial apicid : 15
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
bogomips : 7586.20
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

PC model: Minisforum B550

I hope this gives you enough to go on!

Thanks
