Bug#1053864: libdrm-amdgpu1: gpu crash on graphics start with Radeon 760M (both sway and gdm3)
Package: libdrm-amdgpu1
Version: 2.4.115-1
Severity: normal
X-Debbugs-Cc: icefox@dreamquest.io
Dear Maintainer,
When GDM3 starts, or when I disable it, log in on the console by hand,
and then start sway or another WM, the graphics mode switch often hangs
for a few seconds on an unresponsive black screen, then drops back to
a text console for an instant and tries again. This repeats up to three
times before it finally succeeds: sometimes it works on the first try,
often on the second, and so on.
Once Sway, or GDM3 and Xorg, has actually started, everything *seems*
perfectly stable as far as I've seen.
This is a brand new GPU chipset afaik, so graphics bugs are pretty
understandable.
CPU: AMD Ryzen 5 7640U w/ Radeon 760M Graphics
Extended renderer info from `glxinfo`:
Device: AMD Radeon Graphics (gfx1103_r1, LLVM 16.0.6, DRM 3.54, 6.5.0-1-amd64) (0x15bf)
Version: 23.2.1
I also see the following errors in dmesg associated with the
apparent-crash-and-restart:
[ 26.625039] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=23, emitted seq=25
[ 26.625482] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[ 26.625820] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[ 26.810595] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 26.810761] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 26.944169] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 26.944310] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.077693] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.077834] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.211163] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.211303] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.344634] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.344776] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.478028] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.478175] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.611499] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.611640] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.744960] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.745097] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.878425] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.878564] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.880086] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[ 27.909811] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 27.910426] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[ 27.910540] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[ 27.911480] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[ 27.913327] [drm] DMUB hardware initialized: version=0x08000E00
[ 27.918776] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:264
[ 27.921376] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:272
[ 27.923969] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:280
[ 27.926566] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:288
[ 27.934650] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:264
[ 27.937248] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:272
[ 27.939841] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:280
[ 27.942439] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:288
[ 28.328853] [drm] kiq ring mec 3 pipe 1 q 0
[ 28.331133] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 28.331252] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 28.331965] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 28.331968] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 28.331971] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 28.331973] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 28.331975] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 28.331977] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 28.331979] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 28.331981] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 28.331983] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 28.331985] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 28.331987] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 28.331990] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 28.331992] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 28.334786] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[ 28.334791] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[ 28.334933] [drm] Skip scheduling IBs!
[ 28.334955] [drm] Skip scheduling IBs!
[ 28.334964] [drm] Skip scheduling IBs!
[ 28.334971] [drm] Skip scheduling IBs!
[ 28.334979] [drm] Skip scheduling IBs!
[ 28.334987] [drm] Skip scheduling IBs!
[ 28.334995] [drm] Skip scheduling IBs!
[ 28.335006] [drm] Skip scheduling IBs!
[ 28.335014] [drm] Skip scheduling IBs!
[ 28.335070] [drm] Skip scheduling IBs!
[ 28.335079] [drm] Skip scheduling IBs!
[ 28.335085] [drm] Skip scheduling IBs!
[ 28.336265] [drm] ring gfx_32776.1.1 was added
[ 28.337256] [drm] ring compute_32776.2.2 was added
[ 28.338182] [drm] ring sdma_32776.3.3 was added
[ 28.338234] [drm] ring gfx_32776.1.1 ib test pass
[ 28.338272] [drm] ring compute_32776.2.2 ib test pass
[ 28.338470] [drm] ring sdma_32776.3.3 ib test pass
[ 28.339726] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
[ 28.518882] [drm] Skip scheduling IBs!
[ 28.518892] [drm] Skip scheduling IBs!
[ 28.518897] [drm] Skip scheduling IBs!
[ 28.520085] [drm] Skip scheduling IBs!
[ 28.521361] [drm] Skip scheduling IBs!
[ 28.541083] [drm] Skip scheduling IBs!
[ 28.541114] [drm] Skip scheduling IBs!
[ 28.541143] [drm] Skip scheduling IBs!
[ 28.541159] [drm] Skip scheduling IBs!
[ 28.541173] [drm] Skip scheduling IBs!
[ 28.541193] [drm] Skip scheduling IBs!
[ 28.541215] [drm] Skip scheduling IBs!
[ 28.541219] [drm] Skip scheduling IBs!
[ 28.541239] [drm] Skip scheduling IBs!
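To see how long each hang-and-recover cycle takes, the log can be reduced to pairs of "ring ... timeout" and "GPU reset(N) succeeded!" lines. The following is only a minimal Python sketch, assuming dmesg output in the same format as above; the function name `summarize_resets` and the sample list are illustrative and not part of any amdgpu tooling.

```python
import re

# Matches "[ 26.625039] message" style dmesg lines: timestamp, then the message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
TIMEOUT_RE = re.compile(r"ring (\S+) timeout")
RESET_DONE_RE = re.compile(r"GPU reset\(\d+\) succeeded!")


def summarize_resets(lines):
    """Return (timeout_ts, ring, reset_done_ts) for each completed GPU reset."""
    resets = []
    timeout_ts = None
    ring = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a timestamped dmesg line
        ts, msg = float(m.group(1)), m.group(2)
        ring_match = TIMEOUT_RE.search(msg)
        if ring_match and timeout_ts is None:
            # First timeout of a cycle: remember when it happened and on which ring.
            timeout_ts, ring = ts, ring_match.group(1)
        elif RESET_DONE_RE.search(msg) and timeout_ts is not None:
            resets.append((timeout_ts, ring, ts))
            timeout_ts, ring = None, None
    return resets


# Three lines copied from the log above, used as sample input.
sample = [
    "[ 26.625039] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=23, emitted seq=25",
    "[ 26.625820] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!",
    "[ 28.339726] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!",
]

for start, ring_name, end in summarize_resets(sample):
    print(f"ring {ring_name}: timeout at {start:.3f}s, recovered after {end - start:.2f}s")
```

The sketch deliberately ignores the intermediate "GPU reset succeeded, trying to resume" line and only closes a cycle on the final "GPU reset(N) succeeded!" message.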
I also get the following errors in dmesg from time to time, but they have no visible impact so far:
[ 1046.269344] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1056.509203] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1066.749132] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1076.988590] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1087.228896] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1094.983205] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[ 1097.468792] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1107.708726] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1117.948141] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1128.188485] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1138.428402] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1148.668306] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1158.908169] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1169.147619] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1179.387933] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
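The timestamps on these messages look periodic, and a short script makes the spacing explicit. This is a minimal Python sketch under the assumption that the lines are in the format shown above; `aux_timeout_intervals` is a hypothetical helper name. For the lines quoted here the gaps come out at roughly 10.24 seconds.

```python
import re

# Only lines from amdgpu_dm_process_dmub_aux_transfer_sync are of interest;
# unrelated lines (such as the i2c_hid one) are skipped.
AUX_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\[drm:amdgpu_dm_process_dmub_aux_transfer_sync"
)


def aux_timeout_intervals(lines):
    """Return the gaps, in seconds, between consecutive AUX transfer timeouts."""
    stamps = []
    for line in lines:
        m = AUX_RE.match(line.strip())
        if m:
            stamps.append(float(m.group(1)))
    # Pair each timestamp with its successor and take the difference.
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


# Sample input copied from the log above, including one unrelated line.
sample = [
    "[ 1087.228896] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!",
    "[ 1094.983205] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)",
    "[ 1097.468792] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!",
    "[ 1107.708726] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!",
]

print(aux_timeout_intervals(sample))
```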
Thank you, hopefully this info is useful to someone!
Simon Heath
-- System Information:
Debian Release: trixie/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 6.5.0-1-amd64 (SMP w/12 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages libdrm-amdgpu1 depends on:
ii libc6 2.37-12
ii libdrm2 2.4.115-1
libdrm-amdgpu1 recommends no packages.
libdrm-amdgpu1 suggests no packages.
-- no debconf information