Bug#1005005: linux-image-5.15.0-3-amd64: suspend failure with amdgpu
Control: tags -1 + moreinfo
Hi Dominique,
On Sat, Feb 05, 2022 at 11:33:33AM +0100, Dominique Dumont wrote:
> Package: src:linux
> Version: 5.15.15-2
> Severity: normal
> Tags: upstream
>
> Dear Maintainer,
>
>
> Since upgrading to linux-image-5.15.0-3-amd64, suspending my machine no
> longer works correctly: the screen goes blank as usual, but comes back
> after 10s or so.
>
> The most relevant kernel logs are:
>
> [ 257.531771] PM: suspend entry (s2idle)
> [ 257.610570] Filesystems sync: 0.078 seconds
> [ 257.610723] (NULL device *): firmware: direct-loading firmware regulatory.db
> [ 257.610726] (NULL device *): firmware: direct-loading firmware regulatory.db.p7s
> [ 257.610745] (NULL device *): firmware: direct-loading firmware intel/ibt-17-16-1.ddc
> [ 257.610954] (NULL device *): firmware: direct-loading firmware intel/ibt-17-16-1.sfi
> [ 257.610986] (NULL device *): firmware: direct-loading firmware iwlwifi-9000-pu-b0-jf-b0-46.ucode
> [ 257.611211] (NULL device *): firmware: direct-loading firmware i915/kbl_dmc_ver1_04.bin
> [ 257.726247] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [ 257.728699] OOM killer disabled.
> [ 257.728700] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [ 257.730085] printk: Suspending console(s) (use no_console_suspend to debug)
> [ 257.839817] amdgpu:
> last message was failed ret is 65535
> [ 257.839842] amdgpu:
> failed to send message 261 ret is 65535
>
> [ ... lots of failed messages ... ]
>
> [ 257.840748] ------------[ cut here ]------------
> [ 257.840751] WARNING: CPU: 4 PID: 58 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2014 dm_suspend+0x19e/0x1c0 [amdgpu]
> [ 257.841665] Modules linked in: rfcomm xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc xfrm_user xfrm_algo nvme_fabrics typec_displayport cmac algif_hash algif_skcipher af_alg overlay bnep binfmt_misc nls_ascii nls_cp437 squashfs vfat fat loop x86_pkg_temp_thermal intel_powerclamp mei_hdcp snd_sof_pci_intel_cnl coretemp dell_rbtn intel_rapl_msr snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci kvm_intel snd_sof_xtensa_dsp snd_hda_codec_hdmi snd_sof soundwire_bus btusb btrtl kvm snd_ctl_led snd_soc_skl btbcm btintel dell_laptop irqbypass iwlmvm snd_soc_hdac_hda rapl bluetooth snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_codec_realtek snd_soc_acpi_intel_match snd_soc_acpi dell_smm_hwmon snd_hda_codec_generic intel_cstate ledtrig_audio mac80211 snd_soc_core
> [ 257.841816] dell_wmi intel_uncore snd_compress dell_smbios jitterentropy_rng dcdbas snd_hda_intel sha512_ssse3 serio_raw pcspkr libarc4 snd_intel_dspcfg sha512_generic efi_pstore dell_wmi_descriptor uvcvideo snd_intel_sdw_acpi iwlwifi snd_usb_audio snd_hda_codec dell_wmi_sysman videobuf2_vmalloc videobuf2_memops firmware_attributes_class iTCO_wdt videobuf2_v4l2 intel_pmc_bxt snd_hda_core drbg iTCO_vendor_support snd_usbmidi_lib videobuf2_common intel_wmi_thunderbolt wmi_bmof ee1004 watchdog snd_hwdep ansi_cprng joydev snd_rawmidi hid_multitouch videodev cfg80211 snd_seq_device mc snd_pcm processor_thermal_device_pci_legacy processor_thermal_device snd_timer processor_thermal_rfim processor_thermal_mbox ucsi_acpi processor_thermal_rapl snd mei_me typec_ucsi intel_rapl_common ecdh_generic roles soundcore mei ecc rfkill intel_soc_dts_iosf intel_pch_thermal typec int3403_thermal evdev int340x_thermal_zone dell_smo8800 intel_hid int3400_thermal intel_pmc_core acpi_thermal_rel acpi_pad
> [ 257.841935] sparse_keymap ac parport_pc ppdev sunrpc lp parport fuse configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_crypt dm_mod hid_jabra usbhid r8152 mii hid_generic amdgpu i915 rtsx_pci_sdmmc mmc_core crc32_pclmul crc32c_intel ghash_clmulni_intel gpu_sched nvme aesni_intel e1000e crypto_simd cryptd nvme_core i2c_algo_bit drm_ttm_helper ptp t10_pi ttm psmouse pps_core xhci_pci i2c_i801 drm_kms_helper thunderbolt crc_t10dif xhci_hcd cec i2c_smbus rc_core crct10dif_generic crct10dif_pclmul crct10dif_common rtsx_pci drm usbcore i2c_hid_acpi intel_lpss_pci i2c_hid intel_lpss idma64 usb_common hid wmi battery button video
> [ 257.842049] CPU: 4 PID: 58 Comm: kworker/u16:7 Not tainted 5.15.0-3-amd64 #1 Debian 5.15.15-2
> [ 257.842057] Hardware name: Dell Inc. Precision 3540/0M14W7, BIOS 1.9.1 07/06/2020
> [ 257.842062] Workqueue: events_unbound async_run_entry_fn
> [ 257.842075] RIP: 0010:dm_suspend+0x19e/0x1c0 [amdgpu]
> [ 257.842795] Code: ff 31 d2 4c 89 e6 4c 89 ef e8 4e d7 15 00 83 f8 01 74 1e 89 c2 48 c7 c6 40 36 f5 c0 48 c7 c7 50 bc 01 c1 e8 14 89 61 ff eb c2 <0f> 0b e9 95 fe ff ff 4c 89 e6 4c 89 ef e8 60 26 15 00 eb ae e8 d9
> [ 257.842801] RSP: 0018:ffffac778029fcf0 EFLAGS: 00010286
> [ 257.842808] RAX: 0000000000000000 RBX: ffff9e72cb1b5b08 RCX: 0000000000000027
> [ 257.842812] RDX: 0000000000000009 RSI: 0000000000000001 RDI: ffff9e72cb1a0000
> [ 257.842816] RBP: ffff9e72cb1a0000 R08: 0000000000000032 R09: 0000000000000004
> [ 257.842819] R10: 000000000000000f R11: ffffffffb1b82693 R12: ffff9e72cb1a0000
> [ 257.842823] R13: 0000000000000004 R14: 0000000000000002 R15: ffff9e72c0145f05
> [ 257.842826] FS: 0000000000000000(0000) GS:ffff9e762e500000(0000) knlGS:0000000000000000
> [ 257.842831] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 257.842835] CR2: 000055620fb334f6 CR3: 0000000412e10003 CR4: 00000000003706e0
> [ 257.842840] Call Trace:
> [ 257.842846] <TASK>
> [ 257.842851] ? vi_common_set_clockgating_state+0x229/0x2f0 [amdgpu]
> [ 257.843356] amdgpu_device_ip_suspend_phase1+0x5e/0xc0 [amdgpu]
> [ 257.843771] amdgpu_device_suspend+0x62/0xc0 [amdgpu]
> [ 257.844184] amdgpu_pmops_suspend+0x36/0x70 [amdgpu]
> [ 257.844631] pci_pm_suspend+0x71/0x160
> [ 257.844643] ? pci_pm_freeze+0xb0/0xb0
> [ 257.844651] dpm_run_callback+0x47/0x120
> [ 257.844658] __device_suspend+0x10e/0x470
> [ 257.844664] async_suspend+0x1b/0x90
> [ 257.844669] async_run_entry_fn+0x2d/0x130
> [ 257.844677] process_one_work+0x1ee/0x390
> [ 257.844685] worker_thread+0x53/0x3e0
> [ 257.844690] ? process_one_work+0x390/0x390
> [ 257.844696] kthread+0x124/0x150
> [ 257.844706] ? set_kthread_struct+0x40/0x40
> [ 257.844715] ret_from_fork+0x1f/0x30
> [ 257.844728] </TASK>
> [ 257.844730] ---[ end trace f4b6157e346cd3f6 ]---
> [ 258.419015] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vce_v3_0> failed -110
> [ 258.878568] amdgpu:
> last message was failed ret is 65535
>
> [ ... lots of failed messages ... ]
>
> [ 259.957788] amdgpu: Failed to force to switch arbf0!
> [ 259.957789] amdgpu: [disable_dpm_tasks] Failed to disable DPM!
> [ 259.957789] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -22
> [ 261.029543] amdgpu 0000:3b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
> [ 261.029632] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
> [ 263.171945] amdgpu: cp is busy, skip halt cp
> [ 264.242959] amdgpu: rlc is busy, skip halt rlc
>
> [ ... another kernel warning ... ]
>
> [ 265.315820] amdgpu 0000:3b:00.0: amdgpu: PCI CONFIG reset
> [ 266.386163] PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x70 [amdgpu] returns -22
> [ 266.386248] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -22
> [ 266.386253] amdgpu 0000:3b:00.0: PM: failed to suspend async: error -22
> [ 266.386382] PM: Some devices failed to suspend, or early wake event detected
> [ 266.681752] r8152 4-1.3:1.0 enx00e04c680aef: carrier on
> [ 267.069698] OOM killer enabled.
> [ 267.069700] Restarting tasks ...
>
>
> Note that suspend works fine when booting linux-image-5.15.0-2-amd64.
Does the issue persist if you upgrade to the most recent 5.16.y
version, currently 5.16.4-1~exp1 (5.16.7-1 should land soon as well)?
Any chance you can bisect to find the commit that introduced the issue?
Regards,
Salvatore
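
When comparing logs from several kernel builds (for example while
bisecting), it can help to pull out only the PM core lines that report a
failing callback, such as the "returns -22" lines quoted above. The
following is a minimal sketch, not part of any Debian or kernel tooling:
the function name failed_pm_callbacks and the regular expression are
assumptions made for illustration, and the pattern is only known to match
the two "PM: ...(): ... returns N" lines shown in this report.

```python
import re

# Assumed pattern, derived only from the log lines quoted in this report, e.g.
#   PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x70 [amdgpu] returns -22
#   PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -22
PM_RETURN = re.compile(
    r"PM: (?P<caller>\w+)\(\): (?P<callback>\S+)"
    r"(?: \[(?P<module>\w+)\])? returns (?P<errno>-?\d+)"
)


def failed_pm_callbacks(log_text):
    """Return (caller, callback, module, errno) for each non-zero PM return.

    `module` is None when the log line carries no "[module]" suffix.
    """
    results = []
    for line in log_text.splitlines():
        match = PM_RETURN.search(line)
        if match is None:
            continue
        code = int(match.group("errno"))
        if code != 0:
            results.append(
                (match.group("caller"), match.group("callback"),
                 match.group("module"), code)
            )
    return results
```

Feeding it the text of a saved dmesg dump returns one tuple per failing
callback; in the log above the error code is -22, which is EINVAL on
Linux.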