Bug#1102175: Repeated lockups
Control: tags -1 + moreinfo
On Thu, Apr 24, 2025 at 08:52:51PM -0700, Alison Chaiken wrote:
> The tracer at https://github.com/chaiken/BPF-sandbox/commit/e0dda39cbe92e0f80805a2a06aa80d6fb3b065b2
> finally was running when apcupsd triggered the lockup. The stack below has
> appeared over and over for days. The time output in the trace has a 1000x
> math error:
>
> 2025-04-24T11:20:23.394893-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: pci_mmcfg_read() for comm apcupsd
> with tid 3183 took 222794 s
> 2025-04-24T11:20:38.809232-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: apcupsd: lock contended for 222810s
> 2025-04-24T11:20:38.809380-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: __down_common+521
> 2025-04-24T11:20:38.809396-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: __down_common+521
> 2025-04-24T11:20:38.809406-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: down_timeout+85
> 2025-04-24T11:20:38.809420-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: acpi_os_wait_semaphore+76
> 2025-04-24T11:20:38.809431-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: acpi_ut_acquire_mutex+58
> 2025-04-24T11:20:38.809444-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]:
> acpi_ex_enter_interpreter+16
> 2025-04-24T11:20:38.809458-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]:
> acpi_ex_system_wait_mutex+82
> 2025-04-24T11:20:38.809470-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]:
> acpi_ds_begin_method_execution+137
> 2025-04-24T11:20:38.809482-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]:
> acpi_ds_call_control_method+79
> 2025-04-24T11:20:38.809493-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: acpi_ps_parse_aml+718
> 2025-04-24T11:20:38.809504-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: acpi_ps_execute_method+319
> 2025-04-24T11:20:38.809515-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: acpi_ns_evaluate+318
> 2025-04-24T11:20:38.809528-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: acpi_evaluate_object+334
> 2025-04-24T11:20:38.809539-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: acpi_device_set_power+495
> 2025-04-24T11:20:38.809551-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]:
> acpi_pci_set_power_state+109
> 2025-04-24T11:20:38.809562-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: pci_power_up+48
> 2025-04-24T11:20:38.809573-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: pci_pm_runtime_resume+51
> 2025-04-24T11:20:38.809584-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: __rpm_callback+65
> 2025-04-24T11:20:38.809596-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: rpm_callback+85
> 2025-04-24T11:20:38.809608-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: rpm_resume+1235
> 2025-04-24T11:20:38.809618-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: rpm_resume+723
> 2025-04-24T11:20:38.809630-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: __pm_runtime_resume+75
> 2025-04-24T11:20:38.809641-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: usb_autoresume_device+30
> 2025-04-24T11:20:38.809651-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: usbdev_open+259
> 2025-04-24T11:20:38.809662-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: chrdev_open+178
> 2025-04-24T11:20:38.809673-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: do_dentry_open+332
> 2025-04-24T11:20:38.809685-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: vfs_open+46
> 2025-04-24T11:20:38.809698-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: path_openat+2094
> 2025-04-24T11:20:38.809708-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: do_filp_open+196
> 2025-04-24T11:20:38.809719-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: do_sys_openat2+174
> 2025-04-24T11:20:38.809730-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: __x64_sys_openat+85
> 2025-04-24T11:20:38.809742-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]: do_syscall_64+130
> 2025-04-24T11:20:38.809753-07:00 schallkreis
> run-investigate-pcie-lockup.sh[2622305]:
> entry_SYSCALL_64_after_hwframe+118
>
> Here is the failure shortly thereafter. It happened while the system was
> idle except for browser, IRC etc. left running. Perhaps he pulseaudio
> failure is related?
>
> 2025-04-24T11:22:43.698237-07:00 schallkreis plasmashell[3874]:
> org.kde.pulseaudio: context kaput
> 2025-04-24T11:22:43.698345-07:00 schallkreis plasmashell[3874]:
> org.kde.pulseaudio: No object for name
> "alsa_output.pci-0000_73_00.6.analog-stereo"
> 2025-04-24T11:22:43.698364-07:00 schallkreis plasmashell[3874]:
> org.kde.pulseaudio: No object for name
> "alsa_output.pci-0000_73_00.6.analog-stereo.monitor"
> 2025-04-24T11:22:44.139668-07:00 schallkreis kded6[3770]:
> org.kde.pulseaudio: context kaput
> 2025-04-24T11:22:44.139719-07:00 schallkreis kded6[3770]:
> org.kde.pulseaudio: No object for name
> "alsa_output.pci-0000_73_00.6.analog-stereo"
> 2025-04-24T11:22:44.139733-07:00 schallkreis kded6[3770]:
> org.kde.pulseaudio: No object for name
> "alsa_output.pci-0000_73_00.6.analog-stereo.monitor"
> 2025-04-24T11:22:44.139817-07:00 schallkreis systemd[3492]:
> pulseaudio.service: Main process exited, code=killed, status=9/KILL
> 2025-04-24T11:22:44.580911-07:00 schallkreis systemd[3492]:
> pulseaudio.service: Failed with result 'signal'.
> 2025-04-24T11:22:44.581246-07:00 schallkreis systemd[3492]:
> pulseaudio.service: Consumed 33min 45.125s CPU time, 11.4M memory peak.
> 2025-04-24T11:22:44.801449-07:00 schallkreis systemd[3492]:
> pulseaudio.service: Scheduled restart job, restart counter is at 1.
> 2025-04-24T11:22:45.022289-07:00 schallkreis systemd[3492]: Starting
> pulseaudio.service - Sound Service...
> 2025-04-24T11:22:47.669907-07:00 schallkreis plasmashell[3874]: pw.conf:
> can't load config client.conf: No such file or directory
> 2025-04-24T11:22:47.669986-07:00 schallkreis plasmashell[3874]: pw.conf:
> can't load config client.conf: No such file or directory
> 2025-04-24T11:22:47.670011-07:00 schallkreis plasmashell[3874]:
> kpipewire_logging: Failed to create PipeWire context
> 2025-04-24T11:22:47.670024-07:00 schallkreis plasmashell[3874]: received
> error while creating the stream "Failed to create PipeWire context" Media
> monitor will not work.
> 2025-04-24T11:22:52.523562-07:00 schallkreis plasmashell[3874]: pw.conf:
> can't load config client.conf: No such file or directory
> 2025-04-24T11:22:52.523678-07:00 schallkreis plasmashell[3874]: pw.conf:
> can't load config client.conf: No such file or directory
> 2025-04-24T11:22:52.523700-07:00 schallkreis plasmashell[3874]:
> kpipewire_logging: Failed to create PipeWire context
> 2025-04-24T11:22:52.523713-07:00 schallkreis plasmashell[3874]: received
> error while creating the stream "Failed to create PipeWire context" Media
> monitor will not work.
> 2025-04-24T11:22:52.744151-07:00 schallkreis kernel: watchdog: BUG: soft
> lockup - CPU#16 stuck for 23s! [apcupsd:3183]
> 2025-04-24T11:22:52.744159-07:00 schallkreis kernel: Modules linked in: qrtr
> nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc sunrpc nls_ascii nls_cp437
> vfat fat amd_atl intel_rapl_msr intel_rapl_common mt7925e mt7925_common
> mt792x_lib mt76_connac_lib edac_mce_amd snd_hda_codec_realtek mt76 kvm_amd
> snd_hda_codec_generic mac80211 snd_hda_scodec_component snd_hda_codec_hdmi
> kvm snd_hda_intel snd_intel_dspcfg libarc4 snd_intel_sdw_acpi snd_hda_codec
> cfg80211 crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_hda_core
> sha256_ssse3 sha1_ssse3 snd_hwdep aesni_intel snd_pcm gf128mul crypto_simd
> cryptd wmi_bmof snd_timer spd5118 rapl ccp snd pcspkr k10temp rfkill
> soundcore joydev evdev parport_pc ppdev lp parport efi_pstore configfs
> nfnetlink efivarfs ip_tables x_tables autofs4 ext4 mbcache jbd2
> crc32c_generic hid_generic usbhid hid amdgpu amdxcp drm_exec gpu_sched
> drm_buddy i2c_algo_bit drm_suballoc_helper drm_display_helper cec rc_core
> drm_ttm_helper ttm xhci_pci xhci_hcd drm_kms_helper ahci libahci libata drm
> thunderbolt r8169 usbcore nvme sp5100_tco realtek
> 2025-04-24T11:22:52.744161-07:00 schallkreis kernel: scsi_mod watchdog
> mdio_devres nvme_core libphy crc32_pclmul crc32c_intel i2c_piix4 i2c_smbus
> usb_common scsi_common nvme_auth video crc16 gpio_amdpt wmi gpio_generic
> button
> 2025-04-24T11:22:52.744162-07:00 schallkreis kernel: CPU: 16 UID: 0 PID:
> 3183 Comm: apcupsd Not tainted 6.12.21-amd64 #1 Debian 6.12.21-1
> 2025-04-24T11:22:52.744162-07:00 schallkreis kernel: Hardware name: System76
> Thelio Mira/Thelio Mira, BIOS 3.11.SP01 12/05/2024
> 2025-04-24T11:22:52.744163-07:00 schallkreis kernel: RIP:
> 0010:pci_mmcfg_read+0xa4/0xe0
> 2025-04-24T11:22:52.744164-07:00 schallkreis kernel: Code: fe 01 75 0b 4c 01
> e0 8a 00 0f b6 c0 89 45 00 e8 a2 6a 51 ff 31 c0 5b 5d 41 5c 41 5d 41 5e 41
> 5f c3 cc cc cc cc 4c 01 e0 8b 00 <89> 45 00 eb e0 4c 01 e0 66 8b 00 0f b7 c0
> 89 45 00 eb d2 e8 74 6a
> 2025-04-24T11:22:52.744164-07:00 schallkreis kernel: RSP:
> 0018:ffff9d0645f9f4b8 EFLAGS: 00000286
> 2025-04-24T11:22:52.744164-07:00 schallkreis kernel: RAX: 00000000ffffffff
> RBX: 0000000000f00000 RCX: 0000000000000ffc
> 2025-04-24T11:22:52.744165-07:00 schallkreis kernel: RDX: 00000000000000ff
> RSI: 000000000000000f RDI: 0000000000000000
> 2025-04-24T11:22:52.744165-07:00 schallkreis kernel: RBP: ffff9d0645f9f4f4
> R08: 0000000000000004 R09: ffff9d0645f9f4f4
> 2025-04-24T11:22:52.744170-07:00 schallkreis kernel: R10: 000000000000000f
> R11: ffff9d0645f9f2e0 R12: 0000000000000ffc
> 2025-04-24T11:22:52.744170-07:00 schallkreis kernel: R13: 0000000000000000
> R14: 0000000000000004 R15: 0000000000000000
> 2025-04-24T11:22:52.744170-07:00 schallkreis kernel: FS:
> 00007f8da22fc0c0(0000) GS:ffff8c8b5fa00000(0000) knlGS:0000000000000000
> 2025-04-24T11:22:52.744171-07:00 schallkreis kernel: CS: 0010 DS: 0000 ES:
> 0000 CR0: 0000000080050033
> 2025-04-24T11:22:52.744171-07:00 schallkreis kernel: CR2: 00007f9fdd253680
> CR3: 0000000111982000 CR4: 0000000000f50ef0
> 2025-04-24T11:22:52.744171-07:00 schallkreis kernel: PKRU: 55555554
> 2025-04-24T11:22:52.744172-07:00 schallkreis kernel: Call Trace:
> 2025-04-24T11:22:52.744172-07:00 schallkreis kernel: <IRQ>
> 2025-04-24T11:22:52.744172-07:00 schallkreis kernel: ?
> watchdog_timer_fn.cold+0x3d/0xa1
> 2025-04-24T11:22:52.744173-07:00 schallkreis kernel: ?
> __pfx_watchdog_timer_fn+0x10/0x10
> 2025-04-24T11:22:52.744174-07:00 schallkreis kernel: ?
> __hrtimer_run_queues+0x12f/0x2a0
> 2025-04-24T11:22:52.744174-07:00 schallkreis kernel: ?
> hrtimer_interrupt+0xfa/0x210
> 2025-04-24T11:22:52.744174-07:00 schallkreis kernel: ?
> __sysvec_apic_timer_interrupt+0x52/0x100
> 2025-04-24T11:22:52.744175-07:00 schallkreis kernel: ?
> sysvec_apic_timer_interrupt+0x6c/0x90
> 2025-04-24T11:22:52.744176-07:00 schallkreis kernel: </IRQ>
> 2025-04-24T11:22:52.744176-07:00 schallkreis kernel: <TASK>
> 2025-04-24T11:22:52.744176-07:00 schallkreis kernel: ?
> asm_sysvec_apic_timer_interrupt+0x1a/0x20
> 2025-04-24T11:22:52.744177-07:00 schallkreis kernel: ?
> pci_mmcfg_read+0xa4/0xe0
> 2025-04-24T11:22:52.744177-07:00 schallkreis kernel: ?
> pci_mmcfg_read+0x4f/0xe0
> 2025-04-24T11:22:52.744177-07:00 schallkreis kernel:
> trace_clock_x86_tsc+0x20/0x20
> 2025-04-24T11:22:52.744178-07:00 schallkreis kernel:
> pci_find_next_ext_capability+0x89/0xf0
> 2025-04-24T11:22:52.744178-07:00 schallkreis kernel:
> pci_restore_ltr_state+0x28/0x50
> 2025-04-24T11:22:52.744179-07:00 schallkreis kernel:
> pci_restore_state.part.0+0x29/0x370
> 2025-04-24T11:22:52.744179-07:00 schallkreis kernel:
> pci_pm_runtime_resume+0x45/0xf0
> 2025-04-24T11:22:52.744179-07:00 schallkreis kernel: ?
> __pfx_pci_pm_runtime_resume+0x10/0x10
> 2025-04-24T11:22:52.744179-07:00 schallkreis kernel:
> __rpm_callback+0x41/0x170
> 2025-04-24T11:22:52.744180-07:00 schallkreis kernel: ?
> __pfx_pci_pm_runtime_resume+0x10/0x10
> 2025-04-24T11:22:52.744180-07:00 schallkreis kernel: rpm_callback+0x55/0x60
> 2025-04-24T11:22:52.744180-07:00 schallkreis kernel: ?
> __pfx_pci_pm_runtime_resume+0x10/0x10
> 2025-04-24T11:22:52.744181-07:00 schallkreis kernel: rpm_resume+0x4d3/0x700
> 2025-04-24T11:22:52.744181-07:00 schallkreis kernel: ?
> sched_clock_cpu+0xf/0x1d0
> 2025-04-24T11:22:52.744181-07:00 schallkreis kernel: ?
> psi_group_change+0x138/0x300
> 2025-04-24T11:22:52.744182-07:00 schallkreis kernel: rpm_resume+0x2d3/0x700
> 2025-04-24T11:22:52.744183-07:00 schallkreis kernel: ?
> update_curr+0x98/0x280
> 2025-04-24T11:22:52.744183-07:00 schallkreis kernel: rpm_resume+0x2d3/0x700
> 2025-04-24T11:22:52.744183-07:00 schallkreis kernel: rpm_resume+0x2d3/0x700
> 2025-04-24T11:22:52.744184-07:00 schallkreis kernel:
> __pm_runtime_resume+0x4b/0x80
> 2025-04-24T11:22:52.744184-07:00 schallkreis kernel:
> usb_autoresume_device+0x1e/0x50 [usbcore]
> 2025-04-24T11:22:52.744184-07:00 schallkreis kernel:
> usbdev_open+0x103/0x260 [usbcore]
> 2025-04-24T11:22:52.744185-07:00 schallkreis kernel: ?
> bpf_prog_e8932b6bae2b9745_restrict_filesystems+0xb7/0x132
> 2025-04-24T11:22:52.744185-07:00 schallkreis kernel: chrdev_open+0xb2/0x230
> 2025-04-24T11:22:52.744186-07:00 schallkreis kernel: ?
> __pfx_chrdev_open+0x10/0x10
> 2025-04-24T11:22:52.744186-07:00 schallkreis kernel:
> do_dentry_open+0x14c/0x440
> 2025-04-24T11:22:52.744186-07:00 schallkreis kernel: vfs_open+0x2e/0xe0
> 2025-04-24T11:22:52.744186-07:00 schallkreis kernel:
> path_openat+0x82e/0x12d0
> 2025-04-24T11:22:52.744187-07:00 schallkreis kernel: ? kick_pool+0x5c/0x140
> 2025-04-24T11:22:52.744187-07:00 schallkreis kernel:
> do_filp_open+0xc4/0x170
> 2025-04-24T11:22:52.744188-07:00 schallkreis kernel:
> do_sys_openat2+0xae/0xe0
> 2025-04-24T11:22:52.744188-07:00 schallkreis kernel:
> __x64_sys_openat+0x55/0xa0
> 2025-04-24T11:22:52.744188-07:00 schallkreis kernel:
> do_syscall_64+0x82/0x190
> 2025-04-24T11:22:52.744188-07:00 schallkreis kernel: ? vfs_read+0x15e/0x370
> 2025-04-24T11:22:52.744189-07:00 schallkreis kernel: ? vfs_read+0x15e/0x370
> 2025-04-24T11:22:52.744190-07:00 schallkreis kernel: ?
> syscall_exit_to_user_mode+0x4d/0x210
> 2025-04-24T11:22:52.744190-07:00 schallkreis kernel: ?
> do_syscall_64+0x8e/0x190
> 2025-04-24T11:22:52.744190-07:00 schallkreis kernel: ?
> terminate_walk+0x61/0x100
> 2025-04-24T11:22:52.744190-07:00 schallkreis kernel: ?
> path_openat+0x495/0x12d0
> 2025-04-24T11:22:52.744191-07:00 schallkreis kernel: ?
> do_syscall_64+0x8e/0x190
> 2025-04-24T11:22:52.744191-07:00 schallkreis kernel: ?
> syscall_exit_to_user_mode+0x4d/0x210
> 2025-04-24T11:22:52.744191-07:00 schallkreis kernel: ?
> do_filp_open+0xc4/0x170
> 2025-04-24T11:22:52.744192-07:00 schallkreis kernel: ?
> __rseq_handle_notify_resume+0xa2/0x4a0
> 2025-04-24T11:22:52.744192-07:00 schallkreis kernel: ?
> syscall_exit_to_user_mode+0x172/0x210
> 2025-04-24T11:22:52.744193-07:00 schallkreis kernel: ?
> do_syscall_64+0x8e/0x190
> 2025-04-24T11:22:52.744194-07:00 schallkreis kernel: ?
> do_syscall_64+0x8e/0x190
> 2025-04-24T11:22:52.744194-07:00 schallkreis kernel:
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 2025-04-24T11:22:52.744194-07:00 schallkreis kernel: RIP:
> 0033:0x7f8da23979ee
> 2025-04-24T11:22:52.744195-07:00 schallkreis kernel: Code: 08 0f 85 f5 4b ff
> ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c
> 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 80
> 00 00 00 00 48 83 ec 08
> 2025-04-24T11:22:52.744195-07:00 schallkreis kernel: RSP:
> 002b:00007fff105c8718 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> 2025-04-24T11:22:52.744196-07:00 schallkreis kernel: RAX: ffffffffffffffda
> RBX: 00007f8da22fc0c0 RCX: 00007f8da23979ee
> 2025-04-24T11:22:52.964717-07:00 schallkreis kernel: RDX: 0000000000080002
> RSI: 00005608b22ab500 RDI: ffffffffffffff9c
> 2025-04-24T11:22:52.964718-07:00 schallkreis kernel: RBP: 0000000000000007
> R08: 0000000000000000 R09: 0000000000000000
> 2025-04-24T11:22:52.964718-07:00 schallkreis kernel: R10: 0000000000000000
> R11: 0000000000000246 R12: 00007fff105c8850
> 2025-04-24T11:22:52.964719-07:00 schallkreis kernel: R13: 00007fff105c8800
> R14: 0000560886d392cb R15: 0000000000000004
> 2025-04-24T11:22:52.964719-07:00 schallkreis kernel: </TASK>
>
> Perhaps this
>
> 2025-04-24T11:22:52.744179-07:00 schallkreis kernel:
> pci_pm_runtime_resume+0x45/0xf0
> 2025-04-24T11:22:52.744179-07:00 schallkreis kernel: ?
> __pfx_pci_pm_runtime_resume+0x10/0x10
> 2025-04-24T11:22:52.744179-07:00 schallkreis kernel:
> __rpm_callback+0x41/0x170
> 2025-04-24T11:22:52.744180-07:00 schallkreis kernel: ?
> __pfx_pci_pm_runtime_resume+0x10/0x10
> 2025-04-24T11:22:52.744180-07:00 schallkreis kernel: rpm_callback+0x55/0x60
> 2025-04-24T11:22:52.744180-07:00 schallkreis kernel: ?
> __pfx_pci_pm_runtime_resume+0x10/0x10
>
> is the output of the actual bug? That's where the stack which printed out
> over and over and the failure stack originated.
is the issue you were experiencing still triggering with the most
recent kernel in unstable, 6.12.32-1?
Regards,
Salvatore
Reply to: