
recurrent soft/hard lockup on CPU



Hello,

I suspect this will vanish into the ether, but in case someone has
experience with this kind of bug, I'll try anyway: even a tip on how to
debug it and narrow down the problem would be helpful.

For a few weeks now, I have been experiencing complete system freezes
that occur nearly every day, seemingly at random, and not linked to any
easily identifiable activity.  This is a new system (Lenovo ThinkCentre
M800), so I doubt hardware is the culprit.  The output of dmesg, as well
as /var/log/kern.log and /var/log/syslog, shows soft or hard lockups
(excerpt of a hard lockup example):

---<--------------------cut here---------------start------------------->---
May  5 22:48:19 otaria kernel: [ 6706.415657] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
May  5 22:48:19 otaria kernel: [ 6706.415659] Modules linked in: md4(E) nls_utf8(E) cifs(E) dns_resolver(E) fscache(E) rfcomm(E) xt_multiport(E) iptable_filter(E) ip_tables(E) x_tables(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) bnep(E) dm_mod(E) snd_hda_codec_hdmi(E) arc4(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) hmac(E) drbg(E) ansi_cprng(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) iwlmvm(E) glue_helper(E) ablk_helper(E) cryptd(E) mac80211(E) nouveau(E) i915(E) pcspkr(E) serio_raw(E) joydev(E) evdev(E) usblp(E) btusb(E) btrtl(E) mxm_wmi(E) snd_hda_intel(E) ttm(E) snd_hda_codec(E) drm_kms_helper(E) snd_hda_core(E) iwlwifi(E) snd_hwdep(E) drm(E) snd_pcm(E) snd_timer(E) snd(E) cfg80211(E) i2c_algo_bit(E) soundcore(E) i2c_i801(E) mei_me(E) shpchp(E) sg(E) mei(E) wmi(E) hci_uart(E) btbcm(E) btqca(E) btintel(E) 8250_fintek(E) bluetooth(E) battery(E) intel_lpss_acpi(E) rfkill(E) intel_lpss(E) mfd_core(E) video(E) acpi_als(E) kfifo_buf(E) tpm_tis(E) industrialio(E) tpm(E) acpi_pad(E) button(E) processor(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) uas(E) usb_storage(E) sr_mod(E) cdrom(E) sd_mod(E) hid_generic(E) usbhid(E) crc32c_intel(E) psmouse(E) ahci(E) e1000e(E) libahci(E) ptp(E) xhci_pci(E) pps_core(E) xhci_hcd(E) libata(E) scsi_mod(E) usbcore(E) usb_common(E) fan(E) thermal(E) i2c_hid(E) hid(E) fjes(E)
May  5 22:48:19 otaria kernel: [ 6706.415698] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE   4.5.0-1-amd64 #1 Debian 4.5.1-1
May  5 22:48:19 otaria kernel: [ 6706.415698] Hardware name: LENOVO 10FWCTO1WW/SKYBAY, BIOS FWKT38A   01/28/2016
May  5 22:48:19 otaria kernel: [ 6706.415699] task: ffff8808417700c0 ti: ffff880841120000 task.ti: ffff880841120000
May  5 22:48:19 otaria kernel: [ 6706.415700] RIP: 0010:[<ffffffff81482718>]  [<ffffffff81482718>] cpuidle_enter_state+0x118/0x2c0
May  5 22:48:19 otaria kernel: [ 6706.415703] RSP: 0018:ffff880841123eb8  EFLAGS: 00000246
May  5 22:48:19 otaria kernel: [ 6706.415704] RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000018
May  5 22:48:19 otaria kernel: [ 6706.415705] RDX: 001c288ad3fa948e RSI: 00000000004b1da7 RDI: 0000000000000000
May  5 22:48:19 otaria kernel: [ 6706.415705] RBP: 00000616b0534df8 R08: 0000000000000018 R09: ffff880865452ab4
May  5 22:48:19 otaria kernel: [ 6706.415706] R10: 000000000000209a R11: 00000000000008a5 R12: ffffe8ffffc492d0
May  5 22:48:19 otaria kernel: [ 6706.415707] R13: ffffffff81ab5898 R14: 00000616b0375ce0 R15: ffffffff81ab5640
May  5 22:48:19 otaria kernel: [ 6706.415707] FS:  0000000000000000(0000) GS:ffff880865440000(0000) knlGS:0000000000000000
May  5 22:48:19 otaria kernel: [ 6706.415708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  5 22:48:19 otaria kernel: [ 6706.415709] CR2: 00007f9f20800000 CR3: 0000000001a0b000 CR4: 00000000003406e0
May  5 22:48:19 otaria kernel: [ 6706.415709] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May  5 22:48:19 otaria kernel: [ 6706.415710] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May  5 22:48:19 otaria kernel: [ 6706.415710] Stack:
May  5 22:48:19 otaria kernel: [ 6706.415711]  0000000005ba63ce ffffe8ffffc492d0 ffff880841124000 ffff880841120000
May  5 22:48:19 otaria kernel: [ 6706.415712]  ffff880841120000 ffff880841124000 ffffffff81ab5640 ffffffff810b8c67
May  5 22:48:19 otaria kernel: [ 6706.415713]  2f95298c05ba63ce 6db3faa90427c13c 0000000000000000 0000000000000000
May  5 22:48:19 otaria kernel: [ 6706.415714] Call Trace:
May  5 22:48:19 otaria kernel: [ 6706.415717]  [<ffffffff810b8c67>] ? cpu_startup_entry+0x287/0x340
May  5 22:48:19 otaria kernel: [ 6706.415719]  [<ffffffff8104d3fa>] ? start_secondary+0x15a/0x190
---<--------------------cut here---------------end--------------------->---

I've searched far and wide but found no solutions, except for one where
the issue was traced to a faulty power supply.  Nothing in these logs
speaks to me.
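
To check whether the lockups cluster on one CPU or one code path, the
log can be tallied with a short script.  This is only a rough sketch:
the hard-lockup and RIP patterns are taken from the excerpt above, the
soft-lockup pattern ("BUG: soft lockup - CPU#N") is an assumption about
how this kernel words that message, and summarize_lockups is a made-up
helper name.

```python
import re
from collections import Counter

# Pattern taken from the hard-lockup line in the excerpt above.
HARD = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
# Assumed wording of the soft-lockup message; adjust to match your logs.
SOFT = re.compile(r"BUG: soft lockup - CPU#(\d+)")
# Symbol name from lines like "RIP: 0010:[<addr>]  [<addr>] func+0x118/0x2c0".
RIP = re.compile(r"RIP: .*\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_lockups(lines):
    """Count lockup reports per (kind, cpu) and RIP symbols per function."""
    events = Counter()
    rips = Counter()
    for line in lines:
        match = HARD.search(line)
        if match:
            events[("hard", int(match.group(1)))] += 1
            continue
        match = SOFT.search(line)
        if match:
            events[("soft", int(match.group(1)))] += 1
            continue
        match = RIP.search(line)
        if match:
            rips[match.group(1)] += 1
    return events, rips
```

It can be fed an open log file, for example
summarize_lockups(open("/var/log/kern.log", errors="replace")), and the
two counters printed with most_common().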

-- 
Seb

