
Need help debugging total system lockup, probably notebook power saving related



Hello,

I need help debugging random total system lock-ups.
This is an Acer Aspire V3-572G-78A notebook running Debian Stretch with the 4.9.0-5-amd64 kernel.

The lock-ups happen only on battery, never on AC power, and usually after resuming from suspend-to-RAM. After a fairly random time (anywhere from a few minutes to hours) the system suddenly freezes: the screen stops updating, the keyboard and the click-pad don't react, and the sound keeps playing a ~2 second loop. The computer does not react to magic SysRq combos (probably because the keyboard is dead) or to pressing the power key, and I cannot ping it or ssh into it. The notebook appears to stay in this state indefinitely (the screen does not blank). Only holding the power key for ~10 seconds or removing the battery forces a hard reset.

I believe this is a kernel-level lock-up in some hardware driver. Unfortunately, I haven't been able to find out which one, because the log files (I tried both syslog and journald) contain nothing out of the ordinary just before the lock-up. Probably disk I/O locks up as well, so the last messages never get written.

Netconsole isn't really an easy option: I cannot reliably reproduce the lock-up in a suitably controlled environment, and the wireless interface lacks the polling support that netconsole requires.

My suspects:
- The integrated Intel graphics card with the i915 driver: I have always had issues with it (on linux-3.16 it used to crash/hang a lot). Maybe GPU hangs are no longer detected properly.
- The hard disk sometimes loses its APM level after suspend (I have to set /sys/power/pm_async to 0 to prevent errors after each suspend). Maybe this points to a larger suspend/power-management issue.
- My iwlwifi interface sometimes crashes, and the only thing that helps is removing it from the PCI bus and rescanning for it. But neither the crash nor this procedure hangs the whole system.
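
For anyone who wants to reproduce the two workarounds mentioned in the list, here is a rough Python sketch of what they amount to. It only writes to the standard sysfs files /sys/power/pm_async, /sys/bus/pci/devices/<address>/remove and /sys/bus/pci/rescan. The function names and the sysfs_root parameter are made up for illustration, and the PCI address in the usage note is a placeholder, not the real address of my card. It has to run as root on a real system.

```python
from pathlib import Path


def set_pm_async(enabled, sysfs_root="/sys"):
    """Enable (1) or disable (0) asynchronous device suspend/resume.

    Writes to <sysfs_root>/power/pm_async. The sysfs_root parameter is
    an illustrative assumption so the function can be tried on a fake tree.
    """
    path = Path(sysfs_root) / "power" / "pm_async"
    path.write_text("1" if enabled else "0")
    return path


def reset_pci_device(address, sysfs_root="/sys"):
    """Remove one PCI device from the bus, then rescan the bus for it.

    `address` is the full PCI address including the domain, as printed
    by `lspci -D`. The value passed in is the caller's responsibility.
    """
    root = Path(sysfs_root)
    remove = root / "bus" / "pci" / "devices" / address / "remove"
    rescan = root / "bus" / "pci" / "rescan"
    remove.write_text("1")   # detach the device and unbind its driver
    rescan.write_text("1")   # let the kernel rediscover it
    return remove, rescan
```

Usage would be something like set_pm_async(False) once after boot, and reset_pci_device("0000:03:00.0") after a wifi crash, with the placeholder address replaced by the one lspci -D reports for the wireless card.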

Any help, suggestions, or pointers will be appreciated.

Regards,
Ondrej G.
