Re: Debugging mysterious freeze / crash

On 23.09.2018 07:51, Celejar wrote:

I've been experiencing a great deal of frustration recently with
intermittent freezes / crashes on my Debian Sid system (a Lenovo
W550s). The symptoms are that the screen totally freezes and the system
becomes completely unresponsive (even ssh attempts from another machine
fail), and the only thing that seems to have any effect is a hard
reboot (holding down the power button until the system restarts).

Upon reboot, I can't find anything at all interesting in 'journalctl -b
-1', or /var/log/syslog - the former just shows everything looking
normal until the moment of the crash, at which point the log just ends,
and the latter also just shows everything seeming to be fine until the
moment of the crash, and then shows the boot messages from the reboot.

Any ideas of what could be causing this, or how I could go about
debugging it? I've been using this machine for years without
experiencing anything like this, and I'm not sure for how long this has
been a problem. I did recently upgrade from stable to unstable, but I'm
not sure whether or not the problem's initial occurrences coincide with
the upgrade.


With symptoms like this, I'd suspect hardware problems. If something had gone wrong in software, the kernel would usually notice and log a warning about it.

The best way to debug it, I think, is to test the hardware under load with a different, freshly installed OS. Debian stable or a Windows trial version would do.
Severe overheating can cause this, so monitor the temperatures of the CPU and the ICH. Readings in the 80-100 °C range are dangerous and should be dealt with.
Also check that the battery is OK and that the PSU works properly and can deliver enough power under high load.
During the tests you can apply a small amount of vibration to the laptop (by typing constantly on the keyboard, or just tapping on its sides) to check whether any solder joints have cracked and temporarily lose contact.
Run the tests one at a time, so that you can isolate the root cause of the freezes. If you get a few lockups while tapping, you have a faulty motherboard that should be serviced or replaced.
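
For the temperature monitoring suggested above, something like the following Python sketch might help. It is only an illustration, not a tested diagnostic tool: it assumes the kernel exposes thermal zones under /sys/class/thermal (zone names and counts vary by machine), the function names and the log file name are made up for this example, and the 80 °C warning threshold is simply the lower figure mentioned above. Each line is flushed and fsync'ed so that the last readings before a freeze have a better chance of reaching the disk.

```python
import glob
import os
import time

WARN_C = 80.0  # assumption: warn at the lower "dangerous" figure mentioned above


def read_zones(base="/sys/class/thermal"):
    """Return {"thermal_zoneN:type": degrees_C} for every readable zone."""
    readings = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())  # sysfs reports millidegrees C
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
        readings[f"{os.path.basename(zone)}:{name}"] = millideg / 1000.0
    return readings


def format_line(readings, now=None):
    """Build one timestamped log line; hot zones are marked with ' !'."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    parts = []
    for name, temp in readings.items():
        flag = " !" if temp >= WARN_C else ""
        parts.append(f"{name}={temp:.1f}C{flag}")
    return stamp + " " + " ".join(parts)


def monitor(logfile, interval=2.0, samples=None, base="/sys/class/thermal"):
    """Append readings to logfile; samples=None means run until interrupted."""
    taken = 0
    with open(logfile, "a") as log:
        while samples is None or taken < samples:
            log.write(format_line(read_zones(base)) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible freeze
            taken += 1
            if samples is None or taken < samples:
                time.sleep(interval)
```

Saved as, say, tempmon.py (a hypothetical name), it could be started in a terminal with python3 -c "import tempmon; tempmon.monitor('temps.log')" before running the load test. After the next freeze and reboot, the tail of temps.log shows the last temperatures that were recorded.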

