
Re: systems hangs every few days



On Tue, Jun 18, 2013 at 01:59:07PM +0100, Chris Purves wrote:

> After upgrading to wheezy, I get a system hang every one or two days
> where the system becomes completely unresponsive and I need do a
> cold boot.

> This is an older machine with an Athlon processor.  I'm not running
> X.  I don't see anything unusual in the logs.  The last entry in
> syslog is typically a cron job, but not always the same one.  The
> system seems to freeze without any warning.

When the system freezes, is there anything useful on the console?  If
the kernel craps out, the result may not be visible in the log
files (because things can halt before buffers are flushed etc). 

It may be useful to disable screen blanking on the console for this -
the kernel may (or may not) wake up the console upon death. (I call
that the JFK syndrome: He never knew what hit him).
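
For the blanking part, here is a minimal Python sketch, assuming a plain
Linux virtual console (not an ssh session or an X terminal). It writes
the console escape sequence ESC [ 9 ; n ], which sets the blank timeout
to n minutes, with 0 meaning "never blank". The function name is my own
invention for illustration.

```python
import sys


def blank_timeout_sequence(minutes: int) -> str:
    """Build the Linux console escape that sets the screen blank timeout.

    ESC [ 9 ; n ] sets the timeout to n minutes; 0 disables blanking.
    """
    if minutes < 0:
        raise ValueError("timeout must be zero or a positive number of minutes")
    return f"\033[9;{minutes}]"


if __name__ == "__main__":
    # Must be run on the console itself for the kernel to act on it.
    sys.stdout.write(blank_timeout_sequence(0))
    sys.stdout.flush()
```

The setting does not survive a reboot, so it would have to be run again
after each boot (or the consoleblank=0 kernel parameter used instead).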

A couple of candidates spring to mind:

* Overheating? If the system is old, it may be full of dust and thus
  the fans may struggle. Or the fan bearings may be worn out.
  Insufficient airflow and cooling do tend to make things go pop -
  except for CPUs which (I believe) shut themselves down due to a
  built-in self-preservation instinct courtesy of the hardware
  engineers.

* Struggling power supply?  If the power supply is just barely
  providing enough power, random things which require more power may
  cause voltage drops that some component takes a dislike to.  Although
  the system *should* draw its peak amount of power during power-on,
  peaks may also occur later.

* Bad RAM? (already covered in a different part of the thread)

* Bad capacitors?  Older motherboards are more likely to suffer from
  the capacitors going "pop". A web search for "Capacitor plague" is
  probably more reliable and informative than I can achieve in this
  email.
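
To chase the overheating theory, something like the following Python
sketch could log temperatures once a minute, so that the last reading
before a hang is still on disk afterwards. It assumes the kernel exposes
sensors under /sys/class/thermal/thermal_zone*/temp (in thousandths of a
degree Celsius), which may not be the case on older hardware. The
function names and the log path are made up for illustration. Each line
is fsync'ed, since a freeze can happen before buffers are flushed.

```python
import glob
import os
import time


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs reading such as '45000\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_temperatures(pattern: str = "/sys/class/thermal/thermal_zone*/temp") -> dict:
    """Return {sysfs path: degrees Celsius} for every readable sensor."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as sensor:
                readings[path] = millidegrees_to_celsius(sensor.read())
        except (OSError, ValueError):
            continue  # sensor vanished or returned something unparseable
    return readings


def append_durably(log_path: str, line: str) -> None:
    """Append one line and force it to disk before returning."""
    with open(log_path, "a") as log:
        log.write(line + "\n")
        log.flush()
        os.fsync(log.fileno())


def monitor(log_path: str, interval_seconds: float, iterations: int) -> None:
    """Log all sensor readings `iterations` times, sleeping in between."""
    for _ in range(iterations):
        temps = read_temperatures()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        summary = ", ".join(f"{p}={t:.1f}C" for p, t in temps.items())
        append_durably(log_path, f"{stamp} {summary or 'no sensors found'}")
        time.sleep(interval_seconds)
```

For example, monitor("/var/tmp/temperature.log", 60, 24 * 60) would
cover roughly one day. If the last entries before a freeze show steadily
climbing values, cooling becomes the prime suspect.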

> I tried downgrading the kernel back to the squeeze version (2.6) and
> it still locks up.  Before upgrading to wheezy I resized a few of the
> partitions.  Other than that, nothing else has changed and everything
> had been running fine for years.

Assuming that the resize was healthy, all should be ok.

But... Since there are no clear suspects, paranoia dictates a run of
fsck on the affected file systems. Just in case. At least it is a
harmless check if you can afford the downtime while the file systems
are unmounted.

Hope this helps

-- 
Karl E. Jorgensen

