Re: system hangs every few days
On 2013-06-18 12:38, Karl E. Jorgensen wrote:
> On Tue, Jun 18, 2013 at 01:59:07PM +0100, Chris Purves wrote:
> 
>> After upgrading to wheezy, I get a system hang every one or two days
>> where the system becomes completely unresponsive and I need to do a
>> cold boot.
> 
>> This is an older machine with an Athlon processor.  I'm not running
>> X.  I don't see anything unusual in the logs.  The last entry in
>> syslog is typically a cron job, but not always the same one.  The
>> system seems to freeze without any warning.
> 
> When the system freezes, is there anything useful on the console?  If
> the kernel craps out, the result may not be visible in the log
> files (because things can halt before buffers are flushed etc). 
> 
> It may be useful to disable screen blanking on the console for this -
> the kernel may (or may not) wake up the console upon death. (I call
> that the JFK syndrome: He never knew what hit him).
I disabled screen blanking and powersave on the console.  We'll see if anything useful is displayed the next time it goes down.
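To confirm the setting is still in effect later (e.g. after a reboot), here is a minimal Python sketch. It assumes a Linux kernel that exposes the VT blank timeout, in seconds, at /sys/module/kernel/parameters/consoleblank, where 0 means blanking is off. The function names are made up for illustration. On kernels that support it, booting with the consoleblank=0 kernel parameter is one way to make the setting persistent.

```python
from pathlib import Path

# sysfs view of the console blank timeout (seconds); assumed present on this kernel.
CONSOLEBLANK = Path("/sys/module/kernel/parameters/consoleblank")


def console_blank_seconds(path=CONSOLEBLANK):
    """Return the console blank timeout in seconds, or None if it can't be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def describe(seconds):
    """Turn the timeout into a human-readable status line."""
    if seconds is None:
        return "consoleblank parameter not found"
    if seconds == 0:
        return "console blanking is disabled"
    return f"console blanks after {seconds} s of inactivity"


if __name__ == "__main__":
    print(describe(console_blank_seconds()))
```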
> A couple of candidates spring to mind:
> 
> * Overheating? If the system is old, it may be full of dust and thus
>   the fans may struggle, or their bearings may be worn out. Insufficient
>   airflow and cooling does tend to make things go pop - except for
>   CPUs which (I believe) shut themselves down due to a built-in
>   self-preservation instinct courtesy of the hardware engineers.
The CPU fan is about four years old and the CPU temperature hovers around 50-55 °C; however, perhaps some other component on the motherboard is getting too hot.  I can check into that.
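For checking the other on-board sensors without lm-sensors, a minimal Python sketch follows. It assumes the Linux hwmon sysfs layout: chips under /sys/class/hwmon/hwmonN, temperatures in tempN_input as millidegrees Celsius, with optional tempN_label and name files. Older kernels often keep these files under hwmonN/device/ instead, so the sketch looks in both places. The helper names are hypothetical, and which sensors show up at all depends on the board and its driver.

```python
from pathlib import Path

# Standard sysfs class directory for hardware-monitoring chips.
HWMON_ROOT = Path("/sys/class/hwmon")


def _read(path):
    """Return the stripped contents of a sysfs file, or None if it can't be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_temperatures(root=HWMON_ROOT):
    """Return (chip, label, degrees C) for every temp*_input found under root."""
    readings = []
    for chip_dir in sorted(root.glob("hwmon*")):
        chip = _read(chip_dir / "name") or _read(chip_dir / "device" / "name") or chip_dir.name
        # Newer kernels put attributes in hwmonN/, older drivers often in hwmonN/device/.
        for attr_dir in (chip_dir, chip_dir / "device"):
            if not attr_dir.is_dir():
                continue
            for inp in sorted(attr_dir.glob("temp*_input")):
                prefix = inp.name[: -len("_input")]  # e.g. "temp1"
                label = _read(attr_dir / f"{prefix}_label") or prefix
                try:
                    millidegrees = int(_read(inp))
                except (TypeError, ValueError):
                    continue  # sensor present but unreadable
                readings.append((chip, label, millidegrees / 1000))
    return readings


if __name__ == "__main__":
    for chip, label, celsius in read_temperatures():
        print(f"{chip:12} {label:12} {celsius:5.1f} C")
```

Running it a few times while the box is busy (e.g. during the cron jobs that precede the hangs) would show whether anything besides the CPU climbs.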
> * Struggling power supply?  If the power supply is just barely
>   providing enough power, random things which require more power may
>   cause voltage drops that some components take a dislike to.  Although
>   the system *should* draw its peak power during power-on, peaks may
>   also occur later.
I suspect it may be the power supply.  I replaced it about two years ago, but maybe it's time again.  If I don't get any useful information from the console, I may try replacing the power supply.
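Before swapping the supply, one option is to log the board's voltage sensors every few seconds and fsync each line, so the last samples written before a freeze should already be on disk. This is a minimal Python sketch assuming the Linux hwmon sysfs layout (inN_input in millivolts, optional inN_label); the helper names and the log path are hypothetical. Some chips report raw values that need board-specific scaling (lm-sensors applies it from its configuration), so compare trends in the log rather than trusting the absolute numbers.

```python
import os
import time
from pathlib import Path

HWMON_ROOT = Path("/sys/class/hwmon")


def _read(path):
    """Return the stripped contents of a sysfs file, or None if it can't be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_voltages(root=HWMON_ROOT):
    """Return {"chip/label": volts} for every in*_input found under root."""
    volts = {}
    for chip_dir in sorted(root.glob("hwmon*")):
        chip = _read(chip_dir / "name") or _read(chip_dir / "device" / "name") or chip_dir.name
        # Newer kernels put attributes in hwmonN/, older drivers often in hwmonN/device/.
        for attr_dir in (chip_dir, chip_dir / "device"):
            if not attr_dir.is_dir():
                continue
            for inp in sorted(attr_dir.glob("in*_input")):
                prefix = inp.name[: -len("_input")]  # e.g. "in0"
                label = _read(attr_dir / f"{prefix}_label") or prefix
                try:
                    millivolts = int(_read(inp))
                except (TypeError, ValueError):
                    continue  # sensor present but unreadable
                volts[f"{chip}/{label}"] = millivolts / 1000
    return volts


def format_line(timestamp, volts):
    """One log line: Unix time followed by sorted name=volts pairs."""
    fields = " ".join(f"{name}={v:.3f}" for name, v in sorted(volts.items()))
    return f"{timestamp:.0f} {fields}\n"


def append_sample(logfile, root=HWMON_ROOT, now=None):
    """Append one sample and fsync it so it is on disk before the next one."""
    line = format_line(time.time() if now is None else now, read_voltages(root))
    with open(logfile, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())
    return line


def log_forever(logfile, interval=10, root=HWMON_ROOT):
    """Sample every `interval` seconds until killed."""
    while True:
        append_sample(logfile, root)
        time.sleep(interval)
```

Started as root with something like `log_forever("/var/log/voltages.log")` (hypothetical path), the tail of that file after the next hang would show whether any rail was sagging just before it.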
> * Bad RAM? (already covered in a different part of the thread)
> 
> * Bad capacitors?  Older motherboards are more likely to suffer from
>   the capacitors going "pop". A web search for "Capacitor plague" is
>   probably more reliable and informative than I can achieve in this
>   email.
I can check for that.
>> I tried downgrading the kernel back to the squeeze version (2.6) and
>> it still locks up.  Before upgrading to wheezy I resized a few of the
>> partitions.  Other than that, nothing else has changed and everything
>> had been running fine for years.
> 
> Assuming that the resize was healthy, all should be ok.
> 
> But... Since there are no clear suspects, paranoia dictates a run of
> fsck on the affected file systems. Just in case. At least it is a
> harmless check if you can afford the downtime while the file systems
> are unmounted.
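If you want the check to refuse to run against a mounted filesystem, here is a minimal Python sketch. It assumes ext2/3/4 filesystems checked with e2fsck, where -f forces a check even if the filesystem is marked clean and -n opens it read-only and answers "no" to every prompt. Mounted devices are read from /proc/mounts, resolving symlinks such as /dev/disk/by-uuid/... first. The function names are hypothetical, and the root filesystem would still need a rescue boot.

```python
import os
import subprocess
from pathlib import Path


def mounted_devices(mounts_text):
    """Return the set of mount sources listed in /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        source = fields[0]
        # Resolve symlinks such as /dev/disk/by-uuid/... to the underlying device node.
        devices.add(os.path.realpath(source) if source.startswith("/") else source)
    return devices


def is_mounted(device, mounts_text):
    """True if `device` (after resolving symlinks) appears as a mount source."""
    return os.path.realpath(device) in mounted_devices(mounts_text)


def check_readonly(device, mounts_file=Path("/proc/mounts")):
    """Run a forced, read-only e2fsck on an unmounted ext2/3/4 device; return its exit code."""
    if is_mounted(device, mounts_file.read_text()):
        raise RuntimeError(f"{device} is mounted; unmount it or boot a rescue system first")
    # -f: check even if marked clean; -n: open read-only and answer "no" to every prompt.
    return subprocess.run(["e2fsck", "-f", "-n", device]).returncode
```

An exit code of 0 means e2fsck found no errors; 4 means it found errors and, because of -n, left them uncorrected, which would be the cue to schedule a real repair.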
> 
> Hope this helps
> 
-- 
Chris Purves
Visit my blog: http://chris.northfolk.ca
"Nobody goes there no more; it's too crowded!" - Yogi Berra