Re: systems hangs every few days
On 2013-06-18 12:38, Karl E. Jorgensen wrote:
> On Tue, Jun 18, 2013 at 01:59:07PM +0100, Chris Purves wrote:
>
>> After upgrading to wheezy, I get a system hang every one or two days
>> where the system becomes completely unresponsive and I need to do a
>> cold boot.
>
>> This is an older machine with an Athlon processor. I'm not running
>> X. I don't see anything unusual in the logs. The last entry in
>> syslog is typically a cron job, but not always the same one. The
>> system seems to freeze without any warning.
>
> When the system freezes, is there anything useful on the console? If
> the kernel craps out, the result may not be visible in the log
> files (because things can halt before buffers are flushed etc).
>
> It may be useful to disable screen blanking on the console for this -
> the kernel may (or may not) wake up the console upon death. (I call
> that the JFK syndrome: He never knew what hit him).
I disabled screen blanking and powersave on the console. We'll see if anything useful is displayed the next time it goes down.
> A couple of candidates spring to mind:
>
> * Overheating? If the system is old, it may be full of dust and thus
> the fans may struggle. Or the bearings get worn out. Insufficient
> airflow and cooling does tend to make things go pop - except for
> CPUs which (I believe) shut themselves down due to a built-in
> self-preservation instinct courtesy of the hardware engineers.
The CPU fan is about four years old and the CPU temperature hovers around 50-55 C. However, some other component on the motherboard may be getting too hot. I can check into that.
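Since a hang can happen before buffers are flushed, one way to check for heat is to log the sensor readings and force every line to disk. The following is only a minimal sketch, assuming the sensors are exposed as temp*_input files (in millidegrees Celsius) under /sys/class/hwmon; on some kernels and drivers they sit one level deeper, under a device/ subdirectory, so the glob pattern may need adjusting. The function names and the log path are hypothetical.

```python
import glob
import os
import time


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_C} for every temp*_input file found."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # The kernel reports these values in millidegrees Celsius.
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor unreadable right now; skip it
    return temps


def log_once(logfile, hwmon_root="/sys/class/hwmon"):
    """Append one timestamped line of readings and force it to disk."""
    temps = read_temps(hwmon_root)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    fields = " ".join(
        f"{os.path.relpath(path, hwmon_root)}={value:.1f}"
        for path, value in temps.items()
    )
    with open(logfile, "a") as f:
        f.write(f"{stamp} {fields}\n")
        f.flush()
        # fsync asks the kernel to write the data out now, so the last
        # reading has a better chance of surviving a sudden freeze.
        os.fsync(f.fileno())
    return temps


def run(logfile="/var/log/temps.log", interval=60):
    """Log forever, once every `interval` seconds (hypothetical defaults)."""
    while True:
        log_once(logfile)
        time.sleep(interval)
```

Calling run() as root from a small script would leave a trail of readings; after the next freeze, the last few lines of the log show whether any sensor was climbing beforehand.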
> * Struggling power supply? If the power supply is just barely
> providing enough power, random things which require more power may
> cause voltage drops that some components take a dislike to. Although
> the system *should* be consuming its peak amount of power during
> power-on, peaks may also occur later.
I suspect it may be the power supply. I replaced it about two years ago, but maybe it's time again. If I don't get any good information from the console, I may try replacing the power supply.
> * Bad RAM? (already covered in a different part of the thread)
>
> * Bad capacitors? Older motherboards are more likely to suffer from
> the capacitors going "pop". A web search for "Capacitor plague" is
> probably more reliable and informative than I can achieve in this
> email.
I can take a look at the capacitors.
>> I tried downgrading the kernel back to the squeeze version (2.6) and
>> it still locks up. Before upgrading to wheezy I resized a few of the
>> partitions. Other than that, nothing else has changed and everything
>> had been running fine for years.
>
> Assuming that the resize was healthy, all should be ok.
>
> But... Since there are no clear suspects, paranoia dictates a run of
> fsck on the affected file systems. Just in case. At least it is a
> harmless check if you can afford the downtime while the file systems
> are unmounted.
>
> Hope this helps
>
--
Chris Purves
Visit my blog: http://chris.northfolk.ca
"Nobody goes there no more; it's too crowded!" - Yogi Berra