
Re: Random hard freezes Wheezy



On 5/17/2014 2:43 AM, Mimiko wrote:

> On one server I intermittently encounter hard freezes. Server does not
> react, ping, or ctrl+alt+del. Just caps lock and num lock flashes. Only
> a power off from button helps to start server, after which it runs until
> again this happens.

Flashing keyboard LEDs are typically a sign of hardware failure, either
permanent or temporary.  Usually they are a symptom, not a cause, but
check the PS/2 connection to the motherboard just in case.  Make sure it
is secure, not loose or wobbly, and make sure none of the 6 pins are
bent or broken.

...
> PRIMERGY Econel200/D2020, BIOS 08.10.Rev.1100.2020 06/01/2006

This Fujitsu server is 7-8 years old...

> What could be the problem of this?

The eth1 link going down is likely a symptom, not a cause.  However, it
could be a cause if the switch port on the other end of the cable is
going bad.  In that case the switch port could be applying spurious
voltage to the wire, which could cause this server to lock up.  This is
rare but I have seen it in the past.  A short in the cable may cause
this as well, but again this is rare.  Cables are cheap, so replace it
just in case.

However, the fact that it runs for many hours between lockups suggests
the cause is a thermal problem.  Check all fans in the system to make
sure they're spinning at full RPM and are free of dust buildup.  This
includes the CPU fan(s), the chassis fans, and the fan inside the PSU.
Given the age of this machine, I'd simply replace every fan in it for
good measure.
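
To back up the thermal theory with data, it can help to log the
temperature and fan readings up to the moment of a lockup.  The
following is only a rough sketch in Python.  It assumes the board's
sensor chip has a loaded kernel driver that exposes values under
/sys/class/hwmon, which may not be true for this machine.  The
function names, log file name, and interval are my own placeholders.

```python
"""Log hwmon temperatures and fan speeds so the last readings
before a hard freeze are already on disk."""
import glob
import os
import time

HWMON_ROOT = "/sys/class/hwmon"  # standard Linux sysfs location


def read_sensors(root=HWMON_ROOT):
    """Return {path: value}; temperatures in deg C, fans in RPM."""
    readings = {}
    # Depending on kernel version and driver, the sensor files sit
    # directly under hwmonN/ or under hwmonN/device/.
    for sub in ("hwmon*", os.path.join("hwmon*", "device")):
        # temp*_input is in millidegrees C, fan*_input is in RPM.
        for kind, scale in (("temp*_input", 1000.0),
                            ("fan*_input", 1.0)):
            pattern = os.path.join(root, sub, kind)
            for path in glob.glob(pattern):
                try:
                    with open(path) as f:
                        readings[path] = int(f.read().strip()) / scale
                except (OSError, IOError, ValueError):
                    continue  # sensor not readable right now; skip it
    return readings


def log_readings(logfile="sensor-watch.log", interval=60, count=None,
                 root=HWMON_ROOT):
    """Append one timestamped line per interval (hypothetical helper).

    count=None means run until interrupted.
    """
    written = 0
    with open(logfile, "a") as out:
        while count is None or written < count:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            fields = " ".join(
                "{}={:g}".format(p, v)
                for p, v in sorted(read_sensors(root).items()))
            out.write("{} {}\n".format(stamp, fields))
            out.flush()
            os.fsync(out.fileno())  # a hard freeze loses unsynced data
            written += 1
            if count is None or written < count:
                time.sleep(interval)
```

Calling log_readings() and leaving it running until the next freeze
should leave the last readings in the log.  Temperatures climbing or a
fan reading dropping toward 0 RPM shortly before the final entry would
support the thermal explanation.  If the log shows nothing unusual,
that does not rule it out, since not every fan or hot spot has a
sensor.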

If all the fans are clean and spinning at full RPM, the next likely
cause is a bad PSU.  Check the output voltage of all PSU rails with a
voltmeter to ensure they are within specification.  If you do not have
the test equipment or if this is beyond your ability, take the machine
to a qualified repair shop and have them check it.  Or, as PSUs are
relatively inexpensive, simply replace the PSU due to age as well as the
lockup issue.  The host name "srv75" suggests a server farm.  If you are
tasked with maintaining a farm, I'd assume you have the requisite
hardware background to perform this testing, troubleshooting, and repair.
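
For the voltmeter check, "within specification" can be made concrete
with a small calculation.  This sketch assumes the PSU follows the
usual ATX tolerances (+/-5% on the positive rails, +/-10% on -12V).
Check the Fujitsu documentation for this model, as its limits may
differ.  The rail table and function names are mine, for illustration.

```python
# Nominal voltage and allowed tolerance (fraction of nominal) per rail.
# Assumption: standard ATX limits, not values confirmed for this PSU.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rail(name, measured):
    """Return (ok, low, high) for one measured rail voltage."""
    nominal, tol = ATX_RAILS[name]
    margin = abs(nominal) * tol
    low, high = nominal - margin, nominal + margin
    return low <= measured <= high, low, high


def check_all(measurements):
    """Map each measured rail name to True (in range) or False."""
    return {name: check_rail(name, volts)[0]
            for name, volts in measurements.items()}


if __name__ == "__main__":
    # Example readings typed in from a voltmeter (made-up numbers).
    sample = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.2, "-12V": -11.6}
    for rail in sorted(sample):
        ok, low, high = check_rail(rail, sample[rail])
        print("{:>6}: {:7.2f} V  allowed {:.2f} to {:.2f}  {}".format(
            rail, sample[rail], low, high, "ok" if ok else "OUT OF SPEC"))
```

Measure with the system under load if possible.  A rail that reads
fine at idle can sag out of range when the CPU and disks are busy.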

Cheers,

Stan

