
Re: Random hard freezes Wheezy




On 17.05.2014 13:12, Andrei POPESCU wrote:
> Network Manager on a server?

Yes, it's installed by default. Are there any drawbacks to keeping it on a server?

> - test your RAM (memtest)

Did it before installing.

> - try a newer kernel (e.g. from backports)
As a last resort. I don't want to touch the default kernel if the cause may be something else.

> - check temperatures (CPU, case, etc.)
How can this be done in Debian?

> - check and/or replace if possible the power source of the server
I will do this once all software causes have been ruled out.

> - make sure you have all necessary firmware installed (check output of
>    'dmesg | grep firmware')
[ 7.771848] tg3 0000:04:04.0: firmware: agent loaded tigon/tg3_tso5.bin into memory

> To receive further help from this list you could:
>
> - attach full output of 'dmesg' immediately after a successful boot (run
>    'dmesg > dmesg.txt' and attach 'dmesg.txt')
> - upload somewhere a picture of the screen with the kernel panic

I will do this at the next kernel panic.

On 18.05.2014 02:21, Stan Hoeppner wrote:
>> PRIMERGY Econel200/D2020, BIOS 08.10.Rev.1100.2020 06/01/2006
>
> This Fujitsu server is 7-8 years old...

Yes, it is old. Doesn't Debian support it?

> Eth1 link down is likely a symptom, not a cause.  However, it could be a
> cause if the switch port on the other end of the cable is going bad.  In
> that case the switch port could be applying spurious voltage to the
> wire, which could cause this server to lock up.  This is rare but I have
> seen it in the past.  A short in the cable may cause this as well, but
> again this is rare.  Cables are cheap, so replace it just in case.

Today I replaced that network card with another one. I also found that this card had an IP address conflict with another server. Why didn't Debian log the conflicting IP address? The Windows server it was in conflict with did warn about this, but I didn't look at that server because it had been running for years without problems.

> However, the fact that it runs for many hours between lockups suggests
> the cause is a thermal problem.  Check all fans in the system to make
> sure they're spinning at full RPM and are free of dust buildup.  This
> includes the CPU fan(s), chassis fans, and fan inside the PSU.  Given
> the age of this machine, I'd simply replace every fan in it for good
> measure.

Before this server was put in the server room, technicians cleaned all the dust out of it and checked the voltages and fans. But if the problem persists after changing the network card, I will follow Andrei's and Stan's suggestions.

Thank you.

--
Mimiko desu.

