
Re: continuous reboots in a two nodes cluster with heartbeat and pacemaker.



On 17 August 2012 09:53, Stan Hoeppner <stan@hardwarefreak.com> wrote:


> I'd be thoroughly inspecting the power circuits feeding those servers at
> this point.  Do you have the machines set to automatically power back on
> after power loss?  If you do, switch that mode so they stay off after AC
> power loss.  That should confirm whether the problem is total loss of AC
> voltage or a severely deep sag.

Is that setting in the BIOS?


> If the problem is a less severe sag, however, this test won't isolate
> the problem.  For that you must dig into the UPS monitoring interface.
> If you don't have a UPS, you'll have to put a tap on the AC circuit and
> monitor the voltage.  This will require specialized equipment, as it
> must be able to log the sag.  Some of the nicer Fluke meters can log the
> lowest voltage, but probably can't tell you the time of day when the sag
> occurs.  Thus, you'll need a highly trained electrician with the proper
> equipment.
>
> This could also be a thermal issue.  Do you have hardware monitoring
> installed and properly configured?  The 'sensors' package?  Over temp
> conditions will often cause random reboots.  Do the boxes have plenty of
> zero restriction cool airflow?  Less than 25 Celsius intake air temperature?

I have other HP servers of the same type, some running Linux and others
running Windows.
They are all in the same room, so if it were a temperature problem I
would expect the other servers to show it too, but that is not
the case.
Only mine reboot.
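
To rule the thermal theory in or out with data rather than by comparison, the readings could be logged with timestamps and checked against the reboot times. Below is a minimal sketch, assuming the lm-sensors package is installed and that `sensors -u` prints raw lines such as `temp2_input: 45.000` under a chip name and a feature label. The function names, the log path, and the 70 C limit are all made up for illustration; they are not taken from any setup in this thread.

```python
import re
import subprocess
import time

# Matches raw lm-sensors value lines such as "  temp2_input: 45.000".
TEMP_LINE = re.compile(r"^\s*temp\d+_input:\s*([-+]?\d+(?:\.\d+)?)\s*$")


def parse_temps(raw):
    """Return a list of (chip, label, celsius) from `sensors -u` style text."""
    readings = []
    chip = label = None
    for line in raw.splitlines():
        if not line.strip():
            continue
        match = TEMP_LINE.match(line)
        if match:
            readings.append((chip, label, float(match.group(1))))
        elif not line[0].isspace():
            if ":" not in line:
                chip = line.strip()          # e.g. "coretemp-isa-0000"
            elif line.rstrip().endswith(":"):
                label = line.rstrip()[:-1]   # e.g. "Core 0"
            # "Adapter: ..." lines fall through and are ignored
    return readings


def hot(readings, limit_c=70.0):
    """Return the readings above limit_c (an arbitrary example threshold)."""
    return [r for r in readings if r[2] > limit_c]


def read_sensors():
    """Run `sensors -u` and return its output (requires lm-sensors)."""
    result = subprocess.run(["sensors", "-u"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def log_once(path="/var/log/temps.log"):
    """Append one timestamped line per reading to a hypothetical log file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as log:
        for chip, label, value in parse_temps(read_sensors()):
            log.write(f"{stamp} {chip} {label} {value:.1f}C\n")
```

Calling `log_once()` from cron every minute on both nodes would leave a record of the last temperatures seen before each reboot. If those values are unremarkable, the thermal explanation becomes much less likely.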

>
> The odds of having defective hardware in two HP servers causing random
> reboots in both machines is extremely low, though possible.  If this is
> the case it's a design flaw, not simply two defective parts.
>
> It's also possible you have the wrong memory installed.  Can you provide
> the specs on all DIMMs installed in both machines?  Did all of the
> memory come preinstalled from HP?  Is it HP memory or aftermarket memory
> from Kingston, Crucial, etc?

I've upgraded the RAM from 32 to 64G.
I've reinstalled all the DIMMs.
The BIOS reports no RAM problems.
The other servers have also been upgraded to 64G.
The reboots happen sometimes on node1 and sometimes on node2.
Bad RAM in both servers? That seems strange to me.
The other servers don't reboot.

