
Re: continuous reboots in a two nodes cluster with heartbeat and pacemaker.



On 12 August 2012 20:39, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 8/12/2012 4:44 AM, Mauro wrote:
>> On 11 August 2012 19:23, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>> On 8/11/2012 8:59 AM, Mauro wrote:
>>>> Hello, I'm experiencing continuous reboots of my two nodes in a
>>>> heartbeat+pacemaker cluster.
>>>> The reboots are random: some days they happen and other days they
>>>> don't. Sometimes there are none for 7 days, and sometimes they
>>>> happen at night. There is no pattern in the day or the time.
>>>> Nodes are connected to a Cisco 3570 switch and a SAN storage system.
>>>> Perhaps there is a misconfiguration in the interfaces?
>>>> Here is my interfaces file:
>>> ....
>>>
>>>
>>>> Do you think there are any errors in it?
>>>
>>> To determine that you need to look at your log files, not your config
>>> files.  If the nodes are rebooting due to fencing, it will be logged
>>> somewhere, as should the underlying network errors that trigger the
>>> fencing.
>>
>> Yes, I looked at my logs, but the only thing I see is that node 1
>> fences node 2, or node 2 fences node 1, because one node can't see
>> the other. I don't understand what the problem is, whether it is my
>> NIC or something else.
>
> Is there more than one set of these in any dmesg files on either host:
>
> Jul 26 00:38:26 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Down
> Jul 26 00:38:28 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Up 100 Mbps Full Duplex

No, there is no link down message in any log file :-(
I really don't understand why the reboots happen :-(

> If so, it may indicate a flaky NIC or switch port, or possibly a bad
> patch cable.  Is there a switch between the hosts, or a crossover cable?

There is a Cisco 3570 switch.
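
For anyone who wants to repeat the check for the link down messages quoted
above across several log files, here is a minimal Python sketch. It is only
an illustration: the function name count_link_down is made up, and the
pattern matches only the e100 message format shown in this thread, so other
drivers that word the message differently would need a different pattern.

```python
import re
import sys
from collections import Counter

# Matches kernel messages in the format quoted in this thread, e.g.
#   Jul 26 00:38:26 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Down
# Assumption: other NIC drivers may word this message differently.
LINK_DOWN = re.compile(r"kernel: .*?\b(\w+): NIC Link is Down")


def count_link_down(lines):
    """Count 'NIC Link is Down' events per interface name."""
    counts = Counter()
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Log file paths are given on the command line; which files hold
    # kernel messages depends on the syslog setup of each host.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for iface, n in count_link_down(log).items():
                print(f"{path}: {iface} went down {n} time(s)")
```

Run it on both hosts with the kernel log files as arguments. No output means
no link down message in that format was found, which would point away from a
flapping link on the NIC side.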

