Re: continuous reboots in a two nodes cluster with heartbeat and pacemaker.

To: stan@hardwarefreak.com
Cc: debian-user <debian-user@lists.debian.org>
Subject: Re: continuous reboots in a two nodes cluster with heartbeat and pacemaker.
From: Mauro <mrsanna1@gmail.com>
Date: Sun, 12 Aug 2012 23:27:56 +0200
Message-id: <CAE17a0U8D95r2qZT=rmHWQui7J0=p95vqk9ibkmuPDR6Kzgu=g@mail.gmail.com>
In-reply-to: <5027F87D.2080306@hardwarefreak.com>
References: <CAE17a0VzO_Vz1gL1aBJmgQORWrEGoN=nHnx0Hc5NzciNoUgmuw@mail.gmail.com> <502694FF.50207@hardwarefreak.com> <CAE17a0X7N3WGOyH=bjtds4K28BYiKoSvpwMY=JQ=3W7MVjNUmg@mail.gmail.com> <5027F87D.2080306@hardwarefreak.com>

On 12 August 2012 20:39, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 8/12/2012 4:44 AM, Mauro wrote:
>> On 11 August 2012 19:23, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>> On 8/11/2012 8:59 AM, Mauro wrote:
>>>> Hello, I'm experiencing continuous reboots of my two nodes in a
>>>> heartbeat+pacemaker cluster.
>>>> Reboots are random: some days they happen, other days they don't.
>>>> Sometimes they don't happen for 7 days, sometimes they happen at night.
>>>> They happen on random days and at random times.
>>>> Nodes are connected to a Cisco 3570 switch and a SAN storage system.
>>>> Perhaps there is a misconfiguration in the interfaces?
>>>> Here is my interfaces file:
>>> ....
>>>
>>>
>>>> Do you think there are some errors?
>>>
>>> To determine that you need to look at your log files, not your config
>>> files.  If the nodes are rebooting due to fencing it will be logged
>>> somewhere, as should the underlying network errors that cause the fence
>>> to close.
>>
>> Yes, I looked at my logs, but the only thing I see is that node 1 fences
>> node 2 or node 2 fences node 1 because one node doesn't see the other node.
>> I don't understand what the problem is, whether it is my NIC or
>> something else.
>
> Is there more than one set of these in any dmesg files on either host:
>
> Jul 26 00:38:26 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Down
> Jul 26 00:38:28 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Up
> 100 Mbps Full Duplex

No, there is no link down message in any log file :-(
I really don't understand why the reboots happen :-(
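
For anyone who wants to repeat the check across many log files, here is a minimal sketch in Python. It assumes the kernel lines look like the two quoted above (other NIC drivers word the message differently), and the function names count_link_events and flapping_interfaces are made up for this example. It counts "NIC Link is Down/Up" events per interface and lists interfaces that went down more than once.

```python
import re
import sys
from collections import Counter

# Matches kernel link-state lines in the format quoted above, e.g.
# "Jul 26 00:38:26 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Down"
# Assumption: other drivers may log link changes with different wording.
LINK_RE = re.compile(
    r"kernel: .*?\b(?P<iface>eth\d+): NIC Link is (?P<state>Up|Down)"
)


def count_link_events(lines):
    """Return a Counter keyed by (interface, state) for matching lines."""
    events = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            events[(match.group("iface"), match.group("state"))] += 1
    return events


def flapping_interfaces(events):
    """Return interfaces that logged more than one Link Down event."""
    return sorted(
        iface
        for (iface, state), count in events.items()
        if state == "Down" and count > 1
    )


if __name__ == "__main__":
    # Log file paths are given on the command line.
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            totals.update(count_link_events(handle))
    for (iface, state), count in sorted(totals.items()):
        print(f"{iface} {state}: {count}")
    for iface in flapping_interfaces(totals):
        print(f"{iface} went down more than once")
```

Saved under a name of your choice, it takes the log files to scan as arguments. No output means no matching link events were found, which agrees with what I see in my logs.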

> If so it may indicate a flaky NIC or switch port, possibly a bad patch
> cable.  Is there a switch between the hosts or a cross over cable?

There is a Cisco 3570 switch between the hosts.
