
Distributed monitoring



Hi *,

I've read about distributed monitoring, and that Nagios can do it.
Then I saw this:

http://nagios.sourceforge.net/docs/1_0/images/distributed.png

and was disappointed. This is not quite what I want. Let me explain.

We have more than 10 points of presence and, for many reasons, it
happens that a given link between two end points is broken. We have some
points of presence in places like Malaysia where connectivity is far
from perfect (and there is nothing that can be done about this,
unfortunately). What we want to monitor is whether a server is having a
serious issue, not whether the network is down (in fact, we want to
monitor connectivity as well, but that is another issue, and not my
concern in this message).

The issue here is that we receive so many monitoring alerts that they
become useless. It has already happened once that a server really had an
issue and, because of the flood of alerts, we only realized it was down
rather late (45 minutes to 1 hour of downtime, which is unacceptable
when a quick reboot using our remote tools was enough to solve the
issue...). Also, because of the number of alerts and the fact that they
are unreliable (many false positives), we can't use our email-to-SMS
gateway to send us alerts.

So what I want is something where multiple Nagios servers (or another
product) would check whether a given server is down, and an alert is
triggered only if ALL of them report a failure. We could set up, let's
say, 5 Nagios servers, distributed in different locations.
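
To make the rule above concrete, here is a rough sketch in Python of the
decision logic only (alert when the monitoring servers agree that the
host is down). It is not something Nagios provides: the function name
`should_alert`, the location names, and the optional `quorum` parameter
are all made up for illustration, and how each monitor's result would be
collected in one place is left open.

```python
from typing import Mapping, Optional


def should_alert(reports: Mapping[str, bool], quorum: Optional[int] = None) -> bool:
    """Decide whether to raise an alert for one server.

    `reports` maps a (hypothetical) monitor location to True if that
    monitor currently sees the server as DOWN, False otherwise.
    `quorum` is an assumption added for illustration: the number of
    monitors that must agree. By default every monitor must agree.
    """
    if not reports:
        # No monitor has reported anything, so there is nothing to alert on.
        return False
    needed = len(reports) if quorum is None else quorum
    down_count = sum(1 for is_down in reports.values() if is_down)
    return down_count >= needed


# Two monitors: alert only when both see the failure.
print(should_alert({"paris": True, "kuala_lumpur": True}))
print(should_alert({"paris": True, "kuala_lumpur": False}))
```

With 5 monitors, requiring all of them to agree may be too strict if one
monitor is itself cut off from the others, so something like `quorum=3`
could be passed instead; that relaxation is a design choice, not a
requirement stated here.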

What are the options here? Can Nagios do what we need, or should we
look for another solution? Does anyone have some pointers (e.g. URLs)
to help us here? Thanks to anyone who can help.

Thomas

