Re: Distributed monitoring
On Saturday 28 March 2009 18:09:46 Thomas Goirand wrote:
> Hi *,
> I've read about distributed monitoring, and that Nagios could do it.
> Then I saw this:
> and was disappointed. This is not quite what I want. Let me explain.
[you are flooded by false-positive alerts]
You should really take more time to learn the ins and outs of Nagios. Probably
all of those alerts can be avoided with the topology suggested in the docs.
You need to study host parent relationships, service dependencies, state change
notifications and contact definitions.
What you want is to get alerts on a 24x7 basis, but without the false
positives. To get there:
Express host parent relationships. This way, in a topology such as A->B->C
where A is your Nagios central server, C the monitored host and B some
intermediate hop (say, a Nagios satellite or a router), if B fails A will know
that this doesn't mean C is DOWN, only UNREACHABLE.
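A minimal sketch of the A->B->C case in Nagios object configuration. The host names and addresses are hypothetical, and the "linux-server" template is assumed to come from the Nagios sample configuration, supplying the remaining required directives (check command, max_check_attempts, contacts, and so on):

```
; Hypothetical hosts; "linux-server" is assumed to be the sample-config template
define host {
    use         linux-server
    host_name   router-b          ; B: intermediate hop (router or satellite)
    alias       Router B
    address     192.0.2.1
    ; no "parents" line: B is directly reachable from the Nagios server (A)
}

define host {
    use         linux-server
    host_name   server-c          ; C: the monitored host
    alias       Server C
    address     192.0.2.10
    parents     router-b          ; if router-b is DOWN, server-c is reported UNREACHABLE
}
```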
Express service dependencies. Say you are monitoring some internals on a remote
host by means of NRPE. With proper service dependencies in place, if the
remote NRPE daemon dies Nagios will know that this doesn't mean the dependent
services are failing, and will suppress their checks and notifications instead
of alerting on each of them.
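A sketch of such a dependency, assuming there is a hypothetical "NRPE" service on server-c that checks the daemon itself (for example check_nrpe with no -c argument, which just returns the NRPE version) and a hypothetical "Disk Usage" service that is checked through NRPE. One definition per dependent service is used here as the conservative form:

```
; Hypothetical service names; both services must already be defined on server-c
define servicedependency {
    host_name                       server-c
    service_description             NRPE          ; master: the NRPE daemon check
    dependent_host_name             server-c
    dependent_service_description   Disk Usage    ; a check that runs via NRPE
    execution_failure_criteria      c,u           ; skip the dependent check while NRPE is CRITICAL/UNKNOWN
    notification_failure_criteria   c,u           ; and suppress its notifications in those states
}
```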
Declare proper notification options on your contacts. Given the above, you
don't want to be notified by SMS on UNKNOWN or UNREACHABLE states, only on
properly detected DOWN, CRITICAL or RECOVERY states; so define a contact that
will only be notified with "host_notification_options d,r" and
"service_notification_options c,r" (where d = DOWN, c = CRITICAL and
r = RECOVERY).
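A sketch of such a contact in Nagios 3 object syntax. The contact name and the notify-*-by-sms commands are hypothetical (you would define the SMS commands yourself), and the "24x7" timeperiod is assumed to exist as in the sample configuration:

```
define contact {
    contact_name                    oncall-sms              ; hypothetical
    alias                           On-call (SMS)
    host_notifications_enabled      1
    service_notifications_enabled   1
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       d,r                     ; DOWN and RECOVERY only
    service_notification_options    c,r                     ; CRITICAL and RECOVERY only
    host_notification_commands      notify-host-by-sms      ; hypothetical command
    service_notification_commands   notify-service-by-sms   ; hypothetical command
}
```

With these options, UNREACHABLE hosts and UNKNOWN or WARNING services never page this contact.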
Remember that the closer the Nagios monitoring node (be it a central server or
a local satellite) is to the tested hosts and services, the better you will do
at avoiding false positives and negatives.
All in all, you will probably receive more and better feedback on the Nagios
users mailing list than on this one. Other people manage to run Nagios across
multiple sites over poor-quality links and have found ways not to be flooded
with alerts in the middle of the night (me, for one).