Re: Distributed monitoring

To: debian-isp@lists.debian.org
Subject: Re: Distributed monitoring
From: "Jesús M. Navarro" <jesus.navarro@undominio.net>
Date: Sat, 28 Mar 2009 22:28:07 +0100
Message-id: <200903282228.07607.jesus.navarro@undominio.net>
In-reply-to: <49CE59DA.5040505@goirand.fr>
References: <49CE59DA.5040505@goirand.fr>

On Saturday 28 March 2009 18:09:46 Thomas Goirand wrote:
> Hi *,
>
> I've read about distributed monitoring, and that Nagios could do it.
> Then I saw this:
>
> http://nagios.sourceforge.net/docs/1_0/images/distributed.png
>
> and was disappointed. This is not quite what I want. Let me explain.

[you are flooded by false-positive alerts]

You should really take more time to learn the ins and outs of Nagios.  Probably 
all of those alerts can be avoided with the topology suggested in the docs.

You need to study host parentship, service dependencies, state change 
notifications and contact definitions.

What you want to do is have alerts on a 24x7 basis but...

Express host parentship.  This way, in a topology such as A->B->C where A is 
your Nagios central server, C the monitored host and B some intermediate (say, 
a Nagios satellite or a router), if B fails, A will know that this doesn't 
mean C is DOWN, only UNREACHABLE.
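
As a rough sketch of the A->B->C case (the host names, the addresses and the 
"generic-host" template are made up for illustration; adapt them to your own 
object files):

```cfg
# B sits between the Nagios server (A) and the monitored host (C).
define host {
    use         generic-host    ; assumed template supplying the other required settings
    host_name   B
    address     192.0.2.1       ; documentation address, not a real host
}

# C is only reachable through B, so B is declared as its parent.
define host {
    use         generic-host
    host_name   C
    address     192.0.2.30
    parents     B               ; if B is down, C is treated as unreachable rather than down
}
```

A itself needs no "parents" line: hosts without one are taken to be on the 
same segment as the monitoring server.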

Express service dependencies.  Say you are monitoring some internals on a 
remote host by means of NRPE.  With proper service dependencies in place, if 
the remote NRPE daemon dies, Nagios will know that this doesn't mean the 
dependent services are failing and will mark them properly as UNKNOWN.
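
Something along these lines, as a sketch only ("remote1", "NRPE" and 
"Disk usage" are hypothetical names for a host and two services you would 
already have defined elsewhere):

```cfg
# "Disk usage" is checked through NRPE, so it depends on the "NRPE" service.
define servicedependency {
    host_name                       remote1
    service_description             NRPE          ; the service the others rely on
    dependent_host_name             remote1
    dependent_service_description   Disk usage    ; a check that runs through NRPE
    execution_failure_criteria      w,u,c         ; do not run the dependent check while NRPE is not OK
    notification_failure_criteria   w,u,c         ; and do not notify about it either
}
```

You would add one such block (or a comma-separated list of dependent services) 
for every check that goes through the same NRPE daemon.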

Declare proper notification options on your contacts.  Given the above, you 
don't want to be notified by SMS on UNREACHABLE or UNKNOWN status, only on 
properly detected DOWN, CRITICAL or RECOVERY states; so define a contact that 
will only be notified with "host_notification_options d,r" and 
"service_notification_options c,r" (where d==DOWN, c==CRITICAL and 
r==RECOVERY).
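
For example, assuming you already have SMS notification commands in place (the 
contact name, the pager number and the two "notify-...-by-sms" command names 
below are placeholders, not something Nagios ships with):

```cfg
define contact {
    contact_name                    oncall-sms                ; hypothetical contact
    alias                           On-call SMS
    host_notification_period        24x7                      ; time period assumed to be defined, as in the sample config
    service_notification_period     24x7
    host_notification_options       d,r                       ; DOWN and RECOVERY only, no UNREACHABLE
    service_notification_options    c,r                       ; CRITICAL and RECOVERY only, no WARNING or UNKNOWN
    host_notification_commands      notify-host-by-sms        ; placeholder command name
    service_notification_commands   notify-service-by-sms     ; placeholder command name
    pager                           +00000000000              ; placeholder number
}
```

A second contact with wider options (e-mail only, say) can still receive 
everything else during office hours.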

Remember that the nearer the Nagios monitor node (be it a central server or a 
local satellite) is to the tested hosts and services, the better it will avoid 
false positives and false negatives.

All in all, you will probably receive more, and better, feedback on the 
nagios-users mailing list than on this one.  Other people manage to have 
Nagios deployed multisite over bad quality links and have found ways not to 
be flooded with alerts in the middle of the night (me, for one).
-- 
SALUD,
Jesús
***
jesus.navarro@undominio.net
***
