
Re: Distributed monitoring

Hi, Thomas:

On Sunday 29 March 2009 01:09:01 Thomas Goirand wrote:
> Jesús M. Navarro wrote:
> > On Saturday 28 March 2009 18:09:46 Thomas Goirand wrote:
> >> Hi *,
> >>
> >> I've read about distributed monitoring, and that Nagios could do it.
> >> Then I saw this:
> >>
> >> http://nagios.sourceforge.net/docs/1_0/images/distributed.png
> >>
> >> and was disappointed. This is not quite what I want. Let me explain.
> >
> > [you are flooded by false-positive alerts]
> >
> > You should really take more time to learn the ins and outs of Nagios.
> > Probably all of them can be avoided with the topology suggested on the
> > docs.
> >
> > You need to study about host parentship, service dependencies, state
> > change notifications and contact definitions.

> First of all, yes, we have implemented topology and dependencies, and
> we no longer receive UNKNOWN statuses. But this doesn't seem to be
> enough...

Then maybe I didn't properly understand your setup and/or needs.  Why are you
being flooded, and how do you know they are false positives?  "Who" is
the "fooled" nagios server, and why?

> I guess the only way would be to set up 2 nagios in each
> Xen server location,

I don't see how two nagios servers at the same location, with the same view
(topology-wise) of the monitored hosts and services, would be any less
fooled than just one.

> but that is quite a pain to maintain.

It doesn't have to be so.  Given a proper config directory layout and wise
template usage, maintaining two almost identical nagios servers can be
(almost) no more pain than maintaining one, plus some rsync/puppet/cfengine

> I was hoping
> for an out-of-the-box magical solution that would be easier to
> deploy. If we do choose this way, does it have to be a "full" nagios
> setup on each location? Or is it a kind of plugin or client?

For a remote site, especially one with unstable network connections, then
yes, you should deploy a nagios "satellite" at each location, as per the
documentation you already saw.  The only nagios portion not needed at the
remote location would be the web interface.

On the "standard" multisite nagios deployment you would disable
notifications on the remote sites and use OCSP/OCHP (obsess over services
and hosts); then, by means of passive checks, it is the central monitor that
raises notifications as needed.  A more cumbersome but more precise
deployment would allow notifications on the remotes via some dedicated link
(e.g. a local GSM modem at the remote facilities) and would restrict them on
the central server.  That's because, when the connection is lost, all the
central server can do is warn that it cannot reach the remotes, while the
remote satellite can still inform you about the "real" state of the
monitored hosts and services.  And then it is still possible to go for a
mixed solution, in which the remote stations notify the remote people while
your central server warns your main team, directly or upon notification
escalation.  That way a "minor" service is always managed by your local
people while critical services are notified to your central team, and/or any
of those circumstances only escalates once the local teams have been given
some time to correct problems by themselves.
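
To make the passive-check part a bit more concrete: on each satellite the
"obsess" command is run after every check and hands the result to something
like send_nsca, which expects one tab-separated line per result (host,
service description, return code, plugin output).  The Nagios docs do this
with a small shell script; what follows is only a rough Python sketch of
building that line.  The function name and the host/service names are made
up for illustration and are not part of Nagios or NSCA.

```python
# Hypothetical helper: builds the line a satellite would pipe into send_nsca.
# Nagios service states map to plugin return codes as below.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_result(host, service, state, output):
    """Return one send_nsca input line for a service check result.

    Format: <host>\t<service description>\t<return code>\t<output>\n
    """
    if state not in STATE_CODES:
        raise ValueError("unknown service state: %r" % (state,))
    # Tabs or newlines inside the plugin output would break the
    # tab-separated format, so flatten them to spaces.
    clean = output.replace("\t", " ").replace("\n", " ")
    return "%s\t%s\t%d\t%s\n" % (host, service, STATE_CODES[state], clean)


if __name__ == "__main__":
    # Example values only; a real OCSP command would receive these
    # from Nagios macros on the satellite.
    line = format_passive_result("xen-host-1", "HTTP", "CRITICAL",
                                 "Connection refused")
    print(repr(line))
```

The central server then only has to accept passive checks for those
services; the satellites keep doing the active checking locally.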

But probably, given your constraints, you will have to deploy local
monitoring stations at the remote locations no matter which software you
use, be it nagios or a different one (of course, deployment details would
depend on the specific tool chosen).
