Re: Distributed monitoring
On Sunday 29 March 2009 01:09:01 Thomas Goirand wrote:
> Jesús M. Navarro wrote:
> > On Saturday 28 March 2009 18:09:46 Thomas Goirand wrote:
> >> Hi *,
> >> I've read it about distributed monitoring, and that Nagios could do it.
> >> Then I have see this:
> >> http://nagios.sourceforge.net/docs/1_0/images/distributed.png
> >> and was disappointed. This is quite not what I want. Let me explain.
> > [you are flooded by false-positive alerts]
> > You should really take more time to learn the ins and outs of Nagios.
> > Probably all of them can be avoided with the topology suggested on the
> > docs.
> > You need to study about host parentship, service dependencies, state
> > change notifications and contact definitions.
> First of all, yes, we do have implemented topology and dependencies, and
> we do not receive UNKNOWN status already. But this doesn't seem to be
Then maybe I didn't properly understand your setup and/or needs. Why are you
being flooded, and how do you know the alerts are false positives? Which
Nagios server is the one being "fooled", and why?
> I guess that the only way we'd have would be to setup 2 nagios in each
> Xen server location,
I don't see how two Nagios servers at the same location, with the same view
(topology-wise) of the monitored hosts and services, would be any less
fooled than just one.
> but that is quite a pain to maintain.
It doesn't have to be. Given a proper config directory layout and wise
template usage, maintaining two almost identical Nagios servers can be
(almost) no more painful than maintaining one, plus some rsync/puppet/cfengine.
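As a rough illustration of the "wise template usage" idea, here is a minimal
Python sketch that renders host objects which inherit everything from one
shared template, so the same generated file can be synced unchanged to both
servers. The helper names, the host list and the "generic-host" template name
are assumptions for the example; only use, host_name and address are standard
host directives.

```python
def host_definition(name, address, template="generic-host"):
    """Render a Nagios host object that inherits the rest from a template."""
    lines = [
        "define host {",
        "    use        %s" % template,   # assumed template name
        "    host_name  %s" % name,
        "    address    %s" % address,
        "}",
    ]
    return "\n".join(lines) + "\n"


def render_hosts(hosts):
    """hosts: iterable of (name, address) pairs; returns one config file body.

    Entries are sorted so the output is identical on every server and
    diffs cleanly under rsync or version control.
    """
    return "\n".join(host_definition(n, a) for n, a in sorted(hosts))
```

Whatever differs between the two servers would then live in a small
per-server file, with everything else copied as-is.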
> I was hoping
> for an out-of-the-box magical solution that would be more easy to
> deploy. If we do choose this way, does it has to be a "full" nagios
> setup on each location? Or is it a kind of plugin or client?
For a remote site, especially one with unstable network connections, then
yes, you should deploy a Nagios "satellite" at each location as per the
documentation you already saw. The only unneeded Nagios portion at the remote
location would be the web interface.
In the "standard" multisite Nagios deployment you would disable
notifications on the remote sites and use OCSP/OCHP (obsess over services
and hosts); then, by means of passive checks, the central monitor is the
one that raises notifications as needed. A more cumbersome but more
precise deployment would allow notifications from the remotes via some
dedicated link (e.g. a local GSM modem at the remote facilities) and would
restrict them on the central server. That's because when the connection is
lost, all the central server can do is warn that it cannot reach the
remotes, while the remote satellite can still inform you about the "real"
state of the monitored hosts and services. Finally, a mixed solution is also
possible: the remote stations notify the local people while your central
server warns your main team, either directly or upon notification
escalation. That way a "minor" service is always handled by your local
people while critical services are notified to your central team, and/or
any of those circumstances only escalates once the local teams have been
given some time to correct problems by themselves.
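To make the passive-check path a bit more concrete, here is a minimal Python
sketch of the line a satellite would hand to send_nsca for each service
result. It assumes the usual tab-separated send_nsca input (host, service
description, return code, plugin output); the function name and the sample
host and service are made up for illustration.

```python
import sys

# Standard Nagios plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_result(host, service, state, output):
    """Return one send_nsca input line for a service check result."""
    if state not in STATES:
        raise ValueError("unknown state: %r" % (state,))
    # Tabs and newlines separate fields and records, so collapse any
    # whitespace inside the plugin output to single spaces.
    clean = " ".join(output.split())
    return "%s\t%s\t%d\t%s\n" % (host, service, STATES[state], clean)


if __name__ == "__main__":
    # Example values are invented for illustration.
    sys.stdout.write(
        format_passive_result("xen-node-1", "HTTP", "CRITICAL",
                              "Connection refused"))
```

On the satellite, the ocsp_command would pipe such a line into send_nsca
pointed at the central server, where the same service is defined as a passive
check.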
But probably, given your constraints, you will have to deploy local monitoring
stations at the remote locations no matter which software you use, be it
Nagios or a different one (of course, deployment details would depend on the
specific tool chosen).