
Re: Distributed monitoring

Jesús M. Navarro wrote:
>> First of all, yes, we have implemented topology and dependencies, and
>> we already don't receive UNKNOWN statuses. But this doesn't seem to be
>> enough...
> Then maybe I didn't understand properly your setup and/or needs.  Why are you 
> being flooded and how would you know they are false positives?  "Who" is 
> the "fooled" nagios server and why?

Well, "flooded" may have been too strong a term. Let's say I am
receiving about 10 alerts a day.

The issue here is that we have our main nagios server in Florida, and it
is supposed to monitor servers far away, over some unreliable links,
like in Malaysia. Also, all it takes is for my upstream provider to "play"
a bit with BGP (to optimize his traffic), and I get a dozen alerts... This
is exactly the kind of situation that I want to avoid.

>> I guess that the only way we'd have would be to setup 2 nagios in each
>> Xen server location,
> I don't see how two nagios servers at the same location, with the same view 
> (topologically-wise) of the monitored hosts and services would be any less 
> fooled than just one.

It is just that I would need 2 servers to monitor each other. Having
only one server wouldn't be reliable.
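As an illustration of two instances monitoring each other, each one could run a check against a heartbeat that its peer refreshes periodically. A minimal Python sketch, assuming a hypothetical mechanism that records the peer's last heartbeat as an epoch timestamp; the function name and the 300/900-second thresholds are illustrative, and only the 0-3 return codes follow the standard Nagios plugin convention:

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def peer_heartbeat_status(last_seen, now, warn_after=300, crit_after=900):
    """Classify how stale the peer's last heartbeat is.

    last_seen: epoch seconds of the peer's last heartbeat, or None if none yet.
    now: current epoch seconds.
    warn_after / crit_after: hypothetical staleness thresholds in seconds.
    Returns (return_code, message) in the style of a Nagios plugin.
    """
    if last_seen is None:
        return UNKNOWN, "UNKNOWN - no heartbeat received from peer yet"
    age = now - last_seen
    if age >= crit_after:
        return CRITICAL, f"CRITICAL - peer silent for {age:.0f}s"
    if age >= warn_after:
        return WARNING, f"WARNING - peer silent for {age:.0f}s"
    return OK, f"OK - peer heartbeat {age:.0f}s ago"
```

A wrapper script would print the message and exit with the code, so each instance alerts when its partner goes quiet.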

>> but that is quite a pain to maintain.
> It doesn't have to be so.  Given a proper config directory layout and wise
> template usage, maintaining two almost identical nagios servers can be
> (almost) no more pain than maintaining one, plus some rsync/puppet/cfengine
> magic.
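One way to keep near-identical instances in step, assuming the object definitions live in one directory that rsync/puppet/cfengine copies unchanged to every instance, is to generate each instance's main config from a shared base plus a small per-role override. A minimal Python sketch; the path, role names and `ocsp_command`/`ochp_command` values are hypothetical, and a real base would carry the full set of nagios.cfg directives rather than this subset:

```python
# Main-config directives shared by every instance (an illustrative subset).
BASE_SETTINGS = {
    "cfg_dir": "/etc/nagios/objects",  # hypothetical path, synced to all instances
    "enable_notifications": "1",
    "obsess_over_services": "0",
    "obsess_over_hosts": "0",
}

# Per-role overrides: the only part that differs between instances.
ROLE_OVERRIDES = {
    "central": {},
    "satellite": {
        "enable_notifications": "0",
        "obsess_over_services": "1",
        "obsess_over_hosts": "1",
        # Hypothetical command names; they must match command definitions
        # in the shared object files.
        "ocsp_command": "submit_service_result",
        "ochp_command": "submit_host_result",
    },
}


def render_main_config(role: str) -> str:
    """Merge the shared base with one role's overrides into nagios.cfg-style lines."""
    settings = {**BASE_SETTINGS, **ROLE_OVERRIDES[role]}
    lines = [f"{key}={value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"
```

With this layout, `render_main_config("satellite")` differs from the central output only in the overridden keys, so the per-site maintenance burden stays a few lines.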

Ok, I'll look into it then.

>> I was hoping
>> for an out-of-the-box magical solution that would be easier to
>> deploy. If we do choose this way, does it have to be a "full" nagios
>> setup on each location? Or is it a kind of plugin or client?
> For a remote site, especially one with unstable network connections, yes, you
> should deploy a nagios "satellite" at each location, as per the documentation
> you already saw.  The only unneeded nagios portion at the remote location
> would be the web interface.

Ok, got it. That is good, as I didn't want to run a web server there.
What's the minimum footprint, in terms of memory usage, for such a
satellite? The smaller the better, of course, but I don't want it to swap.

> On a "standard" multisite nagios deployment you would want to disable
> notifications on the remote sites and use OCSP/OCHP (obsessing over services
> and hosts); then, by means of passive checks, it is the central monitor that
> raises notifications as needed.  A more cumbersome but more precise
> deployment would allow notifications on the remotes via some dedicated link
> (e.g. a local GSM modem at the remote facilities) and would suppress them on
> the central server.  That is because once the connection is lost, all the
> central server can do is warn that it cannot reach the remotes, while the
> remote satellite can still inform you about the "real" state of the
> monitored hosts and services.  It is also possible to go for a mixed
> solution, e.g. one where the remote stations notify the remote people while
> your central server warns your main team, either directly or upon
> notification escalation.  That way, a "minor" service will always be managed
> by your local people while critical services are notified to your central
> team, and/or any of those circumstances only escalates once the local teams
> have been given some time to correct problems by themselves.
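To make the OCSP/passive-check path concrete: the satellite's `ocsp_command` is handed each service result (via macros such as `$HOSTNAME$`, `$SERVICEDESC$`, `$SERVICESTATE$` and `$SERVICEOUTPUT$`) and has to forward it to the central server. A minimal Python sketch of the formatting step only, assuming the result travels through NSCA's `send_nsca` (tab-separated records on stdin) or is written as a `PROCESS_SERVICE_CHECK_RESULT` line to the central external command file; the function names are hypothetical and the transport itself is left out:

```python
import time

# Nagios service state names mapped to plugin return codes.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def nsca_service_line(host, service, state, output):
    """Build one service result in the tab-separated form send_nsca reads on stdin."""
    code = STATE_CODES[state]
    # Tabs and newlines in the output would break the record, so flatten them.
    clean = output.replace("\t", " ").replace("\n", " ")
    return f"{host}\t{service}\t{code}\t{clean}\n"


def external_command_line(host, service, state, output, timestamp=None):
    """Build the equivalent passive-result line for the central external command file."""
    ts = int(time.time() if timestamp is None else timestamp)
    code = STATE_CODES[state]
    clean = output.replace("\n", " ")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{clean}\n"
```

On the central side the corresponding services would accept passive checks, which is what lets the central server raise notifications for results it never polled itself.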
> But probably, given your constraints, you will have to deploy local monitoring
> stations at the remote locations no matter which software you use, be it
> nagios or a different one (of course, deployment details would depend on the
> specific chosen tool).
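The mixed notification scheme described above boils down to a small routing rule. A minimal Python sketch of one possible reading: local people are always told, critical services reach the central team at once, and minor problems escalate to the central team only after a grace period. The `Problem` type, team labels and 30-minute grace period are hypothetical; in nagios itself this would be expressed with contact groups and `serviceescalation` definitions rather than code.

```python
from dataclasses import dataclass

# Hypothetical grace period before a minor problem escalates to the central team.
LOCAL_GRACE_MINUTES = 30


@dataclass
class Problem:
    service: str
    critical: bool           # hypothetical flag: is this a "critical" service?
    minutes_unresolved: int  # how long the problem has been open


def notify_targets(problem: Problem) -> list:
    """Return which teams to notify under the mixed local/central scheme."""
    targets = ["local"]  # the remote station always informs local people
    if problem.critical or problem.minutes_unresolved >= LOCAL_GRACE_MINUTES:
        targets.append("central")
    return targets
```

Dropping the time condition gives the stricter variant where minor services stay with the local team for good.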

Thanks for all the above. Having your view on this really helps and saves
time. I'll ask the person in charge in our organization to do this
(I try to avoid touching Nagios; I hate its configuration format, which
often leads to errors).

Thomas Goirand
