Re: Distributed location server monitoring
Hello,
Yes, Nagios does distributed monitoring:
http://nagios.sourceforge.net/docs/2_0/distributed.html
However, the problem you're describing doesn't seem to be related to the
number of Nagios servers that you're using, and adding more servers may
only add unnecessary complexity. Make sure that the upstream hops are
defined as monitored hosts in Nagios *and* marked as parents of the
servers that you're monitoring. Then, if one of those upstream hops goes
down, don't notify on it. This of course assumes that you're sure that
an upstream hop going down doesn't affect the connectivity of the
server being monitored. Alternatively, tweak the flap detection or
volatility settings of the hops between the monitor and the server
being monitored.
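As a rough sketch of the parent/child setup described above, the
fragment below uses Nagios object definitions. The host names,
addresses, threshold values, and the "generic-host" template are
placeholders assumed for illustration, not taken from your setup; the
template is assumed to supply the remaining required directives (check
command, contacts, notification period and so on).

```
# Upstream hop: still monitored, but notifications for it are disabled.
define host{
    use                     generic-host      ; assumed template
    host_name               upstream-router   ; hypothetical name
    alias                   Upstream router
    address                 192.0.2.1         ; placeholder address
    notifications_enabled   0                 ; don't notify on this hop
    flap_detection_enabled  1
    low_flap_threshold      10.0              ; example values only
    high_flap_threshold     30.0
    }

# Monitored server: declares the hop as its parent.
define host{
    use                     generic-host      ; assumed template
    host_name               web01             ; hypothetical name
    alias                   Web server 01
    address                 192.0.2.10        ; placeholder address
    parents                 upstream-router
    notification_options    d,r               ; no 'u': skip UNREACHABLE
    }
```

With "parents" set, Nagios can tell a host that is DOWN from one that
is merely UNREACHABLE because its parent is down, and leaving "u" out
of notification_options suppresses notifications for the latter.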
There is a reason why Nagios is reporting those hops as down, so you
might want to look into what that reason is. If Nagios sends a
notification, it means that the service has failed several successive
checks over a period of minutes, which is fairly uncommon unless there
really is a problem. It's not a 'false positive' from the Nagios
server's point of view, so log in to the Nagios server and try to
replicate the problem that it is reporting. If you need to, tweak the
number of failed checks required before a notification is sent. Again,
getting the parent/child relationships of the monitored hosts
configured will help.
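The number of failed checks before notification is set per service (or
host) with max_check_attempts. A minimal sketch, assuming a
"generic-service" template, a "check_http" command definition, and a
host named "web01", none of which come from your e-mail:

```
define service{
    use                     generic-service   ; assumed template
    host_name               web01             ; hypothetical host
    service_description     HTTP
    check_command           check_http        ; assumed command definition
    max_check_attempts      4                 ; failures before a hard state
    normal_check_interval   5                 ; time units between normal checks
    retry_check_interval    1                 ; time units between retries
    }
```

Assuming the default interval_length of 60 seconds, the service here
would have to fail the initial check and three retries, one minute
apart, before Nagios treats the problem as hard and notifies. These are
the Nagios 2.x directive names; Nagios 3 calls the last two
check_interval and retry_interval.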
Based on the limited information in your e-mail, it sounds like you
need to tune the Nagios configs to your environment to reduce the
false positives rather than add more monitoring servers. Once the
configs are reasonably well tuned, you can think about creating
multiple monitoring points.
Steve Suehring
http://www.braingia.org
On Sat, May 10, 2008 at 03:33:09PM +0800, Thomas Goirand wrote:
> Hi,
>
> We use Nagios internally to monitor about 50 servers. The biggest
> problem that we have is that it sends lots of false positives because
> it monitors the connections from one point to another more than the
> real services that have to be up. The rate of false positives is just
> too high, so it's kind of unusable. We ignore too many warnings, and I'm
> sure it will end up with something really being down and us not
> checking it.
>
> Is there a distributed kind of Nagios system that would use multiple
> nodes to check, and if (and ONLY if) all contactable monitoring servers
> report a problem, then we receive an alert?
>
> Thomas
>
> P.S.: We don't want to have multiple points where we have to set up
> monitoring; that would be a headache...
>
>