Re: Bug - remote DNS monitoring





On Tue, Aug 30, 2022, 2:13 PM Casey Deccio <casey@deccio.net> wrote:
Hi all,

I am having trouble tracking down a bug in my monitoring setup.  The problem appeared when I upgraded the monitored host (Host B in the example below) to bullseye.  Note that Host A is also running bullseye, but the problem didn't show itself until Host B was upgraded.

Here is the setup:

Host A (monitoring):
Installed: nagios4, nrpe-ng
IP address: 192.0.2.1

Host B (monitored):
Installed: nrpe-ng, monitoring-plugins-standard, bind9-dnsutils
IP address: 192.0.2.2

Host C (monitored through host B):
Installed: bind9
IP address: 192.0.2.3
Configured to answer authoritatively for example.com on port 53.

                 nrpe
            over HTTPS                      DNS
Host A ------------------> Host B -------------> Host C

When you run check_dns by hand on Host B, you don't say which user you are logged in as. That can make a difference. Nagios runs its scripts in its own known environment, which may be different from what you expect.
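
One way to check this is to compare the environment the running daemon actually has with the one in your shell. Below is a minimal sketch in Python. It assumes a Linux /proc filesystem and enough privileges (same user or root) to read the daemon's entry. The helper names are made up for illustration, and the PID is something you would look up yourself. It only covers environment variables, not other differences such as the user, working directory, or open file descriptors.

```python
import os
import sys


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid) -> dict:
    """Read the environment a process was started with (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())


def diff_environ(a: dict, b: dict) -> dict:
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 envdiff.py <pid of the daemon>
    daemon_env = read_environ(sys.argv[1])
    for name, (theirs, mine) in diff_environ(daemon_env, dict(os.environ)).items():
        print(f"{name}: daemon={theirs!r} shell={mine!r}")
```

Note that /proc/<pid>/environ shows the environment the process was started with, so it will not reflect changes the daemon makes before launching a plugin.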

On Host B, I run the following:
sudo /usr/bin/python3 /usr/sbin/nrpe-ng --debug -f --config /etc/nagios/nrpe-ng.cfg

While that is running, I run the following on Host A:
/usr/lib/nagios/plugins/check_nrpe_ng -H 192.0.2.2 -c check_dns -a example.com 192.0.2.3 0.1 1.0

The result of running the command on Host A is:
DNS CRITICAL - '/usr/bin/nslookup -sil' msg parsing exited with no address

On Host B, I see the following debug output:
200 POST /v1/check/check_dns (192.0.2.1) 78.05ms
Executing: /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0

When I run this exact command on Host B, I get:
$ /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0
DNS OK: 0.070 seconds response time. example.com returns 192.0.2.10,2001:db8::10|time=0.069825s;0.100000;1.000000;0.000000

Looks good!  When I run nslookup (run by check_dns), it looks good too:
$ /usr/bin/nslookup -sil example.com 192.0.2.3
Server: 192.0.2.3
Address: 192.0.2.3#53

Name: example.com
Address: 192.0.2.10
Name: example.com
Address: 2001:db8::10

After rerunning nrpe-ng with strace -f, I see something:

[pid 1183842] write(2, "nslookup: ./src/unix/core.c:570:"..., 83) = 83
...
[pid 1183841] read(4, "nslookup: ./src/unix/core.c:570:"..., 4096) = 83

So it appears that the nslookup process is reporting an error.  But I cannot reproduce it outside of nrpe-ng.

Any suggestions?
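
strace cuts strings at 32 characters by default, so the 83-byte message in the trace above is truncated. One way to see the whole message, and to try to reproduce the failure outside nrpe-ng, is to run the plugin with a stripped environment and no terminal attached, capturing stderr in full. Below is a rough sketch in Python. Whether nrpe-ng starts plugins this way is an assumption, as is the PATH value, and run_isolated is a hypothetical helper, not part of any package.

```python
import subprocess
import sys


def run_isolated(argv, env=None, timeout=10):
    """Run argv with a minimal environment and stdin attached to /dev/null.

    Returns (exit_code, stdout, stderr) with both streams captured whole,
    so a message written to stderr is neither lost nor truncated.
    """
    if env is None:
        # Assumed minimal environment; a daemon may pass more or less.
        env = {"PATH": "/usr/sbin:/usr/bin:/sbin:/bin"}
    proc = subprocess.run(
        argv,
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,
        check=False,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    code, out, err = run_isolated(sys.argv[1:])
    print("exit code:", code)
    print("stdout:", out, end="")
    print("stderr:", err, end="")
```

Saved under a name of your choice, it can be given the exact command line from the debug output, for example as the user the daemon runs as (the user name depends on your setup). If the full nslookup message shows up in stderr here, the difference lies in the environment or the missing terminal. If it does not, something else about how nrpe-ng starts the process is involved.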

Casey
