
Re: boot ordering and resolvconf



Sorry for the noise,

I finally verified some more claims and discovered that my other recent
mail is wrong in multiple places. I guess this discussion really is
moot. 

So what happens when you getaddrinfo ...

... a non-existent name? -> EAI_SYSTEM ENOENT
... something when nothing is bound to port 53?
    -> EAI_SYSTEM ECONNREFUSED
... something when port 53 is bound, but does not answer?
    -> EAI_SYSTEM ETIMEDOUT

Contrary to what I said earlier, in all three cases it makes no
difference whether resolv.conf lists 127.0.0.1 or is empty.
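
The three results listed above can be sorted into "name does not exist" and "try again later". The following is a minimal sketch of how a daemon might do that. The enum and function names are made up for this example. The handling of EAI_NONAME and EAI_AGAIN is an assumption: those codes did not show up in the tests described here, but POSIX defines them and other libc versions may return them instead of EAI_SYSTEM.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <netdb.h>

/* Hypothetical verdicts, named for illustration only. */
enum lookup_verdict {
	LOOKUP_OK,
	LOOKUP_NO_SUCH_NAME,
	LOOKUP_RETRY_LATER,
	LOOKUP_OTHER
};

/*
 * Classify a getaddrinfo() return value. saved_errno must be the errno
 * value captured directly after the call; it only matters for EAI_SYSTEM.
 */
static enum lookup_verdict classify_lookup(int ret, int saved_errno)
{
	if (ret == 0)
		return LOOKUP_OK;
	if (ret == EAI_NONAME)		/* assumption: not observed above */
		return LOOKUP_NO_SUCH_NAME;
	if (ret == EAI_AGAIN)		/* assumption: not observed above */
		return LOOKUP_RETRY_LATER;
	if (ret == EAI_SYSTEM) {
		switch (saved_errno) {
		case ENOENT:		/* non-existent name */
			return LOOKUP_NO_SUCH_NAME;
		case ECONNREFUSED:	/* nothing bound to port 53 */
		case ETIMEDOUT:		/* port 53 bound, but no answer */
			return LOOKUP_RETRY_LATER;
		}
	}
	return LOOKUP_OTHER;
}
```

A caller would save errno immediately after getaddrinfo() returns and pass both values in, treating only LOOKUP_NO_SUCH_NAME as a permanent failure.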

So the suggestion in my other mail to add 127.0.0.1 to resolv.conf when
there is no nameserver is just wrong.

On Sun, Jun 30, 2013 at 10:40:03PM +0200, Thomas Hood wrote:
> A problem is that getaddrinfo() doesn't distinguish in its return
> status between "couldn't reach a nameserver" and "nameserver
> says the name doesn't exist".  Either way it returns EAI_SYSTEM
> with errno=2 (ENOENT).

I cannot confirm this observation and assume it to be wrong. For clarity
I include my own method[1]. I therefore withdraw my comment about
EAI_SYSTEM appearing like a bad design. I just didn't understand it.

On Fri, Jul 05, 2013 at 07:21:12AM +0200, Thomas Hood wrote:
> The only underlying problem I know of is that ntpd is broken (#683061).
> Other applications work fine in mode B.  And the reason that ntpd
> doesn't work properly is NOT that resolv.conf is dynamic. Ntpd doesn't
> work properly because it treats any name resolution failure as
> equivalent to "host does not exist" and treats "host does not exist" as
> a fatal error.  So ntpd will also fail in mode A unless you manage to get
> name service fully operational before ntpd starts.  Merely pointing
> /etc/resolv.conf at a local nameserver does not rule out the possibility
> of name service failures; it just ensures that applications can access
> some nameserver once the local nameserver has started.  That's nice
> but it doesn't fix ntpd. Lookups will still fail, e.g., if ntpd starts before the
> nameserver or if the nameserver doesn't have a forwarding address yet.
> 
> Applications have to be able to deal with temporary name service failures.

Given the above, I fully agree.
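
To illustrate what dealing with temporary failures could look like, here is a rough sketch of a lookup that retries on ECONNREFUSED and ETIMEDOUT instead of giving up. The function names, the retry count and the delay parameter are all invented for this example, and treating EAI_AGAIN as temporary is an assumption based on POSIX rather than on anything tested in this thread. It is not what ntpd or any other daemon actually does.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: is this getaddrinfo() failure worth retrying? */
static int is_temporary(int ret, int saved_errno)
{
	if (ret == EAI_AGAIN)	/* assumption: defined by POSIX, not tested here */
		return 1;
	return ret == EAI_SYSTEM &&
	       (saved_errno == ECONNREFUSED || saved_errno == ETIMEDOUT);
}

/*
 * Hypothetical helper: call getaddrinfo() up to max_tries times, sleeping
 * delay_sec seconds between attempts, as long as the failure looks temporary.
 * Returns the last getaddrinfo() result (EAI_AGAIN if no attempt was made).
 * On success the caller must freeaddrinfo(*res).
 */
static int resolve_with_retry(const char *name, int max_tries,
			      unsigned delay_sec, struct addrinfo **res)
{
	int ret = EAI_AGAIN;
	int i;

	for (i = 0; i < max_tries; i++) {
		errno = 0;
		ret = getaddrinfo(name, NULL, NULL, res);
		if (ret == 0 || !is_temporary(ret, errno))
			break;		/* success or permanent failure */
		if (i + 1 < max_tries)
			sleep(delay_sec);
	}
	return ret;
}
```

A long-running daemon would more likely reschedule the lookup from its event loop than block in sleep(), but the decision about which errors to retry stays the same.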

> Helmut Grohne wrote:
> > Usually any program reads
> > /etc/resolv.conf once on the first DNS lookup. So all daemons started
> > before the local DNS cache will either use a different server, or fail
> > DNS resolution in all cases. A minority of services (avahi-daemon,
> > fetchmail, postfix, sendmail, squid, and squid3) hook into resolvconf to
> > reload their daemons when /etc/resolv.conf is changed by resolvconf.
> > These daemons will not be affected by this problem. Many other services
> > on the other hand will.
> 
> This is an as-yet unsubstantiated claim. Libc resolver clients generally
> re-read resolv.conf every time it changes. Are there known examples
> of programs that fail to do so?

This aspect of my initial claim was wrong. I was not aware that libc
always stats resolv.conf. Thanks for pointing that out.
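
For readers unfamiliar with the idea, a stat-based change check can be sketched roughly as follows. This is not the libc code, only an illustration of the technique; the struct and function names are made up, and comparing mtime, size and inode is my own choice of what counts as "changed".

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record of what the file looked like at the last check. */
struct conf_watch {
	int valid;	/* 0 until the first successful stat() */
	time_t mtime;
	off_t size;
	ino_t ino;
};

/*
 * Hypothetical helper: returns 1 if the file at path differs from the last
 * successful check (or was never checked), 0 if it looks unchanged, and -1
 * if stat() fails. A caller would re-read the file whenever this returns 1.
 */
static int conf_changed(const char *path, struct conf_watch *w)
{
	struct stat st;
	int changed;

	if (stat(path, &st) != 0) {
		w->valid = 0;	/* force a re-read once the file is back */
		return -1;
	}
	changed = !w->valid || st.st_mtime != w->mtime ||
		  st.st_size != w->size || st.st_ino != w->ino;
	w->valid = 1;
	w->mtime = st.st_mtime;
	w->size = st.st_size;
	w->ino = st.st_ino;
	return changed;
}
```

Running such a check before each lookup costs one stat() call and means a process picks up a rewritten /etc/resolv.conf without being restarted or reloaded.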

Given that systemd will bind 127.0.0.1:53 and the libc uses 127.0.0.1 in
the presence of an empty resolv.conf, I see that daemons will have no
issues unless they also fail on an unresponsive name server during normal
operation. There is no issue.

Revisiting the two issues I stated in my initial mail:
| 1) If /etc/resolv.conf is initially empty, getaddrinfo returns
|    EAI_SYSTEM. Some daemons (e.g. ntp) currently treat this as a
|    permanent error. This usually results in a user visible issue. It is
|    nasty to track down, because it is usually unreproducible once the
|    system is started (i.e. a simple daemon restart will fix it).

The specific meaning of EAI_SYSTEM is defined by errno, and ntp was not
handling ECONNREFUSED. The same bug may exist in other daemons, but it
will not show up under systemd, because there you will only see
ETIMEDOUT. The issue is not general. If it occurs anywhere, then in
services under sysv that use $named without declaring an ordering
requirement on it.

| 2) If /etc/resolv.conf initially contains some other DNS server (e.g.
|    from dhcp or from /etc/network/interfaces), some services will bypass
|    the local DNS cache resulting in a higher load on remote DNS servers.
|    This problem in general is mostly invisible.

Fixed in libc, which stat()s resolv.conf and re-reads it when it changes.

So we basically are in mode B and it works most of the time.

Thanks for clearing things up. I guess this is EOD.

Helmut

[1] code below
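#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <errno.h>
int main(int argc, char **argv) {
	int ret; struct addrinfo *x = NULL;
	if (argc < 2) {
		fprintf(stderr, "usage: %s <hostname>\n", argv[0]);
		return 2;
	}
	errno = 0; /* make sure no stale value is reported */
	ret = getaddrinfo(argv[1], NULL, NULL, &x);
	/* errno is only meaningful when ret == EAI_SYSTEM */
	printf("ret: %d, errno: %d\n", ret, errno);
	if (ret == 0)
		freeaddrinfo(x); /* release the result list on success */
	return 0;
}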
Unresponsiveness was simulated with
iptables -I INPUT 1 -p udp --dport 53 -j DROP

