
Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks



On 2016-08-13 16:46:46 +0200, Aurelien Jarno wrote:
> On 2016-08-13 04:23, Vincent Lefevre wrote:
> > I was not suggesting not to return all addresses. But in case of error
> > (which could just be a temporary network error, not necessarily due to
> > a bug in the nameserver, e.g. due to network congestion), if some of
> > the IP addresses are known, they could be made available to the calling
> > application in case they could be useful (e.g. for a connection). If
> > the application wants all the addresses, it can check error conditions
> > as usual.
> 
> The glibc provides getaddrinfo() which is a POSIX interface, also
> described in RFC2553. You can't change it just because you think it's
> better. Alternatively some other resolver libraries might provide the
> behaviour you need. Anyway in both cases it requires some changes on the
> application side too, which is clearly out of scope of this bug report.

It seems that POSIX doesn't specify the answer in the struct addrinfo
in case of error. But anyway, I was thinking more of an alternative
function, which could be more efficient when the goal is to do a
connection, since the applications need to be modified. Since many
applications could benefit from this, having such a function in the
GNU libc may be better than another resolver library.
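For comparison, an application that only wants a connection today typically walks the getaddrinfo() result list and keeps the first address that accepts. A function returning partial results, as proposed above, would presumably feed this same loop. Below is a minimal Python sketch of that conventional pattern. The helper name `connect_first` is hypothetical, and Python's own `socket.create_connection` does essentially the same thing.

```python
import socket

def connect_first(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Try each address returned by getaddrinfo() in order; return the first connected socket."""
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock  # One working address is enough for a connection.
        except OSError as exc:
            last_error = exc
            sock.close()  # Move on to the next address in the list.
    raise last_error or OSError(f"no usable address for {host!r}")
```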

Now, in the present case of keys.gnupg.net, this may be unnecessary
(see below about the 11/9/5 and the truncation bit).

> Also note that in your case the getaddrinfo() function returns an
> EAI_AGAIN error aka "Temporary failure in name resolution". The
> application (in your case gnupg) can try to handle the failure or at
> least display a better error message than "Host not found" which is
> clearly misleading in that case.

Indeed. Now, with the new gnupg 2.x that has just replaced the old
one in Debian/unstable, resolving seems to be done differently and
I no longer get an error (I've checked that "ping" still fails to
be sure that this wasn't due to something else). So, there's no bug
to report to gnupg. :)
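A minimal Python sketch of the application-side handling suggested above, assuming Python's `socket` module (a thin wrapper over the C getaddrinfo()): report EAI_AGAIN as a temporary failure instead of "Host not found". The helper names `describe_gai_error` and `resolve`, and the default port 11371 (the usual HKP keyserver port), are illustrative assumptions, not gnupg code.

```python
import socket

def describe_gai_error(code: int) -> str:
    """Map a getaddrinfo() error code to a user-facing message (hypothetical helper)."""
    if code == socket.EAI_AGAIN:
        # Temporary failure: the name may well exist, and retrying later can succeed.
        return "Temporary failure in name resolution; try again later"
    if code == socket.EAI_NONAME:
        # Definitive answer: the name is not known.
        return "Host not found"
    return f"Name resolution failed (error {code})"

def resolve(host: str, port: int = 11371):
    """Return the getaddrinfo() results, or raise RuntimeError with a clearer message."""
    try:
        return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # exc.errno holds the EAI_* code returned by the C library.
        raise RuntimeError(f"{host}: {describe_gai_error(exc.errno)}") from exc
```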

> > And I would say that it could be the opposite. Imagine a host with
> > hundreds of millions of IP addresses...
> 
> I am sure there is a limit somewhere in one of the RFCs.

I haven't found a limit (though I didn't check everything).

According to

  http://serverfault.com/questions/652237/whats-the-maximum-number-of-ips-a-dns-a-record-can-have

there isn't a limit, but this doesn't seem to be based on RFCs,
more on testing. With the example, 1000 records are obtained per
TCP query; other records are obtained with additional TCP queries,
but only one more at a time (rotation by 1). Well, this is rather
ugly with this client.

> Anyway if such a DNS entry exists, I don't think returning a failure
> is really a problem.

And this is what the nameserver of our router is doing! Its chosen
limit can appear to be low, but in absence of specification, how
to choose a practical limit? It seems to be rare to have more than
4 A or AAAA records. Even www.google.org has only one. BTW, I'd be
interested in some statistics.
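To put rough numbers on a "practical limit": in a classic 512-byte UDP reply (no EDNS0), an A record whose owner name is compressed to a 2-byte pointer costs 16 bytes (pointer, 10 bytes of fixed fields, 4-byte address) and an AAAA record 28 bytes. A minimal Python sketch, assuming only the question and answer sections are filled and an optional CNAME target is stored uncompressed (worst case); the helper names are hypothetical.

```python
from typing import Optional

HEADER = 12        # fixed DNS header size in bytes
RR_FIXED = 10      # TYPE (2) + CLASS (2) + TTL (4) + RDLENGTH (2)
POINTER = 2        # owner name compressed to a pointer (RFC 1035, section 4.1.4)
UDP_LIMIT = 512    # classic UDP payload limit without EDNS0

def wire_length(name: str) -> int:
    """Uncompressed wire length of a domain name: one length byte per label plus the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1

def max_address_records(qname: str, rdata_len: int, cname: Optional[str] = None) -> int:
    """How many A (rdata_len=4) or AAAA (rdata_len=16) records fit in a 512-byte reply."""
    used = HEADER + wire_length(qname) + 4  # question: QNAME + QTYPE + QCLASS
    if cname is not None:
        # CNAME record: owner name as a pointer, target name uncompressed.
        used += POINTER + RR_FIXED + wire_length(cname)
    per_record = POINTER + RR_FIXED + rdata_len
    return (UDP_LIMIT - used) // per_record
```

Under these assumptions, keys.gnupg.net with the CNAME to pool.sks-keyservers.net leaves room for 27 A records or 15 AAAA records, so the CNAME plus 10 A records of the trace take well under half of the 512 bytes.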

> > > The point is that the local resolver is supposed to be working
> > > correctly.
> > 
> > and the network quality is good, which is not always the case.
> > 
> > > If it doesn't, one can easily set up a local recursive name server
> > > like unbound.
> > 
> > Unfortunately, this is not a general solution due to buggy ISPs.
> > 
> > > > 11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? keys.gnupg.net. (32)
> > > > 11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? keys.gnupg.net. (32)
> > > > 11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 1.0.168.192.in-addr.arpa. (42)
> > > > 11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 NXDomain* 0/1/0 (94)
> > > > 11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 6.0.168.192.in-addr.arpa. (42)
> > > > 11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 CNAME pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 78.46.223.54, A 131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 209.135.211.141, A 5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)
> > > 
> > > This tcpdump trace doesn't show the answer header, so we don't know if
> > > the truncation flag is set. That said the 11/9/5 says that the answer
> > > contains 11 answer records, 9 name server records and 5 additional
> > > records. This clearly doesn't fit. A normal DNS server would just return
> > > 11 answers, so 11/0/0.
> > > 
> > > That said I just realized that the strace entry in your previous email
> > > contains the beginning of the answer:
> > > 
> > > > 30419 recvfrom(4, "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, [16]) = 500
> > > 
> > > Converted into hexadecimal, this is:
> > >   27 4a 83 80 00 01 00 0b 00 08 00 00 04 6b 65 79
> > >   73 05 67 6e 75 70 67 03 6e 65 74 00 00 1c 00 01
> > > 
> > > 274a is the identification. The flags are 8380 and correspond to QR,
> > > TC, RD, RA. Your name server clearly says that the answer is truncated.
> > > On a working nameserver, the flags are 8180 for this query, so the same
> > > without the truncation flag.
> > 
> > I don't understand here. You said above "This clearly doesn't fit.",
> > so that it is normal that the truncation flag is set, isn't it?
> > Or do you mean that the answer should have been 11/0/0, so that
> > the truncation flag wouldn't be set as a consequence?
> 
> Your recursive DNS nameserver got asked to resolve keys.gnupg.net. As
> all A records fit inside the 512 bytes limit, your local name server
> should have returned it without truncation, possibly adding additional
> records up to the limit.

OK, but I'm not sure that the truncation flag was a problem here.
See below my remark on the 11/9/5.
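The header decoding quoted above can be checked mechanically. A minimal Python sketch that unpacks the 12-byte DNS header (RFC 1035, section 4.1.1) from the first bytes of the strace output; the helper name `decode_header` is hypothetical.

```python
import struct

# Flag bits in the second 16-bit word of the DNS header (RFC 1035, section 4.1.1).
FLAG_BITS = {"QR": 0x8000, "AA": 0x0400, "TC": 0x0200, "RD": 0x0100, "RA": 0x0080}

def decode_header(message: bytes) -> dict:
    """Decode the ID, the set flags, RCODE and the four section counts of a DNS message."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", message[:12])
    return {
        "id": ident,
        "flags": [name for name, bit in FLAG_BITS.items() if flags & bit],
        "rcode": flags & 0x000F,
        "counts": (qdcount, ancount, nscount, arcount),
    }

# First 12 bytes of the reply captured by strace.
reply_header = bytes.fromhex("27 4a 83 80 00 01 00 0b 00 08 00 00")
```

This yields flags QR, TC, RD, RA and counts 1/11/8/0. Note that this particular reply answers the AAAA query (QTYPE 0x1c at the end of the dump), so its counts differ from the 11/9/5 of the A reply in the tcpdump trace.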

> > I wonder which part of the RFC you are talking about.
> 
> The RFC2181 section 9:
> 
> | 9. The TC (truncated) header bit
> | 
> |    The TC bit should be set in responses only when an RRSet is required
                  ^^^^^^
> |    as a part of the response, but could not be included in its entirety.
> |    The TC bit should not be set merely because some extra information
                  ^^^^^^
> |    could have been included, but there was insufficient room.  This
> |    includes the results of additional section processing.  In such cases
> |    the entire RRSet that will not fit in the response should be omitted,
> |    and the reply sent as is, with the TC bit clear.  If the recipient of
> |    the reply needs the omitted data, it can construct a query for that
> |    data and send that separately.
> 
> This is clearly not what your nameserver does.

OK, but that's a "should"! Of course, one can still blame the
nameserver implementation for not following this "should". But one
can also blame the OS / system libraries for not taking into account
that this is just a "should", i.e. one can expect that the system
provides some function that would return the corresponding partial
(possibly complete) data as a fallback.

> |    Where TC is set, the partial RRSet that would not completely fit may
> |    be left in the response.  When a DNS client receives a reply with TC
> |    set, it should ignore that response, and query again, using a
               ^^^^^^
> |    mechanism, such as a TCP connection, that will permit larger replies.
> 
> This is what the GNU libc does.

Again, that's a "should".
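For reference, the client behaviour the RFC describes (and that, per Aurelien's reply, glibc follows) boils down to: query over UDP, and if TC is set, discard the reply and repeat the query over TCP, where each message carries a 2-byte length prefix (RFC 1035, section 4.2.2). A minimal Python sketch with injected transport functions so that it runs without a network; `query_with_fallback`, `send_udp` and `send_tcp` are hypothetical names, not glibc internals.

```python
import struct
from typing import Callable

TC_BIT = 0x0200

def is_truncated(message: bytes) -> bool:
    """True if the TC flag is set in the DNS header."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)

def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes every message with its length as a 16-bit big-endian integer."""
    return struct.pack("!H", len(message)) + message

def query_with_fallback(query: bytes,
                        send_udp: Callable[[bytes], bytes],
                        send_tcp: Callable[[bytes], bytes]) -> bytes:
    """Send over UDP; if the reply is truncated, ignore it and retry over TCP."""
    reply = send_udp(query)
    if not is_truncated(reply):
        return reply
    # RFC 2181, section 9: a client receiving TC "should ignore that response"
    # and query again over a transport that permits larger replies.
    framed_reply = send_tcp(frame_for_tcp(query))
    (length,) = struct.unpack("!H", framed_reply[:2])
    return framed_reply[2:2 + length]
```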

But since one got 11/9/5 and one has all the 11 answer records in the
UDP packet, couldn't it be deduced that one got all the IP addresses,
so that a failure in getaddrinfo wasn't necessary?
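The deduction suggested here can be made concrete: walk the question and answer sections and check whether all ANCOUNT answer records are actually present before the end of the datagram, even though TC is set. A minimal Python sketch, assuming an otherwise well-formed message; `complete_answers` and `skip_name` are hypothetical helpers, and this is not what getaddrinfo() does.

```python
import struct

def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        if offset >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[offset]
        if length & 0xC0 == 0xC0:   # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1
        if length == 0:             # root label terminates the name
            return offset
        offset += length

def complete_answers(msg: bytes) -> bool:
    """True if every answer record announced in the header is fully present in msg."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    offset = 12
    try:
        for _ in range(qdcount):
            offset = skip_name(msg, offset) + 4   # skip QTYPE + QCLASS
        if offset > len(msg):
            return False
        for _ in range(ancount):
            offset = skip_name(msg, offset)
            if offset + 10 > len(msg):            # TYPE, CLASS, TTL, RDLENGTH
                return False
            (rdlength,) = struct.unpack("!H", msg[offset + 8:offset + 10])
            offset += 10 + rdlength
            if offset > len(msg):
                return False
    except ValueError:
        return False
    return True
```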

> > I doubt that GNU libc would make any difference. What matters is
> > how MS-Windows behaves, and probably nowadays Android and iOS too.
> > Also, if there were conformance tests, e.g. from the Linux
> > community, this could help. At least the buyers would have a way
> > to choose, and it could be easier to report issues to the vendors.
> 
> I don't really see why the Linux community should provide a conformance
> tests more than the Windows and Android vendors.

Simply because Windows and Android products work with non-conforming
nameservers, so that they don't really need such tests. However a
Linux user is more interested in knowing whether some product will
work with Linux before buying it.

> > > In short it clearly shows that the problem comes from the name server and
> > > not the GNU libc:
> > > - the nameserver set the truncation bit
> > 
> > Still unclear (see above).
> 
> As said above it doesn't follow RFC2181.

I disagree. It just breaks a "should". Moreover the bad value of the
truncation bit should have been detected due to the fact that one has
all the 11 answer records in the UDP packet.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

