
Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks



On 2016-08-13 04:23, Vincent Lefevre wrote:
> On 2016-08-12 23:24:29 +0200, Aurelien Jarno wrote:
> > On 2016-08-12 12:15, Vincent Lefevre wrote:
> > > According to tcpdump output below, there is no truncation: the number
> > > of A's and AAAA's (10 for each) match what "host keys.gnupg.net"
> > > gives. BTW, even if there were a truncation, there shouldn't be a
> > > failure: using one of the returned IP addresses would be sufficient to
> > > connect.
> > 
> > That's a wrong assumption. The libc getaddrinfo interface is not to
> > connect to an IP, but rather to return *all* addresses corresponding to
> > a query. The returned IPs are not necessarily used for a connection
> > later. 
> 
> I was not suggesting not to return all addresses. But in case of error
> (which could just be a temporary network error, not necessarily due to
> a bug in the nameserver, e.g. due to network congestion), if some of
> the IP addresses are known, they could be made available to the calling
> application in case they could be useful (e.g. for a connection). If
> the application wants all the addresses, it can check error conditions
> as usual.

glibc provides getaddrinfo(), which is a POSIX interface also described
in RFC 2553. Its behaviour can't be changed just because you think
another one would be better. Alternatively, some other resolver library
might provide the behaviour you need. In both cases this requires
changes on the application side too, which is clearly out of scope of
this bug report.

Also note that in your case the getaddrinfo() function returns an
EAI_AGAIN error aka "Temporary failure in name resolution". The
application (in your case gnupg) can try to handle the failure or at
least display a better error message than "Host not found" which is
clearly misleading in that case.
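
As a rough illustration of what "handling the failure" could look like on
the application side, here is a minimal Python sketch (not gnupg's actual
code). The function names describe_resolution_error() and resolve() are
made up for this example; only socket.getaddrinfo(), socket.gaierror and
the EAI_* constants come from the standard library.

```python
import socket


def describe_resolution_error(exc):
    """Turn a socket.gaierror into a message that keeps temporary
    failures (EAI_AGAIN) apart from a genuine "host not found"."""
    if exc.errno == socket.EAI_AGAIN:
        return "temporary failure in name resolution, try again later"
    if exc.errno == socket.EAI_NONAME:
        return "host not found"
    return "name resolution failed: %s" % exc.strerror


def resolve(host, port):
    """Hypothetical wrapper: resolve host/port or raise a readable error."""
    try:
        return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise RuntimeError(describe_resolution_error(exc)) from exc
```

An application could also retry after a delay when it sees EAI_AGAIN;
whether that makes sense depends on the application.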

> > Not returning all addresses so might lead to data loss or
> > security issue.
> 
> Well, an application should not base its security on the nameserver.
> It is well-known that nameservers can return fake answers.

The local recursive nameserver is by definition trusted. If additional
security is required, DNSSEC can be used.

> And I would say that it could be the opposite. Imagine a host with
> hundreds of millions of IP addresses...

I am sure there is a limit somewhere in one of the RFCs. Anyway, if such
a DNS entry exists, I don't think returning a failure is really a
problem.

> > The point is that the local resolver is supposed to be working
> > correctly.
> 
> and the network quality is good, which is not always the case.
> 
> > If it doesn't, one can easily setup a local recursive name server
> > like unbound.
> 
> Unfortunately, this is not a general solution due to buggy ISP's.
> 
> > > 11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? keys.gnupg.net. (32)
> > > 11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? keys.gnupg.net. (32)
> > > 11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 1.0.168.192.in-addr.arpa. (42)
> > > 11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 NXDomain* 0/1/0 (94)
> > > 11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 6.0.168.192.in-addr.arpa. (42)
> > > 11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 CNAME pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 78.46.223.54, A 131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 209.135.211.141, A 5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)
> > 
> > This tcpdump trace doesn't show the answer header, so we don't know if
> > the truncation flag is set. That said the 11/9/5 says that the answer
> > contains 11 answer records, 9 name server records and 5 additional
> > records. This clearly doesn't fit. A normal DNS server would just return
> > 11 answers, so 11/0/0.
> > 
> > That said I just realized that the strace entry in your previous email
> > contains the beginning of the answer:
> > 
> > > 30419 recvfrom(4, "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, [16]) = 500
> > 
> > Converted into hexadecimal, this is:
> >   27 4a 83 80 00 01 00 0b 00 08 00 00 04 6b 65 79
> >   73 05 67 6e 75 70 67 03 6e 65 74 00 00 1c 00 01
> > 
> > 274a is the identification. The flags are 8380 and correspond to QR,
> > TC, RD, RA. Your name server clearly says that the answer is truncated.
> > On a working nameserver, the flags are 8180 for this query, so the same
> > without the truncation flag.
> 
> I don't understand here. You said above "This clearly doesn't fit.",
> so that it is normal that the truncation flag is set, isn't it?
> Or do you mean that the answer should have been 11/0/0, so that
> the truncation flag wouldn't be set as a consequence?

Your recursive DNS nameserver got asked to resolve keys.gnupg.net. As
all A records fit inside the 512-byte limit, your local name server
should have returned them without truncation, possibly adding additional
records up to the limit.
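
For reference, the 12 header bytes quoted from the strace output can be
decoded with a few lines of Python. This is only a sketch: the function
name parse_dns_header() is made up here, and the field layout (ID, flags,
then four 16-bit counts) is the standard DNS header from RFC 1035.

```python
import struct

# First 12 bytes of the reply, as quoted from the strace output.
HEADER = bytes.fromhex("27 4a 83 80 00 01 00 0b 00 08 00 00")


def parse_dns_header(data):
    """Decode the fixed 12-byte DNS header into a small dict."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),  # response
        "tc": bool(flags & 0x0200),  # truncated
        "rd": bool(flags & 0x0100),  # recursion desired
        "ra": bool(flags & 0x0080),  # recursion available
        "rcode": flags & 0x000F,
        "counts": (qd, an, ns, ar),  # question/answer/authority/additional
    }


print(parse_dns_header(HEADER))
```

With flags 8380 the "tc" field comes out true; with 8180, as seen on a
working nameserver, it comes out false.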

> > Even if it is a quite standard setup, you have to admit it doesn't
> > behave according to the RFC.
> 
> I wonder which part of the RFC you are talking about.

RFC 2181, section 9:

| 9. The TC (truncated) header bit
| 
|    The TC bit should be set in responses only when an RRSet is required
|    as a part of the response, but could not be included in its entirety.
|    The TC bit should not be set merely because some extra information
|    could have been included, but there was insufficient room.  This
|    includes the results of additional section processing.  In such cases
|    the entire RRSet that will not fit in the response should be omitted,
|    and the reply sent as is, with the TC bit clear.  If the recipient of
|    the reply needs the omitted data, it can construct a query for that
|    data and send that separately.

This is clearly not what your nameserver does.

|    Where TC is set, the partial RRSet that would not completely fit may
|    be left in the response.  When a DNS client receives a reply with TC
|    set, it should ignore that response, and query again, using a
|    mechanism, such as a TCP connection, that will permit larger replies.

This is what the GNU libc does.
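
The client-side rule from the paragraph quoted above can be sketched as
follows. This is not the glibc code, just an illustration in Python;
query_with_tcp_fallback(), send_udp and send_tcp are hypothetical names,
with the two transports passed in as plain callables.

```python
def is_truncated(reply):
    """Return True if the TC bit is set in a raw DNS reply.

    The flags occupy bytes 2 and 3 of the header; TC is 0x02 in byte 2.
    """
    return len(reply) >= 12 and bool(reply[2] & 0x02)


def query_with_tcp_fallback(query, send_udp, send_tcp):
    """Send the query over UDP first and retry over TCP if TC is set.

    send_udp and send_tcp are callables taking the raw query bytes and
    returning the raw reply bytes.
    """
    reply = send_udp(query)
    if not is_truncated(reply):
        return reply
    # TC set: ignore the partial reply and ask again over TCP.
    return send_tcp(query)
```

If send_tcp() fails because nothing answers on the TCP port, the whole
lookup fails, which is the combination seen in this report.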
 
> > You should complain to the manufacturer and try to get a firmware
> > update.
> 
> I'll see what I can do.
> 
> > Trying to work around things on the libc side just gives even less value
> > to the RFCs, and encourages selling broken hardware.
> 
> I doubt that GNU libc would make any difference. What matters is
> how MS-Windows behaves, and probably nowadays Android and iOS too.
> Also, if there were conformance tests, e.g. from the Linux
> community, this could help. At least the buyers would have a way
> to choose, and it could be easier to report issues to the vendors.

I don't really see why the Linux community should provide conformance
tests any more than the Windows and Android vendors.

> > > FYI, I also often get 5-second timeouts in name resolution whatever
> > > the host (you can see it above): I get the answer for A or AAAA, but
> > > sometimes, the other answer is lost. I have a DHCP hook that tests
> > > whether I'm using this router:
> > > 
> > > [...]
> > >   ping -n -c 1 -I "$interface" "$new_routers" > /dev/null
> > >   if grep -i -q $mac /proc/net/arp; then
> > >     logger "Google Public DNS with TCP to avoid recurrent timeout"
> > > [...]
> > 
> > This shows how broken your name server is. It probably has a problem with
> > AAAA requests.
> 
> No, not at all. I do get AAAA answers as shown in the traces. The
> dropped UDP packets can concern either A or AAAA. This is completely
> random. I suppose that this is due to network quality, in particular
> because the frequency of the timeout depends on the period of the day.
> 
> BTW, there's the same problem with other nameservers, such as those
> of our ISP. The solution was to use TCP.

Ok, then that's clearly not a GNU libc problem.


> > In short it clearly shows that the problem comes from the name server and
> > not the GNU libc:
> > - the nameserver set the truncation bit
> 
> Still unclear (see above).

As said above it doesn't follow RFC2181.

> > - the nameserver doesn't answer on the TCP port
> 
> Yes, but this is not a bug. Other nameservers don't support TCP.
> This includes those of our ISP here.
> 
> The requirement for TCP support is quite recent. According to
> 
> http://serverfault.com/questions/181956/is-it-true-that-a-nameserver-have-to-answer-queries-over-tcp
> 
> this is required by RFC 5966, whose date is August 2010. The router
> is a few years old. (And if you want to compare with glibc, until
> a few months ago, it was still not C99 conforming.)
> 
> Note also that in 2012, BIND was using UDP only (I don't know now),
> so that I was getting lots of failures with it.

While it's true that it is a recent requirement, it has been recommended
("should") for more than 25 years. Anyway, my point is that not
following the truncation requirement from RFC 2181 *or* the TCP
requirement from RFC 5966 would have had no consequence besides maybe a
slight performance cost. The problem is that your name server has *both*
issues.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

