
Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks



On 2016-08-12 23:24:29 +0200, Aurelien Jarno wrote:
> On 2016-08-12 12:15, Vincent Lefevre wrote:
> > According to tcpdump output below, there is no truncation: the number
> > of A's and AAAA's (10 for each) match what "host keys.gnupg.net"
> > gives. BTW, even if there were a truncation, there shouldn't be a
> > failure: using some of the returned IP addresses would be sufficient to
> > connect.
> 
> That's a wrong assumption. The libc getaddrinfo interface is not to
> connect to an IP, but rather to return *all* addresses corresponding to
> a query. The returned IPs are not necessarily used for a connection
> later. 

I was not suggesting not to return all addresses. But in case of error
(which could just be a temporary network error, not necessarily due to
a bug in the nameserver, e.g. due to network congestion), if some of
the IP addresses are known, they could be made available to the calling
application in case they could be useful (e.g. for a connection). If
the application wants all the addresses, it can check error conditions
as usual.

> Not returning all addresses might lead to data loss or
> security issues.

Well, an application should not base its security on the nameserver.
It is well-known that nameservers can return fake answers.

And I would say that it could be the opposite. Imagine a host with
hundreds of millions of IP addresses...

> The point is that the local resolver is supposed to be working
> correctly.

and the network quality is good, which is not always the case.

> If it doesn't, one can easily set up a local recursive name server
> like unbound.

Unfortunately, this is not a general solution due to buggy ISPs.

> > 11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? keys.gnupg.net. (32)
> > 11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? keys.gnupg.net. (32)
> > 11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 1.0.168.192.in-addr.arpa. (42)
> > 11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 NXDomain* 0/1/0 (94)
> > 11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 6.0.168.192.in-addr.arpa. (42)
> > 11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 CNAME pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 78.46.223.54, A 131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 209.135.211.141, A 5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)
> 
> This tcpdump trace doesn't show the answer header, so we don't know if
> the truncation flag is set. That said the 11/9/5 says that the answer
> contains 11 answer records, 9 name server records and 5 additional
> records. This clearly doesn't fit. A normal DNS server would just return
> 11 answers, so 11/0/0.
> 
> That said I just realized that the strace entry in your previous email
> contains the beginning of the answer:
> 
> > 30419 recvfrom(4, "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, [16]) = 500
> 
> Converted into hexadecimal, this is:
>   27 4a 83 80 00 01 00 0b 00 08 00 00 04 6b 65 79
>   73 05 67 6e 75 70 67 03 6e 65 74 00 00 1c 00 01
> 
> 274a is the identification. The flags are 8380 and correspond to QR,
> TC, RD, RA. Your name server clearly says that the answer is truncated.
> On a working nameserver, the flags are 8180 for this query, so the same
> without the truncation flag.

I don't understand here. You said above "This clearly doesn't fit.",
so that it is normal that the truncation flag is set, isn't it?
Or do you mean that the answer should have been 11/0/0, so that
the truncation flag wouldn't be set as a consequence?
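
For what it's worth, the header bytes quoted above can be decoded
mechanically. The following is only a minimal sketch in Python, assuming
the standard 12-byte header layout from RFC 1035 (section 4.1.1); the
name decode_dns_header is made up for illustration and is not part of
any library.

```python
import struct

# First 12 bytes of the AAAA reply, copied from the hex dump quoted above.
HEADER = bytes.fromhex("274a 8380 0001 000b 0008 0000")


def decode_dns_header(data):
    """Unpack the fixed 12-byte DNS header (hypothetical helper)."""
    # Six big-endian 16-bit fields: id, flags, and the four section counts.
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),  # message is a response
        "aa": bool(flags & 0x0400),  # authoritative answer
        "tc": bool(flags & 0x0200),  # truncated
        "rd": bool(flags & 0x0100),  # recursion desired
        "ra": bool(flags & 0x0080),  # recursion available
        "rcode": flags & 0x000F,
        "counts": (qd, an, ns, ar),  # question/answer/authority/additional
    }


if __name__ == "__main__":
    print(decode_dns_header(HEADER))
```

On these bytes it reports QR, TC, RD and RA set, with section counts
1/11/8/0 (this is the AAAA reply, not the A reply shown by tcpdump).
With flags 8180 instead of 8380, only the TC bit changes.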

> Even if it is a quite standard setup, you have to admit it doesn't
> behave according to the RFC.

I wonder which part of the RFC you are talking about.

> You should complain to the manufacturer and try to get a firmware
> update.

I'll see what I can do.

> Trying to work around things on the libc side just gives even less value
> to the RFCs, and encourages selling broken hardware.

I doubt that GNU libc would make any difference. What matters is
how MS-Windows behaves, and probably nowadays Android and iOS too.
Also, if there were conformance tests, e.g. from the Linux
community, this could help. At least the buyers would have a way
to choose, and it could be easier to report issues to the vendors.

> > FYI, I also often get 5-second timeouts in name resolution whatever
> > the host (you can see it above): I get the answer for A or AAAA, but
> > sometimes, the other answer is lost. I have a DHCP hook that tests
> > whether I'm using this router:
> > 
> > [...]
> >   ping -n -c 1 -I "$interface" "$new_routers" > /dev/null
> >   if grep -i -q $mac /proc/net/arp; then
> >     logger "Google Public DNS with TCP to avoid recurrent timeout"
> > [...]
> 
> This shows how broken your name server is. It probably has a problem with
> AAAA requests.

No, not at all. I do get AAAA answers as shown in the traces. The
dropped UDP packets can concern either A or AAAA. This is completely
random. I suppose that this is due to network quality, in particular
because the frequency of the timeout depends on the period of the day.

BTW, there's the same problem with other nameservers, such as those
of our ISP. The solution was to use TCP.
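
As an illustration of what switching to TCP changes on the wire, here is
a minimal sketch in Python, assuming the message format of RFC 1035: the
query itself is the same, but over TCP it is prefixed with a two-byte
length field (section 4.2.2). The helper names build_query and
frame_for_tcp are hypothetical, and nothing is sent on the network.

```python
import struct

QTYPE_A = 1      # RFC 1035
QTYPE_AAAA = 28  # RFC 3596
CLASS_IN = 1


def build_query(name, qtype, ident, rd=True):
    """Build a single-question DNS query message (hypothetical helper)."""
    flags = 0x0100 if rd else 0  # only the RD bit is set in a plain query
    header = struct.pack("!6H", ident, flags, 1, 0, 0, 0)
    # Each label is encoded as a length byte followed by the label itself;
    # a zero byte terminates the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!2H", qtype, CLASS_IN)


def frame_for_tcp(message):
    """Prefix the message with its length, as required over TCP."""
    return struct.pack("!H", len(message)) + message


if __name__ == "__main__":
    query = build_query("keys.gnupg.net", QTYPE_A, 60367)
    print(len(query), len(frame_for_tcp(query)))
```

For keys.gnupg.net this gives a 32-byte query, which seems to match the
(32) in the tcpdump lines quoted earlier, and 34 bytes once framed for
TCP.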

> In short it clearly shows that the problem comes from the name server and
> not the GNU libc:
> - the nameserver sets the truncation bit

Still unclear (see above).

> - the nameserver doesn't answer on the TCP port

Yes, but this is not a bug. Other nameservers don't support TCP.
This includes those of our ISP here.

The requirement for TCP support is quite recent. According to

http://serverfault.com/questions/181956/is-it-true-that-a-nameserver-have-to-answer-queries-over-tcp

this is required by RFC 5966, whose date is August 2010. The router
is a few years old. (And if you want to compare with glibc, until
a few months ago, it was still not C99 conforming.)

Note also that in 2012, BIND was using UDP only (I don't know now),
so that I was getting lots of failures with it.

> - the nameserver sometimes drops AAAA queries

I have shown that this is not related to the nameserver.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

