
Bug#526823: libc6 2.9-9 broke DNS resolver again



On Mon, May 04, 2009 at 11:55:22PM +0200, Luca Tettamanti wrote:
> On Mon, May 4, 2009 at 10:57 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On Mon, May 04, 2009 at 10:32:09PM +0200, Luca Tettamanti wrote:
> >> On Mon, May 4, 2009 at 10:11 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> >> > On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
> >> >> The single-request option works, but the automagic workaround does not,
> >> >
> >> > That's good news.
> >> >
> >> >> i.e. I always see the two requests going out in parallel.
> >> >> Actually I'm not sure I understand how it's supposed to work: if the
> >> >> first request fails, the caller usually gives up, no?
> >> >
> >> > The first request done by a program should time out, and the second
> >> > request by the same program should then be done sequentially, like when
> >> > "single-request" is set.
> >>
> >> That's not what is happening though. I try to open a page in konqueror
> >> (I also tried other programs, it's not specific to konqueror): I see
> >> two requests (A and AAAA) going out at the same time; konqueror says
> >> it's unable to resolve the address - so far so good. I try to reload
> >
> > That's the problem. When I say it should timeout, I mean it should take
> > a long time to resolve, but in the end an answer should be returned.
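The fallback described here (the first lookup in a process is slow because the parallel A/AAAA attempt times out, after which that lookup and all later ones are sent one at a time, as with single-request) can be modelled roughly as below. This is a hypothetical sketch, not glibc's implementation; ResolverModel, send_parallel and send_sequential are made-up names.

```python
class ResolverModel:
    """Hypothetical model of the timeout fallback; not glibc's code."""

    def __init__(self, single_request=False):
        # True mirrors "options single-request" in resolv.conf.
        self.single_request = single_request

    def lookup(self, send_parallel, send_sequential):
        """send_* are caller-supplied callables returning a reply, or None on timeout."""
        if not self.single_request:
            reply = send_parallel()          # A and AAAA sent at the same time
            if reply is not None:
                return reply
            # Timeout: remember it for the rest of this process, then retry.
            self.single_request = True
        return send_sequential()             # A first, then AAAA
```

Keeping the flag on the object rather than per lookup is what confines the delay to the first lookup of a process.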
> 
> Ah OK, __libc_res_nsend should try each query up to statp->retry times,
> which by default is 2 (confirmed by gdb).
> send_dg() returns 1 (reply) and the socket is then closed; the return
> value is 1 and control goes back to __libc_res_nquery.
> 
> At this point we have the two answers:
> 
> hp = {id = 27765, rd = 1, tc = 1, aa = 0, opcode = 5, qr = 0, rcode = 1,
>   cd = 0, ad = 0, unused = 0, ra = 1,
>   qdcount = 128, ancount = 1, nscount = 2, arcount = 0}
> hp2 = {id = 11116, rd = 1, tc = 0, aa = 0, opcode = 0, qr = 1, rcode = 0,
>   cd = 0, ad = 0, unused = 0, ra = 1,
>   qdcount = 256, ancount = 512, nscount = 0, arcount = 0}
> 
> The error in the first one is FORMERR (I'd expect NOTIMP...), which is
> treated as an unrecoverable failure even though the second one succeeded.
> answer contains a 76-byte reply:
> 
> 756c 2b81 8000 0100 0200 0000 0003 6674  ul+...........ft
> 7002 6974 0664 6562 6961 6e03 6f72 6700  p.it.debian.org.
> 0001 0001 c00c 0005 0001 0000 02f5 000d  ................
> 0366 7470 0462 6f66 6802 6974 00c0 2f00  .ftp.bofh.it../.
> 0100 0100 0063 9400 04d5 5c08            .....c....\.
> 
> which seems sensible to me (ftp.it.debian.org is the name that I
> request, and it's a CNAME for ftp.bofh.it).
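To make the dump easier to follow, here is a small Python sketch (not glibc code) that decodes it with the RFC 1035 message layout. It assumes a little-endian host such as x86, and that the reply proper begins at the second byte of the dump: hp2's id, 11116 = 0x2b6c, is the bytes 6c 2b read in host order, and the compression pointer c00c then lands exactly on the question name. Under that reading the 16-bit fields gdb prints are still in network byte order (hence qdcount = 256 and ancount = 512 for 1 and 2), and the same host-order decoding applied to the first 12 bytes of the dump reproduces every field of hp. The helper names are made up for the example.

```python
import struct

# The 76 bytes dumped above, copied verbatim.
DUMP = bytes.fromhex(
    "756c 2b81 8000 0100 0200 0000 0003 6674"
    "7002 6974 0664 6562 6961 6e03 6f72 6700"
    "0001 0001 c00c 0005 0001 0000 02f5 000d"
    "0366 7470 0462 6f66 6802 6974 00c0 2f00"
    "0100 0100 0063 9400 04d5 5c08"
)


def decode_header(buf, order="!"):
    """Decode a 12-byte DNS header (RFC 1035, section 4.1.1).

    order="!" reads the 16-bit fields in network byte order (correct);
    order="<" mimics printing glibc's HEADER on a little-endian host
    without ntohs(), as gdb did above.
    """
    ident, qd, an, ns, ar = struct.unpack(order + "H2xHHHH", buf[:12])
    b2, b3 = buf[2], buf[3]  # flag bytes, identical in both views
    return {
        "id": ident,
        "qr": b2 >> 7, "opcode": (b2 >> 3) & 0xF,
        "aa": (b2 >> 2) & 1, "tc": (b2 >> 1) & 1, "rd": b2 & 1,
        "ra": b3 >> 7, "rcode": b3 & 0xF,
        "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar,
    }


def read_name(msg, off):
    """Return (name, offset just after it), following compression pointers."""
    labels, end, hops = [], None, 0
    while True:
        length = msg[off]
        if length & 0xC0 == 0xC0:          # 2-byte pointer (RFC 1035, 4.1.4)
            hops += 1
            if hops > 16:
                raise ValueError("compression loop")
            if end is None:
                end = off + 2
            off = ((length & 0x3F) << 8) | msg[off + 1]
        elif length == 0:                   # root label terminates the name
            return ".".join(labels), (off + 1 if end is None else end)
        else:
            labels.append(msg[off + 1:off + 1 + length].decode("ascii"))
            off += 1 + length


reply = DUMP[1:]  # assumption: the reply proper starts at the second byte
hdr = decode_header(reply)
qname, off = read_name(reply, 12)
qtype, qclass = struct.unpack("!HH", reply[off:off + 4])
owner, off = read_name(reply, off + 4)
rtype, rclass, ttl, rdlen = struct.unpack("!HHIH", reply[off:off + 10])
target, _ = read_name(reply, off + 10)
print(hdr["id"], hdr["qr"], hdr["rcode"], hdr["qdcount"], hdr["ancount"])
print(qname, qtype, owner, rtype, ttl, target)
```

Run as is, it should report one question for ftp.it.debian.org and, as the first answer, a CNAME to ftp.bofh.it with a TTL of 757 seconds. The second (A) record is not decoded, since under this reading its last address byte falls outside the dump.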
> 
> To recap: my ADSL router receives two requests and sends back *two*
> answers; to the A query it replies with the expected data, to the AAAA
> query it replies "NotImpl" (see the tcpdump in the first email). When
> both queries are sent in parallel, __libc_res_nquery treats an error in
> either of them as an unrecoverable failure (even if the other one is
> valid). The logic should be reversed: the query is successful if we get
> _at least_ one valid response.
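Luca's proposed rule can be contrasted with the behaviour he observed in a few lines of Python. This is a hypothetical model, not __libc_res_nquery itself: Reply, fail_on_any_error and succeed_on_any_answer are made-up names, and "success" is assumed to mean a NOERROR reply carrying at least one answer record.

```python
from dataclasses import dataclass

# RCODE values from RFC 1035, section 4.1.1.
NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED = range(6)


@dataclass
class Reply:
    qtype: str    # "A" or "AAAA"
    rcode: int
    ancount: int  # number of answer records


def fail_on_any_error(replies):
    """Observed behaviour: one bad reply fails the whole lookup."""
    return all(r.rcode == NOERROR for r in replies)


def succeed_on_any_answer(replies):
    """Proposed behaviour: succeed if at least one reply carries answers."""
    return any(r.rcode == NOERROR and r.ancount > 0 for r in replies)


# The router's behaviour from the tcpdump: data for A, NotImpl for AAAA.
router = [Reply("A", NOERROR, 2), Reply("AAAA", NOTIMP, 0)]
print(fail_on_any_error(router), succeed_on_any_answer(router))
```

With the router's replies it prints False True: the first rule fails the whole lookup, while the second would return the A data.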

That's probably why some people reported success with this version of
the code, as in their case only one answer is received.

The problem is that glibc considers NotImpl an unrecoverable failure,
because it applies to the opcode type (here QUERY), not to the content
of the query. Taking into account every possible answer engineers have
come up with to say that a DNS server does not support AAAA queries
(while there is a correct way to say that) is not easy to do in glibc.
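The distinction drawn here can be sketched as follows. The standard way for a server to say a name has no AAAA records is a NOERROR reply with an empty answer section (often called NODATA), whereas NotImpl (RCODE 4) says, in RFC 1035's words, that the server "does not support the requested kind of query". The helper name and the simplified classification below are hypothetical, chosen only to illustrate that point.

```python
# RCODE values from RFC 1035, section 4.1.1.
NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED = range(6)


def classify_aaaa_reply(rcode, ancount):
    """Rough classification of a reply to an AAAA query (hypothetical helper)."""
    if rcode == NOERROR:
        # NOERROR with no answers ("NODATA"): the name exists but has
        # no AAAA records. This is the correct way to say it.
        return "answer" if ancount > 0 else "nodata"
    if rcode == NXDOMAIN:
        return "no-such-name"
    if rcode == NOTIMP:
        # Refers to the kind of query (the opcode), not the record type,
        # so a resolver may reasonably treat it as unrecoverable.
        return "unrecoverable"
    return "error"


print(classify_aaaa_reply(NOERROR, 0), classify_aaaa_reply(NOTIMP, 0))
```

Real resolvers also look at the authority section and the TC bit; the sketch ignores both.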

> After careful pondering I think that my router is actually sane (IOW,
> it doesn't discard AAAA requests - it correctly replies that it does
> not support that query); I think that it's a real bug in glibc.

Your router is buggy. According to the RFC, by answering NotImpl it says
that it does not support queries at all, not just AAAA queries.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


