
Bug#526823: libc6 2.9-9 broke DNS resolver again



On Mon, May 4, 2009 at 10:57 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Mon, May 04, 2009 at 10:32:09PM +0200, Luca Tettamanti wrote:
>> On Mon, May 4, 2009 at 10:11 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> > On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
>> >> The option single-request works, the automagic workaround does not,
>> >
>> > That's good news.
>> >
>> >> i.e. I always see the two requests going out in parallel.
>> >> Actually I'm not sure I understand how it's supposed to work: if the
>> >> first request fails usually the caller gives up, no?
>> >
>> > The first request done by a program should timeout, and the second
>> > request by the same program should then be done sequentially, like when
>> > "single-request" is set.
>>
>> That's not what is happening though. I try to open a page in konqueror
>> (I also tried other programs, it's not specific to konqueror): I see
>> two requests (A and AAAA) going out at the same time; konqueror says
>> it's unable to resolve the address - so far so good. I try to reload
>
> That's the problem. When I say it should timeout, I mean it should take
> a long time to resolve, but in the end an answer should be returned.

Ah ok, __libc_res_nsend should do statp->retry queries, which by
default is 2 (confirmed with gdb).
send_dg() returns 1 (reply) and the socket is then closed; the return
value is 1 and control goes back to __libc_res_nquery.

At this point we have the two answers:

hp = {id = 27765, rd = 1, tc = 1, aa = 0, opcode = 5, qr = 0, rcode =
1, cd = 0, ad = 0, unused = 0, ra = 1,
  qdcount = 128, ancount = 1, nscount = 2, arcount = 0}
hp2 = {id = 11116, rd = 1, tc = 0, aa = 0, opcode = 0, qr = 1, rcode =
0, cd = 0, ad = 0, unused = 0, ra = 1,
  qdcount = 256, ancount = 512, nscount = 0, arcount = 0}

The error in the first one is FORMERR (I'd expect NOTIMP...), which is
treated as an unrecoverable failure even though the second one succeeded.
The answer buffer contains 76 bytes of reply:

756c 2b81 8000 0100 0200 0000 0003 6674  ul+...........ft
7002 6974 0664 6562 6961 6e03 6f72 6700  p.it.debian.org.
0001 0001 c00c 0005 0001 0000 02f5 000d  ................
0366 7470 0462 6f66 6802 6974 00c0 2f00  .ftp.bofh.it../.
0100 0100 0063 9400 04d5 5c08            .....c....\.

which seems sensible to me (ftp.it.debian.org is the name that I
request, and it's a CNAME for ftp.bofh.it).
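
For anyone who wants to check the header fields by hand, here is a minimal
Python sketch that decodes the first bytes of the dump above. It rests on
one assumption that is not stated in the mail: the DNS header starts at
offset 1 of the dump, i.e. the leading 0x75 byte is skipped. Read that
way, the id bytes 6c 2b give hp2.id = 11116 when interpreted in
little-endian host order, and the 256 and 512 shown by gdb come out as
1 question and 2 answers in network byte order.

```python
import struct

# First 14 bytes of the hex dump, copied verbatim.
dump = bytes.fromhex("756c2b8180000100020000000003")

# Assumption: the 12-byte DNS header begins one byte into the dump.
header = dump[1:13]

# Network (big-endian) byte order, as the fields travel on the wire.
ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", header)
qr = flags >> 15          # 1 means this is a response
rcode = flags & 0x000F    # 0 means NOERROR

# The same bytes read as little-endian 16-bit values, which is how a
# raw struct dump on a little-endian host would show them.
host_view = struct.unpack("<6H", header)

print("wire order:", ident, qr, rcode, qdcount, ancount, nscount, arcount)
print("host order:", host_view)
```

With the offset assumed here, the header describes a successful response
(qr = 1, rcode = 0) with one question and two answer records, which is
consistent with the CNAME plus A record visible in the dump.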

To recap: my ADSL router receives two requests and sends back *two*
answers; to the A query it replies with the expected data, to the AAAA
query it replies "NotImpl" (see the tcpdump in the first email). When
both queries are sent in parallel, __libc_res_nquery treats an error in
either of the two as an unrecoverable failure (even if the other reply
is valid). The logic should be reversed: the query was successful if
we get _at least_ one valid response.
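
To make the difference between the two policies concrete, here is a small
Python model. It is only a sketch of the decision rule described above,
not glibc code: the function names and the (rcode, ancount) tuple
representation are made up for illustration. The RCODE numbers are the
standard ones from RFC 1035.

```python
# Standard DNS response codes (RFC 1035).
NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP = 0, 1, 2, 3, 4

def strict_ok(replies):
    """Model of the behaviour described above: any error reply fails the lookup."""
    return all(rcode == NOERROR for rcode, _ in replies)

def lenient_ok(replies):
    """Proposed rule: the lookup succeeds if at least one reply is usable."""
    return any(rcode == NOERROR and ancount > 0 for rcode, ancount in replies)

# Hypothetical pair of replies: the AAAA query is rejected with NOTIMP,
# the A query succeeds with two answer records.
replies = [(NOTIMP, 0), (NOERROR, 2)]

print("strict :", strict_ok(replies))
print("lenient:", lenient_ok(replies))
```

Under the strict rule this pair of replies counts as a failure, under the
lenient rule it counts as a success.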

After careful pondering I think that my router is actually sane (IOW,
it doesn't discard AAAA requests - it correctly replies that it does
not support that query); I think that it's a real bug in glibc.

Luca


