
Re: glibc's getaddrinfo() sort order



On Fri, Sep 07, 2007 at 01:06:06AM +0200, Kurt Roeckx wrote:
> It's at least in the spirit of the rfc to prefer one that's on the local
> network.  It might be the intention of rule 9, but then rule 9 isn't
> very well written.

Rule 9 seems perfectly well written, it just does something you
(reasonably) consider undesirable.

The RFC says:

]   Rule 9:  Use longest matching prefix.
]   When DA and DB belong to the same address family (both are IPv6 or
]   both are IPv4): If CommonPrefixLen(DA, Source(DA)) >
]   CommonPrefixLen(DB, Source(DB)), then prefer DA.  Similarly, if
]   CommonPrefixLen(DA, Source(DA)) < CommonPrefixLen(DB, Source(DB)),
]   then prefer DB.
]
]   Rule 10:  Otherwise, leave the order unchanged.
]   If DA preceded DB in the original list, prefer DA.  Otherwise prefer
]   DB.
]
]   Rules 9 and 10 may be superseded if the implementation has other
]   means of sorting destination addresses.  For example, if the
]   implementation somehow knows which destination addresses will result
]   in the "best" communications performance.
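
To make Rules 9 and 10 concrete, here is a minimal Python sketch, not
glibc's actual code. It assumes IPv4 only, compares the raw 32-bit
addresses, and uses a single source address for every destination (the
RFC picks a source per destination). The names common_prefix_len and
rule9_sort are made up for illustration.

```python
import ipaddress

def common_prefix_len(a, b):
    """Number of leading bits shared by two IPv4 addresses (0..32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first differing bit.
    return 32 - diff.bit_length()

def rule9_sort(source, destinations):
    """Order destinations by longest matching prefix with the source.

    sorted() is stable, so destinations with equal prefix lengths keep
    their original relative order, which is what Rule 10 asks for.
    """
    return sorted(destinations, key=lambda d: -common_prefix_len(d, source))

# A NATed host at 192.168.1.10 looking at three public addresses
# (documentation ranges). The prefix match against the private source
# address reorders them even though it says nothing about closeness.
answers = ["203.0.113.1", "198.51.100.1", "192.0.2.1"]
print(rule9_sort("192.168.1.10", answers))
```

With these inputs the shared prefixes are 4, 5 and 8 bits respectively,
so the list comes back reversed.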

"The admin says that rule 9 isn't appropriate" seems to fit "somehow
knows which destination address will result in the "best" communications
performance", so afaict, the description in the new gai.conf,

# sortv4  <yes|no>
#    If set to no, getaddrinfo(3) will ignore IPv4 adresses in rule 9.  See
#    section 6 in RFC 3484.  The default is yes.  Setting this option to 
#    no breaks conformance to RFC 3484.

is incorrect, in that the implementation is still in conformance
with the RFC.

In addition, I think there are two different aspects here: the first is
"should getaddrinfo() return results in random order to aid in load
distribution?" and the second is "is prefix matching a reasonable way
to determine a good host to use?"

AFAICS, the answer to the first question is simply "no, it shouldn't" --
randomised load balancing like that needs to be done at the application
level, or by giving different sets of IPs in response to DNS queries by
different hosts, such as using BGP or similar. As far as pool.ntp.org
is concerned, that looks like the end of the story: ntp can't
rely on getaddrinfo to give a suitably random answer.

OTOH, getaddrinfo is meant to give a "close" answer, and doing prefix
matching on NATed addresses isn't the Right Thing. For IPv6, that's fine
because it's handled by earlier scoping rules. For NATed IPv4 though the
prefix we should be using is whatever the host is going to be NATed *to*.
And that would imply that the Right Thing would be to have an option
more like:

	pretend-that 10/8 is-really 1.2.3.4/32

That doesn't seem likely to work though because it requires extra
manual configuration, which won't happen.

Giving up on actually getting getaddrinfo to give "close" answers for
NATed boxes leaves the option of trying to avoid getaddrinfo going out
of its way to give "far" answers instead, which would mean turning off
prefix-matching for NATed boxes; which could be done by ignoring rule
9 by default for private IPv4 addresses.
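
A rough Python sketch of that "ignore rule 9 for private IPv4
addresses" idea follows. It is only an illustration under stated
assumptions: IPv4 only, one source address for all destinations, and
"private" meaning the three RFC 1918 ranges. The function names are
hypothetical and nothing here is taken from glibc.

```python
import ipaddress

# The RFC 1918 private ranges.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(addr):
    ip = ipaddress.IPv4Address(addr)
    return any(ip in net for net in RFC1918)

def common_prefix_len(a, b):
    """Number of leading bits shared by two IPv4 addresses (0..32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

def sort_destinations(source, destinations):
    """Apply rule 9 unless the source address is private."""
    if is_rfc1918(source):
        # Presumably NATed: the prefix match carries no information,
        # so leave the order as received (Rule 10).
        return list(destinations)
    return sorted(destinations, key=lambda d: -common_prefix_len(d, source))

answers = ["203.0.113.1", "192.0.2.1"]
print(sort_destinations("192.168.1.10", answers))  # order left alone
print(sort_destinations("192.0.2.77", answers))    # rule 9 applies
```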

Actually, it might also be reasonable to ignore rule 9 if

	scope(DA) > scope(source(DA)) and scope(DB) > scope(source(DB))

which seems reasonably equivalent to "DA and DB are only reachable through
a NAT" for both IPv4 and IPv6. The corner case is if the destination
is in a DMZ and can access both the Internet and local boxes directly,
but I don't think you can get the right answer for that atm anyway.

Doing it by changing Rule 9 to:

   Rule 9:  Use longest matching prefix.
   When DA and DB belong to the same address family (both are IPv6 or
   both are IPv4): If xCommonPrefixLen(DA, Source(DA)) >
   xCommonPrefixLen(DB, Source(DB)), then prefer DA.  Similarly, if
   xCommonPrefixLen(DA, Source(DA)) < xCommonPrefixLen(DB, Source(DB)),
   then prefer DB.

   If scope(X) > scope(Y) then
	xCommonPrefixLen(X,Y) = 0
   Else:
	xCommonPrefixLen(X,Y) = CommonPrefixLen(X,Y)

would give reasonable behaviour, I think (preferring addresses that can
be reached without NAT first, then leaving addresses that require NAT
in the order received).
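
The modified rule can be sketched in Python as below. Again this is an
illustration, not an implementation: it handles IPv4 only, uses a
single source address, and assigns scopes the way I read section 3.2
of the RFC (loopback and 169.254/16 link-local, RFC 1918 ranges
site-local, everything else global). The helper names are invented.

```python
import ipaddress

LINK_LOCAL, SITE_LOCAL, GLOBAL = 0x2, 0x5, 0xE

_LINK = [ipaddress.ip_network(n) for n in ("127.0.0.0/8", "169.254.0.0/16")]
_SITE = [ipaddress.ip_network(n)
         for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def scope(addr):
    """Assumed IPv4 scope assignment, following RFC 3484 section 3.2."""
    ip = ipaddress.IPv4Address(addr)
    if any(ip in net for net in _LINK):
        return LINK_LOCAL
    if any(ip in net for net in _SITE):
        return SITE_LOCAL
    return GLOBAL

def common_prefix_len(a, b):
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

def x_common_prefix_len(dest, source):
    # A destination with wider scope than the source can only be
    # reached through NAT, so the prefix comparison is meaningless.
    if scope(dest) > scope(source):
        return 0
    return common_prefix_len(dest, source)

def modified_rule9_sort(source, destinations):
    # Stable sort: ties (including all NATed destinations) keep the
    # order in which they were received.
    return sorted(destinations, key=lambda d: -x_common_prefix_len(d, source))

answers = ["198.51.100.1", "192.0.2.1", "192.168.7.7"]
print(modified_rule9_sort("192.168.1.10", answers))
```

Here the directly reachable 192.168.7.7 moves to the front, and the two
public addresses stay in the order the resolver returned them, whereas
plain rule 9 would have put 192.0.2.1 ahead of 198.51.100.1.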

In essence, the problem is that comparing prefixes of real addresses
against addresses that will be NATed is not adding information, and is
possibly losing information -- eg, if your site DNS already orders A
addresses by prefix matching on your actual IP range.

> I already suggested that maybe rule 9 should be limited to the common
> prefix length of the netmask you're using.  Another option is that you
> extend rule 2 to have the same behaviour with ipv4, and that 10/8,
> 172.16/12 and 192.168/16 should be considered organization-local.

Those are specified as having site-local scope in section 3.2 of the
RFC, but Rule 2 only comes into play if one of the IPs returned by the
nameserver is also site-local anyway, which isn't particularly useful.

Cheers,
aj
