
getaddrinfo() behaviour



So here's my understanding of where we're at.

First, fact finding. Everything here should be something everyone can
agree on.

getaddrinfo() is a new interface that replaces gethostbyname(). It
has different semantics that are intended to make it superior
to gethostbyname() and other functions, both in supporting IPv6 and
potentially in other ways (such as resolving "foo.bar.com:http"
differently from "foo.bar.com:https").
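
For anyone who hasn't used the interface, here is a minimal sketch via
Python's socket.getaddrinfo() wrapper. The resolve() helper is my own
naming, and the AI_NUMERICHOST flag is only there so the example runs
without touching the DNS. The point is that host and service are passed
together and the result is an ordered list of candidates.

```python
import socket

def resolve(host, service):
    """Return (family, sockaddr) pairs in the order getaddrinfo() gives them."""
    infos = socket.getaddrinfo(
        host,
        service,
        family=socket.AF_UNSPEC,      # accept IPv4 or IPv6 results
        type=socket.SOCK_STREAM,
        flags=socket.AI_NUMERICHOST,  # assumption: numeric host, no DNS query
    )
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]

# The service is part of the lookup, unlike gethostbyname().
for port in (80, 443):
    for family, sockaddr in resolve("127.0.0.1", port):
        print(family.name, sockaddr)
```

A typical caller tries the returned entries in order until a connect
succeeds, which is why the ordering matters so much here.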

The most authoritative document for how getaddrinfo() will order
its results is RFC3484, which is a Proposed Standard on the Internet
Standards Track and appears to be implemented by the major vendors,
including glibc and Windows.

The sorting behaviour of getaddrinfo() cannot be relied on in today's
Internet, and it behaves differently in different implementations --
particularly due to RFC3484 having been proposed only after getaddrinfo()
had already been in wide use. Further, RFC3484 specifically indicates
that the sorting behaviour may be overridden if a better order can be
determined locally (see the last paragraph of section 6). Beyond that,
determining optimal address selection appears to be an open area of
research, and modifications to RFC3484 are still being discussed and
proposed both within and outside of the RFC context [0].

Note that RFC4294 ("IPv6 Node Requirements") indicates RFC3484 "MUST be
implemented", at least in the context of dealing with multiple addresses.

The sorting behaviour specified in RFC3484 has not been in common
use within the IPv4-based Internet. Instead, by far the most common
behaviour has been to use the ordering presented by the DNS, usually
simply selecting the first returned result. This behaviour has allowed
client address selection to be influenced by the DNS system and thus the
provider of the service being addressed, as described in RFC 1794. This
has most commonly been implemented by having the DNS servers provide a
cyclic, round-robin selection of addresses, such that each address is
returned as the first result equally often.
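
As a rough illustration of the round-robin behaviour, the sketch below
simulates a server that rotates its answer by one position per query,
and a client that always takes the first address. The function names
and the addresses (documentation ranges) are made up for the example.

```python
from collections import Counter

def rotations(addresses, queries):
    """Yield one cyclically rotated answer per simulated DNS query."""
    n = len(addresses)
    for q in range(queries):
        k = q % n
        yield addresses[k:] + addresses[:k]

def first_choice_counts(addresses, queries):
    """Count how often each address is the first one in the answer."""
    return Counter(answer[0] for answer in rotations(addresses, queries))

# Hypothetical servers, using documentation address ranges.
servers = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]
print(first_choice_counts(servers, 300))
```

With the number of queries a multiple of the number of servers, each
address comes first equally often.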

This is not the only method for load balancing, though it is one of the
simplest and most easily deployed on today's Internet. Others include
giving entirely different results to different people doing DNS queries
such as described in the supersparrow architecture [1], or doing dynamic
load balancing of http queries via the 302 redirect response.

The primary expectations for load balancing are generally one or more of:
	- that load be evenly distributed across hosts
	- that load be biased to the closest/cheapest host for the client
	- that load distribution be controllable by the service provider

The prefix-matching procedure described in RFC3484 does not meet those
expectations in a number of cases.

First, responsibility for destination selection is assumed entirely by the
client, so that the only choice the service provider has is to list or
not list a host. As such the service provider is faced with a choice of
providing only the best servers to the client, and not giving the client
the possibility to failover to other servers that might be available;
or having the client select a server entirely on its own judgement.

Second, when NAT is in use, a relatively small range of prefixes (10/8,
192.168/16, 172.16/12 and potentially 169.254/16) will have a high
number of users, thus leading to a bias towards servers matching those
prefixes. Further, those prefixes by design do not bear any relationship
to their actual position in the network, removing the possibility of
the bias being towards close/cheap servers.
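
To make the NAT case concrete, here is a small sketch of longest
matching prefix on its own, ignoring every other rule in RFC3484. The
client and server addresses are hypothetical (the servers are taken
from documentation ranges), and common_prefix_len() is my own helper,
not anything from a real resolver.

```python
import ipaddress

def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses share (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

# Hypothetical client behind NAT, and three hypothetical servers.
client = "192.168.1.10"
servers = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]

for server in servers:
    print(server, common_prefix_len(client, server))

# Every client numbered out of 192.168/16 ends up with the same preference.
preferred = max(servers, key=lambda s: common_prefix_len(client, s))
print("preferred:", preferred)
```

The match lengths here are 8, 5 and 4 bits, so the first server wins
for every such client, regardless of where the client actually is.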

Third, when round-robin DNS is in use, the ordering procedure described
by RFC3484 will not ensure that all servers with the best matching
prefix are given equal time as the first address returned, but instead
may be biased towards one address depending on the exact ordering of
the addresses presented by the server [2].
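
One way such a bias can arise is sketched below, assuming the server
rotates its answer cyclically and the client applies a stable sort on
prefix match length alone. Both are simplifying assumptions on my part,
and the addresses and function names are invented for the example.

```python
import ipaddress
from collections import Counter

def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses share (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

def prefix_order(source, answer):
    """Longest match first; sorted() is stable, so ties keep DNS order."""
    return sorted(answer, key=lambda d: -common_prefix_len(source, d))

def first_choice_counts(source, servers):
    """Tally the client's first choice over one full round-robin cycle."""
    counts = Counter()
    for k in range(len(servers)):
        rotated = servers[k:] + servers[:k]
        counts[prefix_order(source, rotated)[0]] += 1
    return counts

# The first two servers match the source equally well (8 bits each),
# the third matches worse (5 bits).
source = "192.168.1.10"
servers = ["192.0.2.1", "192.0.2.2", "198.51.100.1"]
print(first_choice_counts(source, servers))
```

Over the three rotations the first server is chosen twice and the
second once, even though both match equally well: whenever the
worse-matching address leads the answer, the sort hands its turn to
whichever good address follows it.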

Each of these objections applies to the mechanism described in RFC3484
whether applied to IPv4 or IPv6 addresses.

In addition, with particular regard to IPv4 addresses, in the present
day Internet:

	- round-robin DNS is normal
	- NAT is extremely common
	- the average prefix length in BGP tables is >22 [3], and
	  matches on shorter prefixes do not provide a strong correlation
	  with locality

...

Is the above all reasonable and uncontroversial?

If so, conclusions that could potentially be drawn:

    (a) Using prefix matching to select IPv4 addresses isn't useful
    (b) Using prefix matching to select IPv4 addresses is harmful
    (c) Using prefix matching to select IPv4 addresses is harmful enough to
        be an RC issue for Debian
    (d) Prefix matching IPv4 addresses provided the match is at least 22 bits
        (or similar) might be reasonable
    (e) Choosing the best address isn't a job for the client, and is better
        left to the service provider and DNS system
    (f) Given the existence of round-robin DNS, if prefix matching is used
        to select an address, addresses with equal prefix matches should be
        reordered randomly.
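
For what it's worth, (d) and (f) together could look something like the
following sketch. This is not what any existing resolver does; the
22-bit threshold, the function names and the addresses are all
assumptions for illustration. Matches shorter than the threshold are
treated as no match at all, and ties are broken randomly by shuffling
before a stable sort.

```python
import ipaddress
import random

MIN_MATCH_BITS = 22  # assumed threshold, per option (d)

def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses share (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

def order_destinations(source, destinations, rng=random):
    """Prefer long prefix matches only; order equal matches randomly."""
    def score(dest):
        bits = common_prefix_len(source, dest)
        return bits if bits >= MIN_MATCH_BITS else 0

    shuffled = list(destinations)
    rng.shuffle(shuffled)  # option (f): randomise before the stable sort
    return sorted(shuffled, key=score, reverse=True)

# Hypothetical addresses: only the first destination shares 22+ bits.
dests = ["192.0.2.200", "198.51.100.1", "203.0.113.1"]
print(order_destinations("192.0.2.10", dests))
```

A client with no sufficiently long match (say, one behind NAT) would
then see all addresses in random order, which gets back the even
spread that round-robin DNS was providing in the first place.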

I find it hard to see any way around (a), and I'm reasonably convinced
of (b). I'm not at all convinced of (c), which I guess is to say my
hypothetical vote would be:

	[ 1 ] (a) + (b)
	[ 2 ] (a) + (b) + (c)
	[ 3 ] (a)
	[ 4 ] Further discussion
	[ 5 ] !(a)

and I imagine I'd be outvoted in favour of (a)+(b)+(c). If so, declaring
this to be an RC issue justifies both an update to etch and (if necessary,
which I don't expect) an NMU for sid/lenny, which seems all that's needed.

Conversely, if we don't consider this an RC issue, changing it for etch
doesn't seem appropriate, and at that point I don't see why we'd change
it for sid either (at least absent an update via the IETF standards track
process). I'd be interested to see explanations of why this should be
considered RC.

I'm not sure if any or all of (d)-(f) would be sufficient recommendations
to close the issue for IPv6 as well, or if there's something else that
would make sense.

Cheers,
aj

[0] http://psg.com/lists/v6ops/v6ops.2007/msg00486.html
    http://www.ops.ietf.org/lists/shim6/msg01040.html
    http://rfc.net/rfc3879.html

[1] http://www.supersparrow.org/ss_paper/html/node13.html

[2] http://lists.debian.org/debian-ctte/2007/09/msg00025.html

[3] http://bgp.potaroo.net/as2.0/bgp-active.html
