
Re: Call for Votes (getaddrinfo)



On Fri, Dec 07, 2007 at 01:55:28AM -0800, Steve Langasek wrote:
> > I haven't seen any concrete reports we could pass on, or any indication
> > we're likely to come up with a better mechanism, though, which leaves us
> > as doing nothing by default.
> I've previously argued that there are at least two mechanisms that would be
> an improvement:
> - drop rule 9 altogether, passing through the sorting supplied by the DNS
>   server (whether that's round-robin, or sorted by some other server-side
>   rule)

From my search earlier in this thread, I believe rule9 is specifically
relied upon by one of the RFCs talking about multi-homing in IPv6, though
I'm afraid I don't have a reference. I don't think it's reasonable to
just drop it without some alternative way of addressing:

	- stability of results in the face of round-robin, in order to give
	  apps repeatable results

	- more fine-grained (and use-specific) scoping than the
	  global/site/local scoping that's already defined

	- traffic-path optimising address selection

Of course, those could be addressed by:

	- providing apps with an explicit sorting function that will sort
	  addresses according to the above requirements, no matter how
	  they're obtained
	- having apps specify when they want them taken into account
	  when asking for address resolution (cf rfc5014)
	- having libc's resolver take them into account on instruction
	  by the admin
	- having the site DNS server take them into account when deciding
	  how to order multiple A records
	- having the service provider use mechanisms other than DNS
	  round-robin to present the right address to whoever's asking
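
To make the first option concrete, here is a minimal sketch of what an
explicit sorting function along the lines of rule 9 might look like. It is
an illustration only: the names common_prefix_len and sort_by_prefix_match
are made up for this example, and it assumes a single source address for
all candidates, whereas a real implementation would compare each
destination against the source address that would be chosen for it.

```python
import ipaddress


def common_prefix_len(a, b):
    """Return the number of leading bits shared by two addresses.

    Addresses of different families share no prefix for our purposes.
    """
    a = ipaddress.ip_address(a)
    b = ipaddress.ip_address(b)
    if a.version != b.version:
        return 0
    # XOR leaves a 1 at the first differing bit; everything above it matched.
    diff = int(a) ^ int(b)
    return a.max_prefixlen - diff.bit_length()


def sort_by_prefix_match(source, candidates):
    """Order candidates by longest shared prefix with source (hypothetical helper).

    sorted() is stable, so candidates with equal prefix lengths keep the
    order in which they were supplied (e.g. the order the DNS server gave).
    """
    return sorted(candidates, key=lambda dst: -common_prefix_len(source, dst))


if __name__ == "__main__":
    dests = ["198.51.100.7", "192.0.2.200", "203.0.113.5"]
    print(sort_by_prefix_match("192.0.2.10", dests))
```

Because the sort is stable, an app that calls something like this gets the
prefix-match preference where it matters and the server's ordering
everywhere else.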

Replacing rule9 with a statement that implementors MAY re-order
addresses according to site policies but SHOULD NOT do so unless
they can do better than random round-robin selection, and providing an
IPV6_PREFER_DST_PREFIXMATCH setsockopt option for applications that desire
the rule9 behaviour would be pretty straightforward and fairly compatible,
afaics. Providing an IPV6_PREFER_DST_STABLE option that ensures a stable
result per-host while still being globally random would be possible too,
I think.
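
As a rough sketch of the "stable per-host, globally random" idea, one
approach would be to order addresses by a hash of a per-host identifier
combined with each address. This is an assumption about how such an option
could work, not an existing API: IPV6_PREFER_DST_STABLE is only a proposed
name, and stable_order and host_id are hypothetical names for this example.

```python
import hashlib


def stable_order(host_id, addresses):
    """Order addresses deterministically for one host (hypothetical helper).

    The same host_id always yields the same order, regardless of the order
    in which the addresses arrive, while different hosts end up with
    different, effectively random orders.
    """
    def key(addr):
        return hashlib.sha256(f"{host_id}|{addr}".encode()).digest()

    return sorted(addresses, key=key)


if __name__ == "__main__":
    addrs = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    print(stable_order("host-a", addrs))
    print(stable_order("host-b", addrs))
```

The host identifier could be anything stable on the machine; what matters
is that it differs between hosts so the population as a whole still
spreads across the addresses.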

> - apply rule 9 only in the case that the common prefix is longer than the
>   prefix length of the "natural" unit network for the address family (/32
>   for IPv6, /22 for IPv4)

AFAICS that doesn't actually achieve anything much over just dropping
the rule.

> Do you disagree with the proposition of one of these being preferable to the
> current behavior?

I don't think I'm informed enough on what's made assumptions relying on
rule9 to get rid of it entirely, and afaics no one else here is actually
any better informed. Which is why I keep complaining about how little
evidence I've seen...

> > I could be convinced it's RC, but I've seen precious little *actual*
> > impact -- certainly people are surprised by the change in behaviour,
> > and it does change traffic characteristics, but ... that seems to be it,
> > so far. Where's the actual damage and problems?
> The changed traffic characteristics are certainly damaging, at least
> potentially.  

Come on, there's no such thing as "certainly damaging, at least
potentially". It's either potentially damaging, or certainly damaging.

> It can have financial consequences for anyone that's invested
> in infrastructure with the expectation that round robin will continue to
> work, and find that they have to choose between completely revamping their
> DNS infrastructure to feed clients targeted results, or renegotiating
> hosting/bandwidth contracts to accommodate client selectivity that's outside
> of the server's control.  

Then it's a good thing that behaviour's documented in a standards-track
RFC so they know what to expect before doing the roll-out. And as far
as I can see, the actual impact in practice has turned out not to be
even serious enough to be able to be demonstrated...

> Obviously in the general case Debian doesn't have
> enough market share to be able to fix this on our own, but that doesn't stop
> us from ensuring that Debian behaves as a good citizen.

Huh? For the case of apt-get we certainly have enough market share to
make a difference; yet we haven't even got to the point of having a
documented problem report for it to fix yet.

> In terms of how the behavior of Debian clients makes a difference, Debian
> systems are certainly the primary consumers of the Debian mirror network.  I
> think that losing a mirror sponsor due to the uneven load distribution, ...

Sure, but where's the evidence that this is even resulting in uneven
load distribution? If ike did crash or similar at some point (which,
personally, I can't even verify), where's the evidence that was because
of getaddrinfo behaviour on clients accessing it, and not unrelated
problems on that machine or network?

To give a counter-example: if rule9 behaviour can be relied on, and you're
running host 222.222.222.222 and decide you're getting too much traffic
for one machine to handle, you can add 222.222.222.223, and have your
network traffic be the same (rather than going from 1/nth of the total
traffic to 2/(n+1)th of the total traffic), but have each host only get
50% of that traffic, using round-robin. [0]
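
The arithmetic can be checked with a small sketch. It takes the premise
of the paragraph at face value (under rule9 the pair of adjacent addresses
attracts the same clients the single address did), and picks n = 4 purely
as an example value; round_robin_share is a hypothetical helper.

```python
from fractions import Fraction


def round_robin_share(mine, others):
    """Share of total traffic when every address is equally likely."""
    return Fraction(mine, mine + others)


n = 4  # example: four addresses in the rotation before adding one

# Pure round-robin: one of n addresses, then two of n + 1.
before = round_robin_share(1, n - 1)
after = round_robin_share(2, n - 1)

# Rule9 premise: total stays at 1/n, split evenly across the two hosts.
per_host_rule9 = before / 2

if __name__ == "__main__":
    print("pure round-robin:", before, "->", after)
    print("rule9 premise, per host:", per_host_rule9)
```

With n = 4 the site goes from 1/4 to 2/5 of the total under pure
round-robin, versus staying at 1/4 (1/8 per host) under the rule9 premise.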

> -- and in advance,
> not just once we've found that it's an imminent problem for us and our
> users.

Uh, our apt network is already using rule9 for both stable
and testing/unstable, and has been for half a year (longer for
testing/unstable). If it's not a problem now, why would it possibly be
a problem in the future, imminent or otherwise?

(And what the hey? Are you seriously putting me in the position of
arguing against pre-emptive strikes based on patchy intelligence now?)

Cheers,
aj

[0] Well, in theory. In practice you'll get unbalanced behaviour unless you
    have an equal number of the other IPs in the round-robin on either side
    of your hosts. 

    See http://lists.debian.org/debian-ctte/2007/09/msg00020.html for an
    example where the round-robin pass-through in rule10 doesn't result
    in load-balancing.
