
Re: help with BIND SRV

Juha-Matti Tapio wrote:
> On Thu, Oct 07, 2004 at 08:23:31PM -0600, Nate Duehr wrote:
>
>> Most people setting up round-robin DNS type setups for redundancy, with scripts to change things for failover, get bit by these things:
>>
>> - They don't understand that there might be multiple DNS servers between their top-level and the machine they're servicing (3X and 4X TTL)
>
> RFC 1035 specifies in section 6.1.3 that requests served from a cache should return a TTL which has been decremented by the number of seconds spent in the cache, i.e. the TTL "counts down" in the cache.
>
> Therefore I consider any caching nameserver that does not do this broken. Are there a significant number of such servers out there?
>
> I agree with most of the other points, though.

Ahh... it's a trap.  Think about this.

1 - Regular DNS server hosting "something.com"
2 - ISP's caching nameserver
3 - Your company's nameserver
4 - A caching nameserver on your desktop machine

Now add in the fact that, say, your company AND your ISP both intercept all port 53 traffic and proxy all DNS requests through their own servers. Not super-common -- but there ARE organizations and ISPs out there that do this for whatever convoluted security or other reasons.

Depending on how the proxying is set up, each server can faithfully implement the RFC you mention, and a change on server 1 to a record that's cached by the nameserver on your local desktop can still take up to 3X TTL to show up at your desktop! (It also means that if machine 1, or one of the other NSes for that zone, doesn't answer at all, it can take 3X TTL to clear the negative cache as well.)

This, of course, is NOT the norm -- but it's out there. (Think service providers and highly secured networks. I believe AOL's DNS implementation had this multiply-cascaded DNS proxying problem for many years, though they cleaned it up in the mid-1990s.)

People forget this type of setup exists when they set their TTLs for "quick" changes.
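To make the arithmetic concrete, here's a minimal sketch (hypothetical numbers, not from any real deployment) of the worst case when each intercepting layer caches a record on its own clock -- i.e. it re-caches with the record's full original TTL rather than the decremented remainder, which is exactly the kind of proxy setup described above:

```python
# Worst-case propagation delay through cascaded, independently-caching
# DNS layers. If each layer restarts the TTL countdown when it refetches
# (instead of honoring the decremented TTL from the layer above it),
# delays stack up to roughly one full TTL per caching layer.

def worst_case_propagation(ttl_seconds, caching_layers):
    """Upper bound, in seconds, on how long a record change can take
    to reach the client through `caching_layers` independent caches."""
    return ttl_seconds * caching_layers

ttl = 3600      # hypothetical 1-hour TTL on the record
layers = 3      # e.g. ISP cache -> company cache -> desktop cache

print(worst_case_propagation(ttl, layers))   # 10800 seconds -- 3X TTL
print(worst_case_propagation(ttl, 1))        # 3600 -- the "expected" case
```

With a single well-behaved cache in the path, the delay is bounded by one TTL; every additional independently-refetching layer can add another full TTL on top.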

Of course, more common is:

1 - Machine hosting "something.com"
2 - Caching nameserver on your desktop that's allowed to make direct connections out port 53 to the world

In that scenario, TTLs work as expected.

Another common setup:

1 - Machine hosting "something.com"
2 - Company nameserver
3 - Your desktop machine NOT running a caching nameserver, just a resolver.

Again, normal behaviour.

Once, during some re-IP'ing work while I was at a colocation/hosting company, we had an agreeable customer who wanted to make IP addressing changes and who was willing to work with me, the "DNS guy", through his transition. Because he was very paranoid about the move, we actually set him up with dual IPs on every box he had for the transition period, and then made the DNS changes.

We found from his server logs that broken DNS servers and resolvers out there were still hitting the old IP addresses 5 days later. In this particular case, his TTL was set to 1 day.

Hopefully that gives a better indication of how many layers, and how many truly broken DNS resolvers and server setups, are out there. With his TTLs at a very reasonable 1 day, it still took a business week to see all his traffic move over to the new IPs. A week after that we reclaimed the extra IPs, shut down the subinterfaces on his systems, and killed the routes -- at that point there was virtually no traffic hitting the old range, but the number still wasn't zero.

Of course, as long as there is no negative cache entry for the zone anywhere, the fool-proof method of changing things is to publish a completely new A record name that has never been used in your zone. That forces a lookup all the way back to your servers, and thus the freshest IP information.
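The reason the trick works can be sketched as a toy cache model (the names and addresses below are hypothetical, just for illustration): a resolver can only serve stale data for names it has already seen, so a brand-new label is a guaranteed cache miss at every layer and the query has to walk back to the authoritative servers.

```python
# Toy model of why a never-before-used name bypasses stale caches.
# A cached name may be answered with an old (stale) address until its
# TTL runs out; an unknown name forces an authoritative lookup.

def resolve(name, cache, authoritative):
    """Return (address, served_from_cache). Hypothetical data only."""
    if name in cache:                     # stale entry still within TTL
        return cache[name], True
    addr = authoritative[name]            # forced authoritative lookup
    cache[name] = addr                    # cache the fresh answer
    return addr, False

cache = {"www.something.com": "192.0.2.10"}           # old address, stale
authoritative = {"www.something.com": "198.51.100.20",
                 "new.something.com": "198.51.100.20"}  # the new address

print(resolve("www.something.com", cache, authoritative))
# -> ('192.0.2.10', True)   stale answer from cache
print(resolve("new.something.com", cache, authoritative))
# -> ('198.51.100.20', False)  fresh, straight from the authority
```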

One place I saw this used very effectively during a site move: the company immediately reissued the zone data with its "www.something.com" record changed and a new "new.something.com" A record, both pointing at the new address.

Then they set up a server at the old location that did nothing but have a redirect page from "www" to "new". They then took the big server across town and plugged it in at the IP "new" pointed to.

Client machines that had correct information quickly just went to "www" at the new address. Lagging machines and broken DNS architectures hit "www" at the old address and were redirected to "new".

Of course, this customer had to be careful not to use name-based virtual hosting on their webserver -- or to make sure the big machine at the new site would answer correctly for both "www" and "new". I forget which tactic they used.
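For the stub box left behind at the old address, the redirect could be as simple as this sketch (assuming an Apache webserver and the hypothetical names from above -- I don't know what they actually ran):

```
# Hypothetical Apache config for the throwaway server at the OLD address:
# every request for www.something.com gets bounced to new.something.com.
<VirtualHost *:80>
    ServerName www.something.com
    Redirect permanent / http://new.something.com/
</VirtualHost>
```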

If you really think about the queries going on in DNS, there are plenty of ways to move things around safely without downtime. I was always floored when some dot.bomb company would decide to move IPs, leave nothing on the old ones, and just "hope" -- even with the 7-day-or-higher TTL they'd set up long ago -- that customers would find them.

Of course, since we also sold them their bandwidth, it was usually painfully obvious that their traffic load had dropped dramatically during these hastily planned moves. The sad part is that this is usually when they would FINALLY call asking for help, and by then the only option available would be something like the new-A-record trick described above.

Customers painted themselves into corners with DNS and site moves all the time in the colocation/hosting biz.

Hopefully that helps visualize it...

