Re: help with BIND SRV
Juha-Matti Tapio wrote:
On Thu, Oct 07, 2004 at 08:23:31PM -0600, Nate Duehr wrote:
Most people setting up round-robin DNS type setups for redundancy with
scripts to change things for failover get bit by these things:
[...]
- They don't understand that there might be multiple DNS servers between
their top-level and the machine they're servicing (3X and 4X TTL)
RFC 1035 specifies in section 6.1.3 that requests served from a cache
should return a TTL which has been decremented by the number of seconds
spent in the cache, i.e. the TTL "counts down" in the cache.
Therefore I consider any caching nameservers that do not do this broken.
Are there a significant number of such servers out there?
Though I agree on most of the other points.
Ahh... it's a trap. Think about this.
1 - Regular DNS server hosting "something.com"
2 - ISP's caching nameserver
3 - Your company's nameserver
4 - A caching nameserver on your desktop machine
Now add to that the possibility that your company AND your ISP both
intercept all port 53 traffic and proxy all DNS requests through their
own servers. Not super-common -- but there ARE organizations and ISPs
out there that do this for whatever convoluted security or other reasons.
Depending on how the proxying is set up, each server can implement the
RFC you mention to the letter, and yet a change on server 1 to a record
that's cached on your local desktop machine's nameserver can take up to
3X TTL to show up at your desktop! (It also means that if machine 1 or
one of the other NSes for that zone doesn't answer at all, it can take
3X TTL to clear the negative cache as well.)
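
To put rough numbers on it, here is a toy Python sketch of that worst
case -- purely illustrative, and assuming each caching/proxying layer
hands the answer on with its TTL effectively reset and only re-fetches
after its own copy has aged out:

# Toy model of the worst case described above: every caching/proxying
# layer grabbed the old answer at the last possible moment from the
# layer above it, and holds it for a full TTL of its own.

TTL = 86400  # zone TTL in seconds (1 day)

layers = ["ISP caching proxy", "company nameserver", "desktop cache"]

stale_until = 0  # seconds after the change, as seen at the authoritative server
for layer in layers:
    stale_until += TTL
    print("%-20s can serve the old record up to %.1f days after the change"
          % (layer, stale_until / 86400.0))

With a 1-day TTL and the three caching layers above, the desktop can
keep handing out the old record for up to 3 days.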
This, of course, is NOT the norm -- but it's out there. (Think service
providers and highly secured networks. I believe AOL's DNS
implementation had this multiply-cascaded DNS server proxying problem
for many years, but they cleaned it up in the mid-1990s.)
People forget that this type of setup is out there when they set their
TTLs for "quick" changes.
Of course, more common is:
1 - Machine hosting "something.com"
2 - Caching nameserver on your desktop that's allowed to make direct
connections out port 53 to the world
In that scenario, TTLs work "as expected".
Another common setup:
1 - Machine hosting "something.com"
2 - Company nameserver
3 - Your desktop machine NOT running a caching nameserver, just a resolver.
Again, normal behaviour.
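
If you want to see which of these situations your own desktop is
actually in, one quick test is to ask your resolver for the same name
twice and watch whether the returned TTL counts down (the RFC 1035
behaviour Juha-Matti described) or snaps back up (a hint that something
in the path is re-caching or rewriting TTLs). A rough sketch using the
dnspython library -- the library choice and the name are just
placeholders, any stub resolver tool would do:

import time
import dns.resolver  # third-party "dnspython" package (2.x API)

def cached_ttl(name):
    # Whatever cache answers us reports how long it will keep the record.
    answer = dns.resolver.resolve(name, "A")
    return answer.rrset.ttl

first = cached_ttl("www.example.com")   # placeholder name
time.sleep(10)
second = cached_ttl("www.example.com")

if second < first:
    print("TTL is counting down in the cache: %d -> %d" % (first, second))
else:
    print("TTL did not decrease -- a proxy or forwarder may be resetting it")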
Once, during some re-IP'ing work when I was at a co-location/hosting
company, we had an agreeable customer who wanted to make some IP
addressing changes and who was willing to work with me as the "DNS guy"
through his transition. Because he was very paranoid about the move, we
actually set him up with dual IPs on every box he had for the
transition period, and then made the DNS changes.
We found from looking at his server logs that there were still broken
DNS servers and resolvers out there hitting the old IP addresses for 5
days. In this particular case, his TTL was set to 1 day.
Hopefully that gives a better indication of how many layers, and how
many truly broken DNS resolvers and server setups, are out there. With
his TTL at a very reasonable 1 day, it still took a business week to
see all of his traffic move over to the new IPs. Another week after
that we reclaimed the extra IPs, shut down the subinterfaces on his
systems, and killed the routes -- at that point there was virtually no
traffic hitting the old IP range, but the number was still not at zero.
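
If you're ever in the same position, a crude but effective way to
decide when it's safe to reclaim the old addresses is just to count how
many requests per day are still arriving at whatever is listening on
the old IP. Something along these lines, assuming an Apache-style
access log at a made-up path:

from collections import Counter
from datetime import datetime

LOG = "/var/log/apache/old-ip-access.log"   # made-up path

per_day = Counter()
with open(LOG) as f:
    for line in f:
        # Common/combined log format: ... [07/Oct/2004:20:23:31 -0600] ...
        try:
            stamp = line.split("[", 1)[1].split(":", 1)[0]
            per_day[datetime.strptime(stamp, "%d/%b/%Y").date()] += 1
        except (IndexError, ValueError):
            continue

for day in sorted(per_day):
    print(day, per_day[day])

When that daily count finally flattens out near zero, the stragglers
are gone.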
Of course, if there is no negative cache entry for the zone anywhere,
the foolproof method of changing things is to publish a completely new
A record name that has never been used before in your zone. That forces
a lookup all the way back to your servers, and thus -- the freshest IP
information.
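
A quick way to convince yourself (or a customer) that the never-used
name really does bypass every cache is to resolve both names side by
side: the old name may still come back stale with a decremented TTL,
while the brand-new name has to be answered from the authoritative
servers and so shows the full zone TTL and the fresh address. Another
dnspython sketch, with placeholder names:

import dns.resolver  # third-party "dnspython" package (2.x API)

def show(name):
    answer = dns.resolver.resolve(name, "A")
    addresses = [rdata.address for rdata in answer]
    print("%s -> %s (TTL %d)" % (name, addresses, answer.rrset.ttl))

show("www.example.com")   # may be a stale, cached answer
show("new.example.com")   # never cached anywhere, so answered fresh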
One way I've seen this used very effectively during a site move: the
company immediately reissued the zone data with its "www.something.com"
record changed and a new "new.something.com" A record added, both
pointing at the new address.
Then they set up a server at the old location that did nothing but
serve a redirect page from "www" to "new". They then took the big
server across town and plugged it in at the IP that "new" pointed to.
Client machines that had correct information simply went to "www" at
the new address. Lagging machines and broken DNS architectures hit
"www" at the old address and were redirected to "new".
Of course, this customer had to be careful either to avoid name-based
virtual hosting on their webserver, or to make sure the big machine at
the new site would answer correctly for both "www" and "new". I forget
which tactic they used.
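
For what it's worth, the redirect box left behind at the old address
needs to do almost nothing -- an Apache Redirect directive covers it,
or even a few lines of Python along these lines (a sketch only, with a
placeholder hostname; I obviously don't know what they actually ran):

from http.server import BaseHTTPRequestHandler, HTTPServer

NEW_SITE = "http://new.example.com"   # placeholder for the "new" name

class RedirectHandler(BaseHTTPRequestHandler):
    # Answer every request with a redirect to the same path on the new name.
    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", NEW_SITE + self.path)
        self.end_headers()

    do_HEAD = do_GET

if __name__ == "__main__":
    HTTPServer(("", 80), RedirectHandler).serve_forever()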
If you really think about the queries going on in DNS, there are plenty
of ways to move things around safely without downtime -- I was always
just floored when some dot.bomb company would decide to move IPs, leave
nothing at the old address, and just "hope" -- even with the
7-day-or-higher TTL they'd set up long ago -- that customers would find
them.
Of course, since we also sold them their bandwidth, it was usually
painfully obvious that their traffic load had dropped dramatically
during these hastily planned moves. The sad part is that this is
usually when they would FINALLY call asking for help, and the only
option left to them would be something like the new-A-record trick
described above.
Customers painted themselves into corners with DNS and site moves all
the time in the colocation/hosting biz.
Hopefully that helps visualize it...
Nate