On 20/10/2013 5:55 PM, Raphael Geissert wrote:
> Stephen Gran wrote:
>> That's mostly because we're not actually 'using' them now - we're just allowing them to cache. Most CDNs have a decache mechanism of some sort or other that we could use on mirror pulses, or we could tune the cache headers to actually make it possible for CDNs to do the right thing, etc.
>
> I'm aware of those methods to expire objects and I can tell you that they are already in use for cloudfront.d.n. Even with them I still need to re-enable cloudfront.d.n from http.debian.net as from time to time it fails a consistency check and gets banned. Looking at report.txt right now it seems that some got banned again.
It's been some time since I last delved into these deeply, but I'd love to continue to tune down the cache expiry headers on cloudfront.debian.net as needed; for those not aware, I covered this in detail in my presentation at DebConf. Here's the summary of the 'introduced' Cache-Control max-age headers that are currently in place:
Default: 24 hours (CloudFront default for objects that have no cache headers)
/debian/dists/*: 15 minutes (default for all files, overridden by subsequent rules)
/debian/dists/(unstable|sid)/.*: 5 minutes
/debian/dists/.*\.diff/[\d-]+\.gz: 2 hours - these are datestamped filenames and don't change once uploaded
/debian/dists/.*\.diff/(Index)?: 10 seconds
/debian/dists/.*/(Contents-.*\.(bz|gz)|(In)?Release(\.gpg)?)?: 20 seconds
/debian/dists/(unstable|sid)/(Contents-.*\.(bz|gz)|(In)?Release(\.gpg)?)?: 10 seconds
/debian/dists/.*/i18n/(Index|Translation-.*)?: 10 seconds
/debian/dists/.*/(binary-.*|source)/(Packages(\..*)?|Sources(\..*)?|Release)?: 10 seconds
/debian/project/.*: 10 seconds
These rules are all running on a tiny little Apache instance, because the upstream mirrors do not send any Cache-Control headers. I am hoping that if we are happy refining these cache times, we can migrate these rules to an upstream HTTP server (this is currently using ftp.debian.org) and do away with an 'interstitial' server that is squirting these headers into the response.
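To make the precedence concrete, here is a minimal Python sketch of how the rules above could be evaluated. It is not the Apache configuration itself. It assumes that each pattern has to match the whole URL path, that the glob on the first /debian/dists/ line behaves like the regex /debian/dists/.*, and that when several rules match, the one listed last wins (one reading of "overridden by subsequent rules"). The name max_age_for and the sample paths are made up for illustration.

```python
import re

# CloudFront's default when the origin sends no cache headers (from the list above).
DEFAULT_MAX_AGE = 24 * 60 * 60

# (pattern, max-age in seconds), in the order given in the list above.
RULES = [
    # Written as a glob in the list; the regex form here is an assumption.
    (r"/debian/dists/.*", 15 * 60),
    (r"/debian/dists/(unstable|sid)/.*", 5 * 60),
    (r"/debian/dists/.*\.diff/[\d-]+\.gz", 2 * 60 * 60),
    (r"/debian/dists/.*\.diff/(Index)?", 10),
    (r"/debian/dists/.*/(Contents-.*\.(bz|gz)|(In)?Release(\.gpg)?)?", 20),
    (r"/debian/dists/(unstable|sid)/(Contents-.*\.(bz|gz)|(In)?Release(\.gpg)?)?", 10),
    (r"/debian/dists/.*/i18n/(Index|Translation-.*)?", 10),
    (r"/debian/dists/.*/(binary-.*|source)/(Packages(\..*)?|Sources(\..*)?|Release)?", 10),
    (r"/debian/project/.*", 10),
]
COMPILED = [(re.compile(pattern), seconds) for pattern, seconds in RULES]


def max_age_for(path):
    """Return the max-age (seconds) for a URL path.

    Assumption: patterns are anchored to the full path and the last
    matching rule wins; paths matching no rule get the default.
    """
    max_age = DEFAULT_MAX_AGE
    for pattern, seconds in COMPILED:
        if pattern.fullmatch(path):
            max_age = seconds  # later rules override earlier ones
    return max_age


if __name__ == "__main__":
    # Illustrative paths only.
    for sample in (
        "/debian/dists/stable/Release",
        "/debian/dists/sid/InRelease",
        "/debian/dists/stable/main/binary-amd64/Packages.gz",
        "/debian/pool/main/h/hello/hello_2.8-4_amd64.deb",
    ):
        print(sample, max_age_for(sample))
```

Under these assumptions a Packages.gz under dists/ comes out at 10 seconds, while a .deb under pool/ matches no rule and falls back to the 24 hour default. Some of the patterns overlap (with full-path matching, the Packages rule is broad enough to cover paths under Packages.diff/ as well), so the result for a given path depends on evaluation order, which is worth checking against the real server before moving the rules upstream.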
0 seconds would work but would probably overload things - that's no caching at all, so every edge hit goes back to the origin. And 1 second means that for objects in heavy use, each of the 43 edge locations refetches once a second, so we'd have 43 hits/sec at the origin. So something a little longer than that... hence the 10/20 second lines above.
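The arithmetic behind that can be sketched in a few lines of Python. This is a rough upper bound only: it assumes every edge location keeps its own independent cache and refetches a hot object exactly once per max-age period. The function name worst_case_origin_rate is made up for illustration.

```python
def worst_case_origin_rate(edge_locations, max_age_seconds):
    """Upper bound on origin requests per second for one heavily used object.

    Assumption: each edge location refetches the object once per
    max-age period, independently of the other locations.
    """
    if max_age_seconds <= 0:
        # max-age 0 means no caching: origin load then follows client
        # traffic, which this simple model cannot bound.
        raise ValueError("max_age_seconds must be positive")
    return edge_locations / max_age_seconds


if __name__ == "__main__":
    # 43 is the edge location count quoted in this mail.
    for ttl in (1, 10, 20):
        print(ttl, worst_case_origin_rate(43, ttl))
```

With 43 locations this gives 43 requests/sec at a 1 second max-age, dropping to about 4.3 and 2.15 requests/sec at 10 and 20 seconds.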
CloudFront currently has 43 edge locations worldwide, and is continuing to expand. That's far fewer than the 400 mirrors in the Debian list. I would not recommend abandoning mirrors.
(It's just mod_proxy and mod_headers in Apache.)
(PS: I am travelling this week and possibly slower at answering email - ping jameseb_AT_amazon.com if you need me urgently)
(PPS: Happy to give any DD/DM read access into the AWS account - just send me a signed email - and read/write (i.e., for DSA) if you want.)
Mobile: +61 422 166 708, Email: james_AT_rcpt.to