Planning for a mirror using Google Cloud CDN
During the cloud summit we had a chat about the distribution mirrors
used in our images. Currently we have three different approaches in the
large clouds:

Amazon: Uses the CDN provided by Amazon. This CDN is also used as a
backend for deb.debian.org.
Google: Uses httpredir.debian.org.
Microsoft: Maintains a network of mirrors in all of their production
regions (24 as of now).
Because of the problematic state of httpredir and the potentially large
number of systems running the same software, we came to the
understanding that using some sort of mirror within the infrastructure
is a good idea.
Google had a mirror within their cloud, but scrapped it because they
were unable to make it stable enough. I currently maintain the mirror
network within Azure and found maintaining even 40+ mirrors to be
pretty low maintenance. It sometimes produces stray network timeouts,
which it may be possible to fight with retry logic in ftpsync on
connection timeouts. Primed with this collective knowledge, we started
to think about providing mirrors within the Google cloud again.
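Such retry logic could look something like the following sketch: a
generic wrapper that re-runs a command a few times with a delay. The
ftpsync invocation at the bottom is only illustrative (commented out,
archive name is a placeholder), not how ftpsync is actually configured.

```shell
#!/bin/sh
# Generic retry wrapper to paper over stray network timeouts.
retry() {
    # usage: retry TRIES DELAY COMMAND [ARGS...]
    tries=$1; shift
    delay=$1; shift
    n=0
    while ! "$@"; do
        n=$((n + 1))
        if [ "$n" -ge "$tries" ]; then
            echo "giving up on '$*' after $n attempts" >&2
            return 1
        fi
        sleep "$delay"
    done
}

# Illustrative use from cron or a systemd unit (placeholder archive):
# retry 3 30 ftpsync sync:archive:debian
```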
We got the ok from Google to use their Cloud CDN as a public mirror.
There is one technical limitation left in the implementation, which
needs to be fixed first, but I'm confident they will be able to do that.
So I'd like to draft a plan for such a mirror.
My plan for implementing this CDN mirror is as follows.
The CDN needs to be backed by instances running inside the Google cloud.
We will run three mirror pairs in different locations. Two instances in
one location will provide availability even if we need to take one
offline. Most likely these mirrors will be located in us-central,
europe-west and asia-east.
Each mirror will host a complete copy of the main and security archive.
Disk space is cheap and we want to reduce operational load for
maintaining larger sets of mirrors. In contrast to the CDN hosted by
Fastly and Amazon, we also don't want to use the same backends
(ftp.debian.org and security.debian.org) this CDN is supposed to
relieve. The mirrors on these backends are not updated at exactly the
same time.
I'm not yet completely sure how this will interact with the cache within
the CDN. This problem exists both within one location and between
locations.
For updates within one location this will be a problem. Requests are
load-balanced between both instances, and the only thing we can do is
implement session stickiness based on client IP. However, I assume that
using a two-stage update should be enough: run stage one on all mirrors
in the set, then stage two on all of them. This makes sure all
referenced files (packages, by-hash-ed files) are already available
before any of the mirrors gets the new InRelease file.
For updates between different locations we should be safe. Different
connections from one client should use the same location unless
something fails. But we could think about doing two-stage updates
throughout the whole network as well.
The mirror network will be updated from the outside via push to a mirror
master in the US location, which will then propagate the changes
internally.
What we may want is a check on boot of each of the internal systems to
verify that the local mirror corresponds to the rest of the network.
Does anyone see problems with this plan?
Time is fluid ... like a river with currents, eddies, backwash.
-- Spock, "The City on the Edge of Forever", stardate 3134.0