
Re: help wanted, standing up mirroring sync proxies on public cloud



Hi Noah,

On Thu, Mar 17, 2022 at 12:55:59PM -0700, Noah Meyerhans wrote:
> On Thu, Mar 17, 2022 at 12:03:18PM +0100, Julien Cristau wrote:
> > DSA's looking into options to replace some of our archive mirroring
> > infrastructure.  For context, so far we've been maintaining a few machines
> > around the globe, called syncproxies, that serve as "hubs" for archive
> > mirroring and push downstream mirrors.
> > As some of that hardware ages we're looking at other options, including
> > using cloud, to reduce the maintenance burden and make things a bit more
> > flexible.
> 
> The syncproxy hosts store the archive and are used as rsync sources by
> the mirror network, correct?

Correct.  They each have a relatively small number of downstream mirrors
(rsync access is restricted by password), and the list is maintained
manually.  The downstream mirrors include the CDN backends for
deb.debian.org, some other debian.org hosts to support various services,
and (the majority) third-party mirrors.  In most cases the downstream
mirrors are ssh-triggered.

> Inbound syncs from ftp-master (?) are also done by rsync?

Correct.  (Technically, rsync over an SSL tunnel.)

There are different sync sources depending on the archive: ftp-master for
the debian, debian-archive, debian-debug and debian-buildd archives,
security-master for the debian-security archive, ports-master for the
debian-ports archive.
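
As a rough illustration only (not DSA's actual configuration), the
archive-to-source mapping described above could be written down like
this.  The dictionary and function names are hypothetical; only the
archive and source names come from the text.

```python
# Hypothetical lookup table: which upstream host each archive syncs from.
SYNC_SOURCES = {
    "debian": "ftp-master",
    "debian-archive": "ftp-master",
    "debian-debug": "ftp-master",
    "debian-buildd": "ftp-master",
    "debian-security": "security-master",
    "debian-ports": "ports-master",
}


def sync_source(archive: str) -> str:
    """Return the upstream sync source for an archive (KeyError if unknown)."""
    return SYNC_SOURCES[archive]


for name in ("debian", "debian-security", "debian-ports"):
    print(f"{name} <- {sync_source(name)}")
```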

Not all syncproxies carry all archives though, for $reasons (I think
mostly how much disk space is available and where we have demand for a
certain archive).

> Geographic diversity is a desirable trait in order to
> support the global nature of the mirror network, correct?

Right.  Currently we have:
- syncproxy.au.debian.org in Australia
- syncproxy.eu.debian.org as a non-DSA machine in Hungary
- syncproxy2.eu.debian.org in the Netherlands
- syncproxy4.eu.debian.org in Austria
- syncproxy.cna.debian.org in Minnesota, USA
- syncproxy2.wna.debian.org in California, USA
- syncproxy3.wna.debian.org in British Columbia, Canada

Initially I'd be looking at replacing one or two of the North America
ones.

> How much inbound and outbound bandwidth do they typically consume?

Looking at syncproxy2.wna
(https://munin.debian.org/debian.org/mirror-isc.debian.org/ip_149_20_4_16.html
and
https://munin.debian.org/debian.org/mirror-isc.debian.org/ip_2001_4f8_1_c__16.html)
it looks like we're around 60Mbps outbound and 700kbps inbound on
average in the last month.  That is probably the one with the most
clients though (~20 of them): as a result of an issue with one of the
other hosts a few years ago (plus the difficulty of coordinating a move
back with downstream operators), it ended up with most of the NA load.
We might be able to rebalance things a bit if we replace some hosts
anyway.
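
For cloud cost estimates it may help to turn those averages into a
monthly transfer volume.  A back-of-the-envelope sketch, assuming a
30-day month, decimal units (1 Mbps = 10^6 bit/s, 1 TB = 10^12 bytes)
and that the munin averages are representative; the function name is
made up for illustration:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def monthly_transfer_tb(avg_bits_per_second: float) -> float:
    """Convert an average rate in bit/s to decimal terabytes per month."""
    return avg_bits_per_second * SECONDS_PER_MONTH / 8 / 1e12


outbound_tb = monthly_transfer_tb(60e6)   # ~60 Mbps outbound average
inbound_tb = monthly_transfer_tb(700e3)   # ~700 kbps inbound average

print(f"outbound: {outbound_tb:.1f} TB/month")
print(f"inbound:  {inbound_tb:.2f} TB/month")
```

Under those assumptions that is on the order of 19 TB out and a bit
over 0.2 TB in per month for that one host.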

> How much local storage?

Currently we use 2T for the debian and debian-security archives, 6T if
we add debian-archive + debian-debug, 7T if we add debian-ports.
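
Those figures are cumulative, so the per-step increments can be read
off with a small calculation.  This is only a sketch over the rounded
numbers above; the variable names are illustrative:

```python
# Cumulative storage in TB as each group of archives is added.
cumulative_tb = [
    (("debian", "debian-security"), 2),
    (("debian-archive", "debian-debug"), 6),
    (("debian-ports",), 7),
]

increments = {}
previous = 0
for archives, total in cumulative_tb:
    # Extra space needed when this group is added on top of the previous ones.
    increments[archives] = total - previous
    previous = total

for archives, extra in increments.items():
    print(f"{' + '.join(archives)}: +{extra}T")
```

So debian-archive and debian-debug together account for roughly 4T, and
debian-ports for roughly 1T.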

Cheers,
Julien

