
Re: help wanted, standing up mirroring sync proxies on public cloud



Hi Julien,

Digging up this thread after our last cloud team meeting, hoping to
shake out the next steps.  Summarizing things so far:

1. You've provided some technical requirements for the hosts, all of
which sound easily doable to me.
2. Amazon has given us the go-ahead to use their sponsored services.

Remaining questions:

Bastian had some architecture suggestions, which sound like nice
improvements but might also require work that no one has signed up to
do.  So I'm guessing it might be good to proceed with individual hosts
for now?

It probably makes sense for the cloud team to own the cloud
infrastructure, and DSA to own the OS config and ongoing operations.  Is
this what you were thinking?

You mentioned doing two hosts in North America.  Did you have specific
regions in mind?  The most natural choices are probably Virginia and
Oregon in the US, and Quebec in Canada.  Info on other choices at [1].

Do you care about the CPU architecture?  arm64 would be a bit cheaper,
but not so much cheaper that it'd be worth any DSA time.

It's been a while since your initial mail - has anyone from the cloud
team been in touch to help with the deployment?

Thanks,
Ross

[1] - https://aws.amazon.com/about-aws/global-infrastructure/regions_az/


On Thu, Mar 17, 2022 at 10:01:11PM +0100, Julien Cristau wrote:
> Hi Noah,
> 
> On Thu, Mar 17, 2022 at 12:55:59PM -0700, Noah Meyerhans wrote:
> > On Thu, Mar 17, 2022 at 12:03:18PM +0100, Julien Cristau wrote:
> > > DSA's looking into options to replace some of our archive mirroring
> > > infrastructure.  For context, so far we've been maintaining a few machines
> > > around the globe, called syncproxies, that serve as "hubs" for archive
> > > mirroring and push downstream mirrors.
> > > As some of that hardware ages we're looking at other options, including
> > > using cloud, to reduce the maintenance burden and make things a bit more
> > > flexible.
> > 
> > The syncproxy hosts store the archive and are used as rsync sources by
> > the mirror network, correct?
> 
> Correct.  They each have a relatively small number of downstream mirrors
> (rsync access is restricted by password), and the list is maintained
> manually.  The downstream mirrors include the CDN backends for
> deb.debian.org, some other debian.org hosts to support various services,
> and (the majority) third party mirrors.  In most cases the downstream
> mirrors are ssh-triggered.
> 
> > Inbound syncs from ftp-master (?) are also done by rsync?
> 
> Correct.  (Technically rsync over an ssl tunnel)
> 
> There are different sync sources depending on the archive: ftp-master for
> the debian, debian-archive, debian-debug and debian-buildd archives,
> security-master for the debian-security archive, ports-master for the
> debian-ports archive.
> 
> Not all syncproxies carry all archives though, for $reasons (I think
> mostly how much disk space is available and where we have demand for a
> certain archive)
> 
> > Geographic diversity is a desirable trait in order to
> > support the global nature of the mirror network, correct?
> 
> Right.  Currently we have:
> - syncproxy.au.debian.org in Australia
> - syncproxy.eu.debian.org as a non-DSA machine in Hungary
> - syncproxy2.eu.debian.org in the Netherlands
> - syncproxy4.eu.debian.org in Austria
> - syncproxy.cna.debian.org in Minnesota, USA
> - syncproxy2.wna.debian.org in California, USA
> - syncproxy3.wna.debian.org in British Columbia, Canada
> 
> Initially I'd be looking at replacing one or two of the North America
> ones.
> 
> > How much inbound and outbound bandwidth do they typically consume?
> 
> Looking at syncproxy2.wna
> (https://munin.debian.org/debian.org/mirror-isc.debian.org/ip_149_20_4_16.html
> and
> https://munin.debian.org/debian.org/mirror-isc.debian.org/ip_2001_4f8_1_c__16.html)
> it looks like we're around 60Mbps outbound and 700kbps inbound on
> average in the last month.  That is probably the one with the most
> clients though (~20 of them), as a result of an issue with one of the
> other hosts a few years ago (plus the difficulty of coordinating a move
> back with downstream operators) it ended up with most of the NA load.
> We might be able to rebalance things a bit if we replace some hosts
> anyway.
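
For sizing cloud egress, those averages can be converted into a rough
monthly transfer volume.  A minimal sketch, assuming a 30-day month and
decimal units (1 TB = 10^12 bytes); the function name is illustrative
and not part of any existing tooling:

```python
SECONDS_PER_DAY = 86_400

def monthly_transfer_tb(avg_bits_per_second, days=30):
    """Convert an average rate in bits/s to terabytes moved over `days` days."""
    total_bytes = avg_bits_per_second * SECONDS_PER_DAY * days / 8
    return total_bytes / 1e12

# Averages quoted above for syncproxy2.wna (assumed steady over the month).
outbound_tb = monthly_transfer_tb(60e6)   # ~60 Mbps average outbound
inbound_tb = monthly_transfer_tb(700e3)   # ~700 kbps average inbound
print(f"outbound: {outbound_tb:.1f} TB/month, inbound: {inbound_tb:.2f} TB/month")
```

At the quoted ~60Mbps outbound this works out to roughly 19 TB a month,
and the ~700kbps inbound to well under 1 TB.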
> 
> > How much local storage?
> 
> Currently we use 2T for the debian and debian-security archives, 6T if
> we add debian-archive + debian-debug, 7T if we add debian-ports.
> 
> Cheers,
> Julien
> 
> 

