
Debian System Administration team sprint report



Hi,

a subset of the DSA team met last week in Paris for three days of work
and discussion.  We covered a number of topics, from ongoing work to
plans for the next year or two, and review of our processes and pain
points.

We'd like to thank Mozilla for hosting us and Debian's sponsors for
covering travel and accommodation costs for the sprint.

Attendees: Aurélien Jarno, Héctor Orón Martínez, Julien Cristau,
           Martin Zobel-Helas, Peter Palfrader, Tollef Fog Heen

== Team membership
It's been two years since the last update of DSA membership[DSADPL].
Of the delegated members, Stephen Gran is no longer active in the team.
We would like to thank him for his years of service and his great
contributions to shaping the team that we are, and we wish him the best
of luck in his current and future projects.  We are working with the DPL
to update the delegation.

[DSADPL] https://lists.debian.org/debian-devel-announce/2016/05/msg00006.html

== DebConf17 plan flashback
We reviewed items and plans we mentioned in our DebConf17 talk[DC17]
last year:
- integrating debconf.org services: in addition to DNS, static websites
  and @debconf.org email, the mailing lists have now been moved to
  lists.debian.org thanks to the DebConf team and listmasters.  The
  debconf18 website is being set up on Debian infrastructure from the
  get-go.  Some of the websites (wiki, www, debconf8 to debconf15,
  *.mini, ...), the git-annex service and maybe other things are still
  running on DebConf infrastructure; we should figure out which of those
  need to live on and work with existing DebConf admins on a migration
  plan.
- Alioth transition: we're happy with the git/salsa progress, and the
  continuation plan for lists.  It looks like everything else will end
  up going away.
- infrastructure refresh: We haven't made much progress on that front
  since DebConf, though see below.

[DC17] https://debconf17.debconf.org/talks/11/

== Out of band management
We reviewed our plans to get out of band management devices in hosting
locations where we're lacking such capability.  Since we drew up these
initial plans, the number of locations where we need serial console has
decreased, so most locations need a device with a few network ports and
VPN capability.  We are planning on acquiring a number of such devices
in Europe and North America, either through donation or purchase.  We
also agreed not to invest in OOB capability in locations where we only
maintain a single redundant mirror host.

== Core hardware refresh
We discussed hardware refresh plans for our locations in MAN-DA,
Bytemark and UTwente.  We are discussing specifics with our hosting
partners and hardware providers/resellers.

We are thinking of moving away from blade centers and shared storage
towards individual rack-mount servers, and away from spinning rust
towards SSD.  We also discussed the static.debian.org setup and whether
we could move that to a "caching proxy" setup, either by running our own
or relying on a partner CDN.

== security.debian.org
The traffic for security.debian.org currently peaks at around 25Gbps
globally for just the linux kernel in a single suite.  The base load
(with nothing happening) is around 1 to 3Gbps (constantly.  Yes.
Really).  Our current mirrors cannot deal with that peak demand, so for
the last year and a half we have had to redirect some updates to the
security-cdn.debian.org service sponsored by Fastly[FASTLY].

We think it's time to make that permanent, and across the whole
security archive, so we will be:
- adding an SRV record for _http._tcp.security.debian.org pointing at the
  Fastly service
- setting up an HTTP redirect on http://security.debian.org/
- setting up a separate rsync.security.debian.org name for folks who
  want to keep mirroring that archive via rsync
- focusing on reliability of a couple of backends in Europe and the CDN
  service so users get the quality level they expect from
  security.debian.org
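
To illustrate the SRV record item above, here is a minimal Python sketch
of how a client could build the lookup name and order the returned
records, roughly in the spirit of RFC 2782 (lowest priority first,
weighted random choice within a priority).  The record values and target
host names below are hypothetical placeholders, not the actual DNS data,
and this is not the code any particular client uses.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def srv_owner_name(service, proto, domain):
    """Build the DNS name under which SRV records are published."""
    return f"_{service}._{proto}.{domain}"


def order_srv(records, rng=None):
    """Return records in the order a client would try them.

    Lower priority values come first; within one priority, records are
    drawn by weighted random choice (a simplification of RFC 2782).
    """
    rng = rng or random.Random()
    ordered = []
    for prio in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == prio]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                pick = rng.choice(group)
            else:
                pick = rng.choices(group, weights=weights)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


# Hypothetical records: the targets are made-up example names.
example_records = [
    SrvRecord(priority=20, weight=0, port=80, target="backend.example.org."),
    SrvRecord(priority=10, weight=5, port=80, target="cdn-a.example.net."),
    SrvRecord(priority=10, weight=5, port=80, target="cdn-b.example.net."),
]

lookup_name = srv_owner_name("http", "tcp", "security.debian.org")
tried_in_order = order_srv(example_records, random.Random(0))
```

With records like these, a client would try the two priority-10 targets
(in random order) before falling back to the priority-20 one.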

We also opted to end the BGP mirror experiment, as none of us can
currently drive that effort.

[FASTLY] https://www.fastly.com/open-source

== Service ownership
We would like to collect better information about which GIDs/teams are
responsible for services we host.  The goal is that we would be
able to check with them every so often whether the service is still
relevant/needed.  Also, it would enable us to more easily find the people
to talk to for updating or changing hosts, and it would allow us
to more reliably identify hosts and services that can be shut down or that
have become unmaintained or unneeded.

In the same spirit, we would like to have users reaffirm their use of
and need for extra groups regularly.

== Architecture qualification
We reviewed the current candidate architectures for the upcoming buster
release, and their health from the DSA point of view.  We have updated
the architecture qualification table[AQ] with current status (for the
porterbox, buildds, and DSA concern criteria).

In short, the hardware (development boards) we're currently using to
build armel and armhf packages isn't up to our standards, and we
really, really want it to go away when stretch goes EOL (expected in
2020).  We urge arm porters to find a way to build armhf packages in VMs
or chroots on server-class arm64 hardware.

We would like to acquire arm64 hardware with similar setup to what we are
hosting at Conova (two server-class, virtualization-capable arm64
boxes in a ganeti cluster) in a second location, as we've been happy
with that setup for buildds and porterboxes.

We will have to replace some of our older MIPS-based hardware in the
next couple of years, so we will discuss options with potential sponsors.

[AQ] https://release.debian.org/buster/arch_qualify.html

== Workflow review
Some recurring tasks are not documented well enough in our
wiki[DSAWIKI].  We should strive to make this better so we're not stuck
waiting for one another trying to turn alerts into actionable items.

User requests that come in tend to get handled either really quickly or
not at all.  It's often unclear whether a request got stuck because it's
not something we want to do or just because nobody was available when it
was submitted.  One idea that was floated was to have regular (e.g. once
per quarter) IRC meetings to go over current work and stalled items.

[DSAWIKI] https://dsa.debian.org/

== Monitoring and metrics
We are currently using icinga for monitoring and munin for graphing, but
these tools are showing their age.  We would like to experiment with
Prometheus and something like grafana.  However, it's unclear at this
point what the deployment would look like, and grafana is not in stable
or testing.

== Stretch upgrade
We are almost done with our stretch upgrades, with 5 more hosts getting
upgraded during the sprint itself.  We're left with the
snapshot.debian.org hosts (hit by the removal of ruby-dbi & friends),
moszumanska (alioth) to be retired at wheezy-lts EOL, and powerpc
buildds and porterbox to be retired at jessie EOL.

== Request Tracker
We cleaned up the DSA RT queues of old, mostly obsolete tickets that had
been sitting there for too long.

== Snapshot storage
The snapshot.debian.org mirror hosted by LeaseWeb has been running out
of disk space.  Last year, LeaseWeb offered two new machines to
supplement the existing 4 storage hosts, but getting them online had
been blocked on us reshuffling the internal network setup to get
everything moved to a bigger switch.  That has now happened, and
lw09.debian.org and lw10.debian.org are almost ready to offer 30TB of
additional storage.  Work is in progress to bring the LeaseWeb mirror
up-to-date again.

== Entropy
We're currently using Entropy Keys[EKEY] in a few machines and stunnels
to ship that entropy everywhere else.  Last year we purchased
ChaosKeys[CHAOS] to supplement/replace that setup, and we discussed
where and how to ship them.

[EKEY] http://www.entropykey.co.uk/
[CHAOS] http://altusmetrum.org/ChaosKey/

== UEFI Secure Boot
DSA holds the private key corresponding to the CA embedded in the
shim[SHIM] package, which will allow us to support secure boot.  As part
of the disaster recovery plan for that CA, we have split the key so it
can be re-assembled without depending on a single one of us.
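
The report does not say which scheme was used to split the key.  Purely
as an illustration of the general idea, the sketch below shows a
threshold split (Shamir's secret sharing) in Python: any `threshold` of
the shares recover the secret, so no single holder is indispensable.
The field prime, the function names and the toy integer secret are all
assumptions made for the example; a real deployment would rely on an
audited tool rather than code like this.

```python
import secrets

# A Mersenne prime used as the field modulus (assumption for the sketch;
# the secret must be an integer smaller than this).
PRIME = 2**127 - 1


def split_secret(secret, threshold, shares):
    """Split an integer secret into `shares` points of a random polynomial
    of degree threshold-1 whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the chosen field")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner evaluation of the polynomial modulo PRIME.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Toy example: 5 shares, any 3 of which are enough.
toy_secret = 123456789
toy_shares = split_secret(toy_secret, threshold=3, shares=5)
recovered = recover_secret(toy_shares[:3])
```

Any three of the five toy shares give back the same value, while fewer
than three reveal nothing about it.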

[SHIM] https://packages.debian.org/sid/shim-signed

== EU GDPR
The EU General Data Protection Regulation[EUGDPR] comes into effect on
25 May 2018, less than 4 months from now.  Since Debian collects and
processes personal data for both end users and Debian Developers, we're
overdue in examining our processes and possibly making changes.  We look
forward to helping the DPL and other teams to address this.

[EUGDPR] https://www.eugdpr.org/

Cheers,
Julien, for DSA
