Re: salsa.debian.org partially down

To: Alexander Wirt <formorer@debian.org>
Cc: Hector Oron <zumbi@debian.org>, Debian Devel <debian-devel@lists.debian.org>
Subject: Re: salsa.debian.org partially down
From: Ian Jackson <ijackson@chiark.greenend.org.uk>
Date: Wed, 14 Aug 2019 08:39:22 +0100
Message-id: <23891.47786.668518.39001@chiark.greenend.org.uk>
In-reply-to: <20190813115459.GK17017@marge.snow-crash.lan>
References: <20190813095051.GA23536@tilthammer.credativ.lan> <CAODfWeGsDPp7p57xdi1E2C=EsdmKYHCHLV=1yX6eis89kTNOaw@mail.gmail.com> <20190813115459.GK17017@marge.snow-crash.lan>

Alexander Wirt writes ("Re: salsa.debian.org partially down"):
> It is already recovered. We will investigate where we can extend the
> ressources. But some misusages (like requesting >1300 merge requests via API
> on a big project, that in consequence run >1300 ci jobs, that...) can't be
> solved regardless on how many resources we add. 

Thanks for the reports from you and Bastian.  Thanks also for having
the energy and effort to deal with this kind of thing.  It's annoying
when a thing you're responsible for breaks because of foolish user
action, and then you have to scramble to fix it.

Maybe I'm teaching my grandmother to suck eggs, but your message made
me want to suggest possible solutions/mitigations for the problem you
mention above.  Please feel free to disregard what follows.


I think the problem can be summarised/generalised as "someone makes
more requests to salsa than it has capacity to fulfil".

Traditional approaches to this include the following.  (I mention all
that I can think of, even inappropriate or already-done ones, and I
don't know what features gitlab has for this.)

 * Per-user quotas.  (The kind of user who submits 1300 MRs might well
   react to a limit by creating more guest accounts...)

 * Per-project quotas.  (This avoids the above problem.  It
   ring-fences problems with poor contributor behaviour to the
   projects whose contributors are behaving poorly.)

 * Queueing jobs, so that the effect is contained (eg to the CI
   subsystem) until an administrator can cancel some jobs.  I think
   that when Bastian wrote earlier "It turns out that the configured
   amount of concurrency in CI builds can't be handled by the current
   available system resources", he was referring to a tuneable which
   would have the effect of queueing things next time.  I guess
   you've adjusted this already.

 * Restricting resource-intensive actions to certain users.
   In our context this would seem to involve asking project
   maintainers to manually trigger CI on MRs.  That seems like it
   would be annoying and best avoided if we can.

 * Balkanising the system into multiple instances (perhaps with
   different configurations) so that each instance is exposed to a
   much smaller userbase.  I doubt we have the effort for this even if
   we could come up with a sensible division, and liked the idea.
   (One way to test the waters in this direction would be for someone
   to set up a competitor to salsa based on an entirely different
   management stack.)

 * Documentation, deterrence and punishment.  I mention this for
   completeness; given that we have so many users, and also offer
   guest accounts, this is not an appropriate strategy for salsa.
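
To make the per-project quota and queueing ideas a bit more concrete,
here is a rough sketch in Python.  It is purely illustrative: the
class CIQueue, its method names and the numbers used are all made up
by me, and none of it reflects how gitlab or salsa actually schedule
CI jobs.

```python
from collections import defaultdict, deque


class CIQueue:
    """Toy scheduler (hypothetical): per-project quota plus a global cap."""

    def __init__(self, max_running, per_project_quota):
        self.max_running = max_running              # global concurrency tuneable
        self.per_project_quota = per_project_quota  # max pending+running per project
        self.pending = deque()                      # FIFO of (project, job_id)
        self.running = set()
        self.load = defaultdict(int)                # project -> pending + running

    def submit(self, project, job_id):
        """Accept a job unless its project is already at its quota."""
        if self.load[project] >= self.per_project_quota:
            return False  # rejected; other projects are unaffected
        self.load[project] += 1
        self.pending.append((project, job_id))
        return True

    def start_next(self):
        """Start queued jobs, oldest first, until the global cap is reached."""
        started = []
        while self.pending and len(self.running) < self.max_running:
            job = self.pending.popleft()
            self.running.add(job)
            started.append(job)
        return started

    def finish(self, project, job_id):
        """Mark a running job as done, freeing a slot and some quota."""
        if (project, job_id) in self.running:
            self.running.remove((project, job_id))
            self.load[project] -= 1

    def cancel_pending(self, project):
        """Administrator action: drop every queued job of one project."""
        kept = deque(job for job in self.pending if job[0] != project)
        cancelled = len(self.pending) - len(kept)
        self.pending = kept
        self.load[project] -= cancelled
        return cancelled


# Example: one project tries to submit 1300 jobs at once.
q = CIQueue(max_running=4, per_project_quota=100)
accepted = sum(q.submit("bigproject", n) for n in range(1300))
other_ok = q.submit("otherproject", 1)     # a different project still gets in
started = q.start_next()                   # only max_running jobs actually run
cancelled = q.cancel_pending("bigproject") # admin clears the rest of the backlog
```

The point of the sketch is only that the quota bounds how much one
project can put into the queue, while the concurrency cap bounds how
much runs at once, so an administrator has time to cancel the backlog.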

I hope that you find this message useful, rather than just a statement
of things which are mostly obvious and/or irrelevant.

Regards,
Ian.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

Follow-Ups:
- Re: salsa.debian.org partially down
  - From: gregor herrmann <gregoa@debian.org>

References:
- Re: salsa.debian.org partially down
  - From: Hector Oron <zumbi@debian.org>
- Re: salsa.debian.org partially down
  - From: Alexander Wirt <formorer@debian.org>
