
Re: overly short max-age of archive -> file redirect



On Wed, 2024-11-06 at 07:19 +0100, Linus Nordberg wrote:
> "MOESSBAUER, Felix" <felix.moessbauer@siemens.com> wrote
> Tue, 5 Nov 2024 09:09:21 +0000:
> 
> > Further, the rate-limits should be precisely documented so clients /
> > caching proxies can adapt to this. The limits also need to match the
> > retry-after header in the 429 responses. Currently s.d.o responds with
> > retry-after 5 (seconds), which is by far too short to overcome the
> > limit.
> 
> Thanks for reporting this.
> 
> I've been planning on documenting the rate limiting somewhere once it
> seems to behave reasonably well. It was added in a bit of a haste last
> weekend.
> 
> One thing I would hesitate to put in a document somewhere for a human
> to read once and then base their implementation on is numbers that are
> likely to change. What do you think about the server setting
> X-RateLimit-Remaining in responses?

Hi,

that would be even better. The draft in [1] defines further rate-limiting
headers that could be considered as well.

[1]
https://www.ietf.org/archive/id/draft-polli-ratelimit-headers-02.html
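
Just to sketch what a client could do with such headers (untested Python
using the requests library; the header names follow the draft, and whether
s.d.o will actually send them is an assumption on my side):

import time
import requests

def fetch_with_ratelimit(url):
    while True:
        resp = requests.get(url)
        if resp.status_code == 429:
            # On 429, wait as long as the server asks instead of a
            # fixed short backoff (HTTP-date form not handled here).
            time.sleep(int(resp.headers.get("Retry-After", 5)))
            continue
        # RateLimit-Remaining / RateLimit-Reset let a well-behaved
        # client slow down *before* it gets throttled.
        remaining = resp.headers.get("RateLimit-Remaining")
        if remaining is not None and int(remaining) == 0:
            time.sleep(int(resp.headers.get("RateLimit-Reset", 1)))
        return resp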

> 
> The Retry-After has been fixed in [this patchset][] which I will try
> to get merged in a day or two.

Thanks! Currently apt does not honor Retry-After, but I'll try to add
support. Right now apt just retries with exponential backoff (1, 2, 4,
... seconds), and most people recommend a maximum of 3 retries, so this
does not help at all to overcome the limit. But let's fix things one by
one.
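
To illustrate: with backoffs of 1, 2 and 4 seconds and at most 3 retries,
a client waits at most 7 seconds in total, which cannot outlast a
per-minute window. What I'd like a client to do is roughly the following,
honoring Retry-After on 429 (a sketch in plain Python, not apt's actual
acquire code):

import time
import urllib.error
import urllib.request

def download(url, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 429:
                raise
            # Prefer the server-provided Retry-After (in seconds) over
            # a guessed backoff; fall back to 2**attempt if absent.
            delay = e.headers.get("Retry-After")
            time.sleep(int(delay) if delay else 2 ** attempt)
    raise RuntimeError("still rate-limited after %d attempts" % max_attempts)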

> 
> [this patchset]:
> https://salsa.debian.org/linus/dsa-puppet/-/compare/production...linus%2Fsnapshot-ratelimit?from_project_id=2990
> 
> 
> > If rate limiting were implemented correctly, downstream caches
> > could properly cache the results and clients like apt could behave
> > nicely. I further recommend using WAY higher request limits in
> > combination with a moving-average limit on the amount of transferred
> > data. That way, the cheap "is my cache still valid" requests could
> > pass, while the heavier payload transfers are avoided. Clients could
> > also hit s.d.o without reduced transfer rates, reducing the number of
> > open handles on the server.
> 
> What do you think would be a reasonable request limit?

That depends on what your server can handle. It's been 5 years since I
worked in that field, so take the following with a grain of salt:

In general, as few requests as possible should reach the Flask
application, as it is not suited for high loads. Running it under
gunicorn or another WSGI server instead of running it directly already
improves things a bit, as you get multiple workers. Anyway, the most
important thing is to return correct cache headers from the Flask app,
so Varnish can cache the responses. The cache hit rates should be
monitored and should be close to 100% during normal operation (except
for the intentionally direct calls). In contrast, large files (e.g.
large / all debs) should be served directly, bypassing the cache and
ideally Flask as well. For details, please also see [2].
Finally, the storage backend needs to be fast, and by that I mean not a
FUSE mount.

Once all that is in place, the limits can likely be much higher without
risking a DoS.

[2]
https://tedboy.github.io/flask/generated/flask.send_from_directory.html
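
To make the cache-header point concrete, here is a rough sketch of the
kind of Flask handler I mean (the path and max-age are invented for
illustration, not what s.d.o actually uses):

from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route("/file/<path:name>")
def serve_file(name):
    # send_from_directory handles conditional requests and refuses
    # paths escaping the base directory.
    resp = send_from_directory("/srv/snapshot/files", name)
    # Snapshot content is immutable, so it can be cached for a long
    # time; the concrete max-age here is just a placeholder.
    resp.cache_control.public = True
    resp.cache_control.max_age = 86400
    return resp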

> 
> We're currently limiting number of requests per time unit since that's
> what killed one of the servers last week. I haven't looked into how to
> limit the amount of transferred data yet.

Having this limit applied per IP is unfortunate for big companies (like
Siemens), as all users are funneled through a couple of egress IPs.
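
For completeness, the moving-average idea from my earlier mail could look
roughly like this (a toy sketch with invented numbers, keyed per client
only for illustration; a real deployment would rather live in Varnish or
haproxy than in the application):

import time
from collections import defaultdict, deque

WINDOW = 60               # seconds
BYTE_BUDGET = 50_000_000  # bytes allowed per window (invented)

transfers = defaultdict(deque)  # client -> deque of (timestamp, bytes)

def allow(client, size):
    now = time.monotonic()
    log = transfers[client]
    # Forget transfers that fell out of the window.
    while log and now - log[0][0] > WINDOW:
        log.popleft()
    used = sum(nbytes for _, nbytes in log)
    if used + size > BYTE_BUDGET:
        return False   # delay or reject the heavy transfer
    log.append((now, size))
    return True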

Best regards,
Felix Moessbauer 

-- 
Siemens AG, Technology
Linux Expert Center


