Re: Database performance
On Sat, 2024-11-23 at 13:33 +0100, Philipp Kern wrote:
> Hi,
>
> I was looking into slow queries and in general it'd be good if we
> kept
> query latency < 500ms for all queries. One particular example I could
> pick from the slow log was the one generated by
Hi, if possible all queries should be WAY below this value. I would say
even below 50ms. The DB query operations are probably THE bottleneck of
the snapshot infrastructure as of now. To reduce the stress on the DB,
we also need to check how often each query is performed. While the slow
queries are problematic w.r.t. a DoS of the system, they can easily be
put under rate limiting if they only affect the browser web interface
(like the mirrorruns_get_neighbors_change case). More important is to
reduce the cost of the operations performed by apt (which should not be
put under strict rate limits).
> mirrorruns_get_neighbors_change. For directories that existed since
> the
> dawn of time, the query will take longer and longer because the
> last_run
> will be the most recent run and the first_run will be at the dawn of
> time - with all rows in between. In those cases the median
> assumptions of PostgreSQL's query planner are also pretty off.
By just reading the code, I noticed that we run a lot of filter queries
on columns that are not indexed.
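To illustrate the effect, here is a minimal sketch using SQLite (since I
don't have the real snapshot schema at hand; the table and column names
"node" and "parent" are made up for illustration). Filtering on an
unindexed column forces a full table scan; adding an index turns it into
an index search:

```python
import sqlite3

# Hypothetical table, not the snapshot schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)")
con.executemany(
    "INSERT INTO node (parent, name) VALUES (?, ?)",
    [(i % 100, "dir-%d" % i) for i in range(10_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable plan in column 3.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM node WHERE parent = 42"
before = plan(query)  # full table scan, e.g. "SCAN node"
con.execute("CREATE INDEX node_parent_idx ON node (parent)")
after = plan(query)   # index search, mentions "USING INDEX node_parent_idx"
print(before)
print(after)
```

On PostgreSQL the equivalent check is EXPLAIN (ANALYZE) against
production-sized data, but the principle is the same.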
>
> This is one of the cases where we should either have a smarter query
> or
I'm pretty sure these queries can be optimized, but nobody has looked
into it so far, as the problem was not visible.
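Once a candidate query is known, the usual first step is to look at its
actual plan on a production-sized copy of the data; a sketch (the
statement itself is only a placeholder for the one from the slow log):

```sql
-- Placeholder statement; substitute the real query from the slow log.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;
```

Sequential scans over large row counts in that output are the typical
sign that an index (or a rewritten query) would help.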
> a materialized view. The view is going to be large (6.3G in my
> experiments), but will also reduce the query latency to a quarter
> (assuming an index on parent).
>
> I guess one AI would be to open up a slow query log to y'all to
> investigate.
That already helps, but given the massive scale of the production DB,
things might look totally different on a test system.
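Opening it up could be as simple as lowering the threshold in
postgresql.conf; a sketch (the values are examples, not a
recommendation):

```
# Log every statement that takes longer than 500 ms.
log_min_duration_statement = 500ms
# Optionally also log the plans of those statements:
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 500ms
```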
Best regards,
Felix Moessbauer
--
Siemens AG, Technology
Linux Expert Center