Re: [PATCH 1/1] snapshot::web Fix internal redirects to farm
On Tue, 2024-11-19 at 23:51 +0100, Philipp Kern wrote:
> Hi,
>
> On 11/19/24 10:21 AM, MOESSBAUER, Felix wrote:
> > They are definitely not - as we know by now.
>
> Indeed mlm-01 gets a relative one, lw-07 gets an absolute one.
> Because
> of this:
>
> > AliasMatch "^/file/([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{36})$"
> > "/srv/snapshot.debian.org/farm/$1/$2/$1$2$3"
> > <% if ['snapshot-mlm-01',
> > 'sallinen'].include?(@trusted['hostname']) -%>
> > # we temporarily and manually enabled mod proxy in apache
> > SSLProxyEngine on
> > RewriteCond %{REQUEST_URI} "^/file/[0-9a-f]{40}"
> > RewriteCond "/srv/snapshot.debian.org/farm/$1/$2/$1$2$3" !-f
> > #RewriteRule "^/file/([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{36})$"
> > - [G]
> > RewriteRule "^/file/([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{36})$"
> > https://snapshot-lw07.debian.org/file/$1$2$3 [P]
> > <% end -%>
> >
>
> That one does not trigger on lw07 because that's the only host this
> rule
> redirects to. So lw07 has divergent behavior. :/
>
> One point I'd make is that we could cache better with the redirect in
> place: The cache can keep the file by-hash for multiple snapshot
> timestamps - because it'd be kept by /file/hash key. Assuming that
> people request similar debs from different snapshot timestamps.
That's true, but then we need to get rid of the existing varnish
redirect as well. Also, apt-cacher-ng 3.7.4-1 (bookworm) does not
support these redirects, so we need a proposed-update for that
component [1].
[1] https://lists.debian.org/debian-snapshot/2024/10/msg00010.html
>
> If we knew what the filename(s) were (which should be kinda unique in
> the Debian world!?[1]), we could add a Content-Disposition[2] to send
> the filename to the browser.
I started on this in
https://salsa.debian.org/snapshot-team/snapshot/-/commit/1aaf5d08f5d3ee1ddd1a3ac9b15f6727b5eebad8,
for debugging purposes, but of course that does not help when serving
the files via apache directly.
>
> In that spirit maybe it'd be nice to pass the filename to /file/ -
> e.g.
> /file/hash?filename=hello_1.0-1_all.deb - and then add a
> Content-Disposition header. That'd still not break the object
> caching.
I thought about this as well, but we cannot pass the filename as-is, as
it might contain non-url parameters. Instead, we need to urlencode (in
the app) and urldecode in apache to extract the name from the header.
With my code in [1], we could compare both on a -dev instance and check
if the names are actually identical. I'll send a MR for the app part.
>
> Of course it might be that apt would be unhappy about that, no idea.
As the internal redirect was broken and no one complained, I don't
think this is a problem. But it broke apt-cacher-ng [1].
>
> I understand that the motivation was to get the qps down. I think the
> more useful metric here would be to avoid the spinning rust seek
> that's
> required to serve the file - given that the request itself is kinda
> cheap for snapshot (AIUI). OTOH hot files are probably just going to
> be
> in the page cache anyway (even though that one is thrown away on
> reboot).
Agree.
Best regards,
Felix Moessbauer
>
> Kind regards
> Philipp Kern
>
> [1] For a given version, given that the version is part of the deb
> etc.
> [2]
> https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition#as_a_response_header_for_the_main_body
--
Siemens AG, Technology
Linux Expert Center
Reply to: