Re: Connecting to snapshot.debian.org failed: Connection refused

To: debian-snapshot@lists.debian.org
Subject: Re: Connecting to snapshot.debian.org failed: Connection refused
From: Evilham <contact@evilham.com>
Date: Wed, 05 Jun 2019 17:31:52 +0200
Message-id: <f7167755-8035-480b-b071-20ec20b5b0c0@yggdrasil.evilham.com>

Dear people related to debian-snapshot,

first of all: thank you, it has been very handy over time for manual checks and traceability as well as to reproduce builds.

Also, please excuse the loss of threading, but I wasn't subscribed to the ML; I just noticed that the issue had been raised before and replicated the subject.

On Mon, May 27, 2019 at 9:43 AM Peter Palfrader <weasel@debian.org> wrote:

Unfortunately, you don't mention your IP address.
snapshot is currently not great about dealing with lots of requests, so
clients that ask it lots of things repeatedly bring it down, breaking it
for everybody. It's not a normal webserver serving static things from a
filesystem.  It's more involved to deal with all the data.
As such, we block clients that appear abusive and take snapshot down
for everyone.  Some blocks are manual, some automatic.
Building CDs or repeatedly building CI images is probably not something
that snapshot can deal with at the moment.

I was also hit by this, while trying to do something with the machine-readable API.

May I suggest: it would be interesting to add a note to the API documentation site? Something like "if you plan on using this for automated bulk queries, please don't: get in touch with us first at debian-snapshot@lists.debian.org"

That would keep well-intended people who have the wrong expectation when reading that page from breaking things / being blocked.

I tried to open an MR on salsa, but that appears to be disabled for that team / project.

Here is the commit:
 https://salsa.debian.org/evilham-guest/snapshot/commit/503e16c718ec5ad71388591682969a9cf529e428

Now my actual question is along the lines of what Gonzalo wrote last Tuesday:

https://lists.debian.org/debian-snapshot/2019/05/msg00007.html

How would I (or anyone else) go about using the data provided by the Snapshot API in a somewhat reliable fashion that does not break things for everyone?

I guess it would be interesting to find a way and document it; maybe other people have tried to do interesting things on top of it and never reached out after failing.



My use-case goes along the lines of:
1. Make a bunch of requests to determine which .debs are needed
2. Use https://snapshot.debian.org/file/<hash> to download them

AFAIU, number 1 is an issue, but 2 isn't.

If it's relevant: in my case number 2 is up to 1G of transfer each week; results of older runs are to be cached.
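
For illustration, step 2 with caching could look roughly like the sketch below. It assumes that <hash> is the hex SHA-1 digest of the file contents (an assumption on my side, not something stated above), and the names file_url, fetch_cached and http_download are hypothetical helpers, not part of any snapshot tooling.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable

SNAPSHOT_FILE_URL = "https://snapshot.debian.org/file/{hash}"


def file_url(sha1_hex: str) -> str:
    """Build the download URL for a file identified by its hash."""
    # Assumption: the hash is a 40-character lowercase hex SHA-1 digest.
    if len(sha1_hex) != 40 or any(c not in "0123456789abcdef" for c in sha1_hex):
        raise ValueError(f"not a hex SHA-1 digest: {sha1_hex!r}")
    return SNAPSHOT_FILE_URL.format(hash=sha1_hex)


def http_download(url: str) -> bytes:
    """Plain HTTP GET; swap in any other downloader if preferred."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_cached(
    sha1_hex: str,
    cache_dir: str,
    download: Callable[[str], bytes] = http_download,
) -> Path:
    """Return the cached file for sha1_hex, downloading it at most once."""
    target = Path(cache_dir) / sha1_hex
    if target.exists():
        return target  # result of an older run, no request needed
    data = download(file_url(sha1_hex))
    # Assumption: the URL hash is the SHA-1 of the content, so verify it.
    if hashlib.sha1(data).hexdigest() != sha1_hex:
        raise ValueError(f"hash mismatch for {sha1_hex}")
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".part")
    tmp.write_bytes(data)
    tmp.replace(target)  # rename so a partial file never looks complete
    return target
```

Since the cache is keyed by hash, a weekly run would only request files it has never seen before.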

In order to solve 1, I'd be OK with artificially introducing delays and still hitting snapshot.debian.org; I would need to know a target rate limit.
Another option, which may not be feasible, would be to make the db available for download and give people the ability to process that on their own; is a db dump (without the packages) huge?
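
The artificial-delay option could be sketched as below. The Throttle class is a hypothetical helper, and the interval is a placeholder: the acceptable request rate is exactly what I don't know and am asking about.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Placeholder value, not a documented limit: one request every 2 seconds.
throttle = Throttle(min_interval=2.0)
```

Calling throttle.wait() right before every API request would keep a single client at or below the agreed rate, whatever that turns out to be.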


Is there a better option out there I am not seeing?

Thank you, and sorry if this is already answered somewhere; I didn't find it. If pointed in the right direction, I would try to improve its visibility for future API users.

--
Evilham
