
Re: Connecting to snapshot.debian.org failed: Connection refused

Dear people related to debian-snapshot,

First of all: thank you. The service has been very handy over time for manual checks and traceability, as well as for reproducing builds.

Also, please excuse the loss of threading: I wasn't subscribed to the ML; I just noticed that the issue had been raised before and reused the subject line.

On Mon, May 27, 2019 at 9:43 AM Peter Palfrader <weasel@debian.org> wrote:

> Unfortunately, you don't mention your IP address.
>
> snapshot is currently not great about dealing with lots of requests, so
> clients that ask it lots of things repeatedly bring it down, breaking it
> for everybody.  It's not a normal webserver serving static things from a
> filesystem.  It's more involved to deal with all the data.
>
> As such, we block clients that appear abusive and take snapshot down
> for everyone.  Some blocks are manual, some automatic.
>
> Building CDs or repeatedly building CI images is probably not something
> that snapshot can deal with at the moment.

I was also hit by this while trying to do something with the machine-readable API.

May I suggest adding a note to the API documentation page? Something like: "If you plan to use this for automated bulk queries, please don't; get in touch with us first at debian-snapshot@lists.debian.org."

That would keep well-intentioned people who arrive at that page with the wrong expectations from breaking things and getting blocked.

I tried to open an MR on Salsa, but that appears to be disabled for that team/project.
Here is the commit:

Now my actual question is along the lines of what Gonzalo wrote last Tuesday:

How would I (or anyone else) go about using the data provided by the snapshot API in a somewhat reliable fashion that does not break things for everyone?

I guess it would be interesting to find a way and document it; maybe other people have tried to build interesting things on top of the API and never reached out after failing.

My use-case goes along the lines of:
1. Make a bunch of requests to determine which .debs are needed
2. Use https://snapshot.debian.org/file/<hash> to download them

AFAIU, number 1 is an issue, but number 2 isn't.
If it's relevant: in my case, number 2 amounts to up to 1 GB of transfer each week, and the results of older runs will be cached.
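For concreteness, here is a minimal sketch of the two steps above, using only the standard library. The `/mr/binary/<name>/<version>/binfiles` endpoint and the shape of its JSON response are my reading of the machine-readable API documentation, so treat the details as assumptions rather than a vetted client:

```python
import json
import urllib.request

BASE = "https://snapshot.debian.org"

def binfiles_url(binary, version):
    # Machine-readable endpoint listing the files of one binary package version
    # (endpoint name taken from the API docs; verify before depending on it).
    return f"{BASE}/mr/binary/{binary}/{version}/binfiles"

def file_url(sha1):
    # Content-addressed download endpoint; a given hash never changes,
    # so downloads can be cached by hash indefinitely.
    return f"{BASE}/file/{sha1}"

def binfiles(binary, version):
    # Step 1: ask which files (by SHA-1) make up this package version.
    with urllib.request.urlopen(binfiles_url(binary, version)) as r:
        return [entry["hash"] for entry in json.load(r)["result"]]

def fetch(sha1, dest):
    # Step 2: download one file by hash.
    urllib.request.urlretrieve(file_url(sha1), dest)
```

Since step 2 is addressed purely by hash, caching on the client side is trivial: skip the download whenever a file with that hash is already present.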

To solve 1, I'd be OK with artificially introducing delays while still hitting snapshot.debian.org, but I would need to know a target rate limit. Another option, which may not be feasible, would be to make the database available for download so people could process it on their own; how large is a DB dump (without the packages)?
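The "artificial delays" idea could be as simple as a client-side throttle that enforces a minimum interval between requests. The 5-second default below is an arbitrary placeholder of mine, not a published limit; the whole point of this mail is that no such number is documented:

```python
import time
import urllib.request

class Throttle:
    """Ensure at least min_interval seconds pass between consecutive calls.

    The 5-second default is a made-up placeholder, not a rate published by
    snapshot.debian.org; ask the operators before picking a real value.
    """

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval
        self._last = float("-inf")  # first call never sleeps

    def wait(self):
        # Sleep just long enough to honour the interval since the last call.
        pause = self.min_interval - (time.monotonic() - self._last)
        if pause > 0:
            time.sleep(pause)
        self._last = time.monotonic()

def polite_get(url, throttle):
    # Block until the throttle allows another request, then fetch the URL.
    throttle.wait()
    with urllib.request.urlopen(url) as r:
        return r.read()
```

A shared `Throttle` instance in front of all API calls keeps a batch job from bursting, but it is only a guess at good behaviour until a target rate is actually documented.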

Is there a better option out there I am not seeing?

Thank you, and sorry if this is already answered somewhere; I didn't find it. If pointed in the right direction, I would try to improve its visibility for future API users.
