Re: Connecting to snapshot.debian.org failed: Connection refused
Dear people related to debian-snapshot,
first of all: thank you. The service has been very handy over time
for manual checks and traceability, as well as for reproducing builds.
Also, please excuse the loss of threading: I wasn't subscribed
to the ML, I just noticed that the issue had been raised before and
reused the subject.
On Mon, May 27, 2019 at 9:43 AM Peter Palfrader
<weasel@debian.org> wrote:
> Unfortunately, you don't mention your IP address.
> snapshot is currently not great about dealing with lots of requests, so
> clients that ask it lots of things repeatedly bring it down, breaking it
> for everybody. It's not a normal webserver serving static things from a
> filesystem. It's more involved to deal with all the data.
> As such, we block clients that appear abusive and take snapshot down
> for everyone. Some blocks are manual, some automatic.
> Building CDs or repeatedly building CI images is probably not something
> that snapshot can deal with at the moment.
I was also hit by this, while trying to do something with the
machine readable API.
May I suggest adding a note to the API documentation site?
Something like "if you plan on using this for automated bulk
queries, please don't: get in touch with us first at
debian-snapshot@lists.debian.org"
That would keep well-intentioned people who read that page with
the wrong expectations from breaking things / being blocked.
I tried to open an MR on salsa, but merge requests appear to be
disabled for that team / project.
Here is the commit:
https://salsa.debian.org/evilham-guest/snapshot/commit/503e16c718ec5ad71388591682969a9cf529e428
Now my actual question is along the lines of what Gonzalo wrote
last Tuesday:
https://lists.debian.org/debian-snapshot/2019/05/msg00007.html
How would I (or anyone else) go about using the data provided by
the Snapshot API in a somewhat reliable fashion that does not
break things for everyone?
I guess it would be interesting to find a way and document it;
maybe other people have tried to do interesting things on top of
it and never reached out after failing.
My use-case goes along the lines of:
1. Make a bunch of requests to determine which .debs are needed
2. Use https://snapshot.debian.org/file/<hash> to download them
AFAIU, number 1 is an issue, but 2 isn't.
If it's relevant: in my case number 2 is up to 1G of transfer each
week, and the results of older runs would be cached.
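For illustration, the caching part of step 2 could look roughly like
the following minimal sketch. It assumes files are addressed only via
the https://snapshot.debian.org/file/<hash> URL mentioned above; the
names fetch_cached, FILE_URL and the cache directory layout are
hypothetical, and the whole file is read into memory for simplicity.

```python
import os
import urllib.request

# URL pattern taken from the message; everything else here is an assumption.
FILE_URL = "https://snapshot.debian.org/file/{hash}"


def _download(url):
    """Fetch a URL and return its body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def fetch_cached(file_hash, cache_dir, download=_download):
    """Return a local path for file_hash, downloading only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, file_hash)
    if not os.path.exists(path):
        data = download(FILE_URL.format(hash=file_hash))
        tmp = path + ".part"
        with open(tmp, "wb") as f:
            f.write(data)
        # Rename only once the file is complete, so an interrupted
        # run never leaves a truncated file in the cache.
        os.replace(tmp, path)
    return path
```

With this, a weekly run only hits the server for hashes it has not
seen before; repeated calls for the same hash are served from disk.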
In order to solve 1, I'd be OK with artificially introducing
delays and still hitting snapshot.debian.org; I would just need to
know a target rate limit.
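The artificial delays could be a simple client-side throttle, as in
this minimal sketch. The Throttle class is hypothetical, and the
2-second interval in the usage note is only a placeholder: the
acceptable rate is exactly what is being asked here and is not known.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Usage would be to create one throttle = Throttle(2.0) (placeholder
value) and call throttle.wait() before every API request in step 1.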
Another option, which may not be feasible, would be to make the db
available for download and give people the ability to process that
on their own; is a db dump (without the packages) huge?
Is there a better option out there I am not seeing?
Thank you, and sorry if this is already answered somewhere; I
didn't find it. If pointed in the right direction, I would try to
improve its visibility for future API users.
--
Evilham