On 05/07/2019 10:27 weasel@debian.org wrote, [snip]
There are two parts to the snapshot thing, each with its own resource constraints. (a) One is everything that goes to the database, which is pretty much every request except those described in (b). Things have gotten somewhat better since we moved the DB for the secondary snapshot instance to a new host, but it's probably still not happy to be hammered.
[snip]
These requests are bound by database latency, and also by the number of concurrent requests to the DBMS. Further, since the pooling class in use is not exactly great, once a certain number of requests are in flight, things just fall over and everybody starts getting 503s. Don't overload the DB :)
So I guess pretty much every request to the machine-readable interface hits the database. How about something like this: make a request, time how long it takes, then wait four times that long before making the next request? Does that seem a reasonable place to start to avoid breaking or abusing the system? No parallel requests, just a single thread, with appropriate retries and backoff in the event of failure.
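A rough sketch of that pacing scheme in Python, to make it concrete. The function name `paced_requests`, the 10 s base backoff and the retry count are my own assumptions, not anything snapshot.debian.org prescribes; `fetch`, `sleep` and `clock` are passed in so the logic can be checked without touching the network.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def paced_requests(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    factor: float = 4.0,
    max_retries: int = 5,
    base_backoff: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[str, bytes]]:
    """Fetch URLs strictly one at a time, idling after each success.

    Hypothetical helper: `fetch` is any callable that returns the response
    body and raises OSError on failure (urllib's URLError and HTTPError are
    both OSError subclasses). `sleep` and `clock` are injectable for testing.
    """
    for url in urls:
        for attempt in range(max_retries + 1):
            start = clock()
            try:
                body = fetch(url)
            except OSError:
                if attempt == max_retries:
                    raise  # give up on this URL after the last retry
                # Exponential backoff before retrying: 10 s, 20 s, 40 s, ...
                sleep(base_backoff * 2 ** attempt)
                continue
            # Success: wait `factor` times as long as the request took, so a
            # single client keeps the server busy roughly 1/(factor + 1) of
            # the time (about 20% with factor 4).
            sleep(factor * (clock() - start))
            yield url, body
            break
```

Against the real service, `fetch` would be a small wrapper around `urllib.request.urlopen` with a timeout that returns `resp.read()`.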
This is very useful to know. I see you have some iptables limits on the number of connections from each IP, but if I'm only downloading one file at a time with no concurrency, hopefully I won't trigger them.
(b) The only requests that do not hit the DB are requests to https://snapshot.debian.org/file/<sha1sum of file>. Those are cheap(ish). They are static files and Apache fetches them directly from disk (NFS, but still). I wouldn't worry too much about making a lot of them. Maybe not concurrently, but fetching them fast and sustained shouldn't cause too many issues. If things fail, retry slowly?
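For the static /file/ requests, a minimal sketch of fetching one file by its SHA-1, checking that the body actually hashes to the requested digest, and retrying slowly on failure. The function name, the fixed 60 s retry delay and the retry count are assumptions for illustration only; `opener` defaults to `urllib.request.urlopen` but can be swapped out for testing.

```python
import hashlib
import re
import time
import urllib.request
from typing import Callable

SNAPSHOT_FILE_URL = "https://snapshot.debian.org/file/{}"
_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def fetch_snapshot_file(
    sha1: str,
    opener=urllib.request.urlopen,
    retries: int = 3,
    retry_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Download one file by SHA-1 from /file/ (hypothetical helper)."""
    sha1 = sha1.lower()
    if not _SHA1_HEX.fullmatch(sha1):
        raise ValueError(f"not a SHA-1 hex digest: {sha1!r}")
    url = SNAPSHOT_FILE_URL.format(sha1)
    for attempt in range(retries + 1):
        try:
            with opener(url) as resp:
                body = resp.read()
        except OSError:
            if attempt == retries:
                raise
            # "Retry slowly": a fixed, generous pause between attempts.
            sleep(retry_delay)
            continue
        # The URL is content-addressed, so the body must hash to the key.
        if hashlib.sha1(body).hexdigest() != sha1:
            raise ValueError(f"SHA-1 mismatch for {url}")
        return body
```

Fetching sequentially like this, with no sleep between successful downloads, matches "fast and sustained, but not concurrently".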
When I last tried accessing snapshot from AWS, I found it was blocked. I'd really like to do this in AWS; maybe I could get around the block by using IPv6, but that probably counts as cheating, right?
Thanks for your reply and for forwarding this email to the list. You've been very helpful already.
Cheers, Paul