
Re: FW: Accessing snapshot.debian.org packages



On 05/07/2019 10:27 weasel@debian.org wrote:

[snip]

There are two parts to the snapshot thing, each with its own resource
constraints.

(a) One is everything that goes to the database, which is pretty much
     every request except those described in (b).  Things have gotten
     somewhat better since we moved the DB for the secondary snapshot
     instance to a new host, but it's probably still not happy to be
     hammered.
[snip]
     These requests are bound by database latency, and also by the number
     of concurrent requests to the DBMS.  Further, since the pooling class
     in use is not exactly great, once a certain number of requests are
     in flight, things just fall over and everybody starts getting 503s.

     Don't overload the DB :)

So I guess pretty much every request to the machine-readable interface hits the database. How about this: make a request, time how long it takes, then wait four times that long before making the next request? Does that seem like a reasonable starting point to avoid breaking or abusing the system? No parallel requests, just a single thread, with appropriate retries and backoff in the event of failure.
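To make the proposal concrete, here is a rough Python sketch of what I have in mind. The function names, the 30-second base backoff, and the retry count are placeholders I made up for illustration, not anything snapshot.debian.org prescribes; only the "wait four times the request duration" rule comes from the paragraph above.

```python
import time
import urllib.request


def http_get(url, timeout=60):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def polite_fetch(url, fetch=http_get, wait_factor=4, max_retries=5,
                 base_backoff=30.0, sleep=time.sleep, clock=time.monotonic):
    """Make one request, then pause for wait_factor times its duration.

    On failure, back off exponentially (base_backoff, 2x, 4x, ...) before
    retrying. All default values here are illustrative guesses.
    """
    for attempt in range(max_retries + 1):
        start = clock()
        try:
            body = fetch(url)
        except OSError:
            # urllib's URLError and HTTPError (e.g. a 503) are OSError subclasses.
            if attempt == max_retries:
                raise
            sleep(base_backoff * 2 ** attempt)
            continue
        elapsed = clock() - start
        # Single-threaded pacing: idle for wait_factor times the time the
        # request took before the caller can issue the next one.
        sleep(wait_factor * elapsed)
        return body
```

Calling `polite_fetch` in a plain loop, one URL at a time, keeps everything on a single thread, so a slow database automatically slows the client down as well.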

(b) The only requests that do not hit the DB are requests to
     https://snapshot.debian.org/file/<sha1sum of file>

     Those are cheap(ish).  They are static files and apache fetches them
     directly from disk (NFS, but still).  I wouldn't worry too much
     about making a lot of them.  Maybe not concurrently, but fetching
     them fast and sustained shouldn't cause too many issues.  If things
     fail, retry slowly?
This is very useful to know. I see you have some iptables limits on the number of connections from each IP, but if I'm only downloading one file at a time, with no concurrency, I hopefully won't trigger them.
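For the file downloads, something like the following Python sketch is what I'm picturing: one file at a time, with slow retries when something fails. The `/file/<sha1sum>` URL pattern is the one quoted above; the function names, the 60-second retry delay, and the retry count are my own assumptions, and the SHA-1 check simply relies on the path being the sha1sum of the file.

```python
import hashlib
import time
import urllib.request

FILE_URL = "https://snapshot.debian.org/file/{sha1}"


def http_get(url, timeout=300):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_files(sha1s, fetch=http_get, retry_delay=60.0, max_retries=3,
                   sleep=time.sleep):
    """Download files strictly one after another, retrying slowly on failure.

    Returns a dict mapping each SHA-1 hex digest to the file contents.
    The delay and retry count are illustrative guesses, not official limits.
    """
    results = {}
    for sha1 in sha1s:
        url = FILE_URL.format(sha1=sha1)
        for attempt in range(max_retries + 1):
            try:
                data = fetch(url)
                # The URL path is the file's sha1sum, so verify what we got.
                if hashlib.sha1(data).hexdigest() != sha1.lower():
                    raise ValueError("checksum mismatch for " + sha1)
                results[sha1] = data
                break
            except (OSError, ValueError):
                if attempt == max_retries:
                    raise
                # "Retry slowly": wait longer after each failed attempt.
                sleep(retry_delay * (attempt + 1))
    return results
```

Because there is never more than one connection open, this should stay well under any per-IP connection limit.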

When I last tried accessing snapshot from AWS, I found it was blocked. I'd really like to do this in AWS. Maybe I could get around the block by using IPv6, but that probably counts as cheating, right?

Thanks for your reply and for forwarding this email to the list. You've been very helpful already.

Cheers,

Paul

