On Wed, 13 Jul 2016, Luigi Tagliamonte wrote:
> I'm Luigi, a sysadmin who works for sysdig <http://www.sysdig.org/>. I saw
> that you are the developer and maintainer of snapshot.debian.org. I'm
> writing a crawler to get all the old Debian linux-image and linux-kernel
> deb packages to be able to pre-compile a kernel probe for the sysdig
> I noticed that the crawler is really slow and I did some profiling with
> cProfile (I'm using Python).
> Most of the time is spent in the open function that grabs the HTML from
> the website.
The HTML stuff is autogenerated, and generating it is expensive. There
is a machine usable API that gives you more info and is cheaper to
> I was wondering if there are actions on your side that you can take to
> improve the performance of the website, such as adding a CDN or a varnish
> cache, or spotting some bottleneck that you may have on your side?
It's already supposed to be behind varnish, but it seems that one of the
frontends was not correctly configured. Fixed that, thanks.
> Here is an example of the time spent from an AWS instance in the us-east-1
> region to grab a page from snapshot.debian.org (as you can see it took 20s):
> [root@ip-10-10-1-128 ~]# curl -o /dev/null
Doing these requests automatically against the HTML pages seems unwise.
Please use the API; its documentation is linked from the website.
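
As a rough sketch of what querying the API instead of scraping the HTML
could look like. The /mr/binary/<name>/ path and the JSON field names
("result", "source", "binary_version") are assumptions taken from memory of
the API documentation, and the function names and User-Agent string are
made up for illustration; check all of them against the documentation
before relying on this.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://snapshot.debian.org"  # site named in this thread


def binary_versions_url(binary_name):
    """Build the (assumed) machine-readable URL for one binary package."""
    return f"{BASE_URL}/mr/binary/{quote(binary_name, safe='')}/"


def parse_binary_versions(payload):
    """Return (source, binary_version) pairs from a JSON response body.

    Assumes the body is an object whose "result" list holds entries with
    "source" and "binary_version" keys.
    """
    data = json.loads(payload)
    return [(e["source"], e["binary_version"]) for e in data.get("result", [])]


def fetch_binary_versions(binary_name, timeout=30):
    """Fetch and parse the version list with a single HTTP request."""
    request = urllib.request.Request(
        binary_versions_url(binary_name),
        headers={"User-Agent": "snapshot-api-example/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_binary_versions(response.read().decode("utf-8"))
```

Calling fetch_binary_versions("linux-image-amd64") would then replace many
HTML page fetches with one JSON request per package name.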
You might also check whether something that has already been written,
like debsnap, already serves your need.
                            |  .''`.       ** Debian **
      Peter Palfrader       | : :' :      The  universal
 https://www.palfrader.org/ | `. `'      Operating System
                            |   `-    https://www.debian.org/