
Re: question about snapshot.debian.org

Hello Peter,
Thank you for the fast reply. I will have a look at the API. In the meantime, if you configure a long TTL in Varnish (I guess only the main page is going to change), I think I will stick with the current implementation of the crawler, because it is already part of a build process (I'm just adding a new URL to parse).
Looking forward to your reply.

On Wed, Jul 13, 2016 at 10:54 AM, Peter Palfrader <weasel@debian.org> wrote:
On Wed, 13 Jul 2016, Luigi Tagliamonte wrote:

> I'm Luigi, a sysadmin who works for sysdig <http://www.sysdig.org/>. I saw
> that you are the developer and maintainer of snapshot.debian.org. I'm
> writing a crawler to get all the old Debian linux-image and linux-kernel
> deb packages so that I can pre-compile a kernel probe for the sysdig
> project.
> I noticed that the crawler is really slow, so I did some profiling with
> cProfile (I'm using Python).
> Most of the time is spent in the open function that grabs the HTML from
> the website.

The HTML stuff is autogenerated, and generating it is expensive.  There
is a machine usable API that gives you more info and is cheaper to

> I was wondering if there are actions on your side that you can take to
> improve the performance of the website, like adding a CDN or a Varnish
> cache, or spotting some bottleneck that you may have on your side?

It's already supposed to be behind varnish, but it seems that one of the frontends
was not correctly configured.  Fixed that, thanks.

> Here is an example of the time spent from an AWS instance in the us-east-1
> region to grab a page from snapshot.debian.org (as you can see, it took 20s):
> [root@ip-10-10-1-128 ~]# curl -o /dev/null
> http://snapshot.debian.org/package/linux/4.6~rc3-1~exp1/

Doing these requests automatically seems unwise.  Please use the API -
documentation link on the website.
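
As a rough illustration of what "use the API" could look like from a
Python crawler, here is a minimal sketch. It is not taken from this
thread: the /mr/ base path, the /package/<name>/ endpoint and the
{"result": [{"version": ...}]} response shape are assumptions that
should be checked against the API documentation linked on the website,
and the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTION: base path of the machine-readable API. Verify it against the
# documentation linked from snapshot.debian.org before relying on it.
API_BASE = "http://snapshot.debian.org/mr"


def versions_url(source_package):
    """Build the (assumed) URL that lists all versions of a source package."""
    return "{}/package/{}/".format(API_BASE, quote(source_package, safe=""))


def parse_versions(payload):
    """Extract version strings from a JSON payload.

    ASSUMPTION: the payload looks like {"result": [{"version": "..."}, ...]}.
    A payload without a "result" key yields an empty list.
    """
    data = json.loads(payload)
    return [entry["version"] for entry in data.get("result", [])]


def fetch_versions(source_package, timeout=30):
    """Fetch and parse the version list with a single HTTP request."""
    with urlopen(versions_url(source_package), timeout=timeout) as response:
        return parse_versions(response.read().decode("utf-8"))
```

With this shape, a call such as fetch_versions("linux") would replace one
scrape of the generated HTML package page with one JSON request, and the
parsing step can be tested offline by passing a saved payload to
parse_versions().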

You might also check whether something that has already been written,
like debsnap, already serves your need.

                            |  .''`.       ** Debian **
      Peter Palfrader       | : :' :      The  universal
 https://www.palfrader.org/ | `. `'      Operating System
                            |   `-    https://www.debian.org/

“The only way to get smarter is by playing a smarter opponent.”
