
Re: lists.debian.org archives and google indexing



(please CC: me any replies, as I am not subscribed to debian-www)

On Fri, 15 Dec 2000, Thomas Guettler wrote:
> Did Googlebot cause too much traffic for the machine? Maybe we could
> ask Google to scan the site with reduced bandwidth (a few pages per
> second or something like that). It is no good if the bot behaves like
> a DoS attack.

Google has something about this in
http://www.google.com/intl/en_extra/help/faq.html#toofast

(quote:)

   Help! Googlebot is crawling my site too fast. What can I do?

   Please send an email to googlebot@google.com with the name of your site
   and a detailed description of the problem. Please also include a portion
   of the weblog that shows Google accesses, so we can track down the
   problem more quickly on our end.

From that, I gather that Googlebot is not supposed to go around DoSing
sites, and that what lists.debian.org experienced was caused by a problem.

Another option that could be useful is one of the access throttling modules
for Apache, which can be found at:

http://www.fremen.org/apache/mod_throttle_access.html
(throttling a certain resource globally)

and

http://www.snert.com/Software/mod_throttle/
(a very powerful and configurable throttling module).

Maybe with the help of such modules, it would be possible to keep overeager
bots from being an annoyance.
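
To illustrate the general idea of per-client throttling, here is a minimal
token-bucket sketch in Python. It is only an illustration under my own
assumptions: it is not how mod_throttle or mod_throttle_access are
implemented or configured, and the names TokenBucket, rate, burst and
allow are made up for this example.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second per client, with bursts up to `burst`.

    Hypothetical helper for illustration only; not part of any Apache module.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # maximum tokens a client can hold
        self.clock = clock           # injectable clock, handy for testing
        self.state = {}              # client -> (tokens, last_timestamp)

    def allow(self, client):
        """Return True if `client` may make a request now, else False."""
        now = self.clock()
        # A client seen for the first time starts with a full bucket.
        tokens, last = self.state.get(client, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.state[client] = (tokens - 1.0, now)
            return True
        self.state[client] = (tokens, now)
        return False
```

A front end could call allow() with the client address or User-Agent as the
key and answer with an error or a delay whenever it returns False, so one
overeager bot is slowed down without affecting other visitors.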

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


