
Re: lists.debian.org archives and google indexing

(please CC: me any replies, as I am not subscribed to debian-www)

On Fri, 15 Dec 2000, Thomas Guettler wrote:
> The Googlebot caused too much traffic for the machine? Maybe we could
> ask Google to scan the site with reduced bandwidth (a few pages per
> second, or something like that). It is no good if the bot behaves like
> a DoS attack.

Google has something about this in their FAQ:


   Help! Googlebot is crawling my site too fast. What can I do?

   Please send an email to googlebot@google.com with the name of your site
   and a detailed description of the problem. Please also include a portion
   of the weblog that shows Google accesses, so we can track down the
   problem more quickly on our end.

From that, I gather that Googlebot is not supposed to go around DoSing
sites, and that what lists.debian.org experienced was caused by a problem.

Other solutions that could be useful are the access-throttling modules for
Apache, which can be found at:

(throttling a certain resource globally)


(a very powerful and configurable throttling module).

Maybe with the help of such modules, it would be possible to keep overeager
bots from being an annoyance.
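The general mechanism behind such modules can be sketched as a token bucket: each client accrues tokens at a fixed rate, each request spends one, and a request arriving with an empty bucket is delayed or refused. A toy Python sketch of the idea (the numbers are arbitrary; the real Apache modules are configured with directives, not code):

```python
import time

class TokenBucket:
    """Toy rate limiter: tokens accrue at `rate` per second, up to `capacity`.

    Each request spends one token; a request that finds no token is refused
    (a real server module would delay it or return an error instead).
    """
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# A burst of 10 back-to-back requests: the first 5 pass, the rest are
# refused until the bucket slowly refills (0.1 tokens/second here).
bucket = TokenBucket(rate=0.1, capacity=5)
results = [bucket.allow() for _ in range(10)]
print(results)
```

This caps a bot's sustained request rate while still allowing short bursts, which is exactly the behavior one would want from an overeager crawler.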

  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
