
Re: Debian in Google's Index



On Sat, Mar 22, 2003 at 12:06:25PM +0100, Raphael Hertzog wrote:
> > > As you may know, Google's mission is to deliver the best search
> > > experience on the Internet by making the world's information universally
> > > accessible and useful.  Currently, our Google.com site is within the top
> > > 10 sites in all major markets worldwide with over 200 million searches
> > > per day.  
> > > 
> > > We believe that www.debian.org is a great site and have discovered that
> > > Google is currently blocked from crawling your site by the robot.txt
> > > that is on your site.  I believe we can drive a lot of traffic and
> > > awareness to your organization and would like to find a mutually
> > > beneficial way to work together. 
> > 
> > How's that? http://www.debian.org/robots.txt says only:
> > 
> > User-agent: *
> > Disallow: /security/
> > Disallow:
> 
> What's the purpose of the empty Disallow ?

To allow everything else. (I looked it up somewhere at www.robotstxt.org
before posting :)
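
For anyone who wants to check this without reading the spec, here is a
minimal sketch using Python's standard urllib.robotparser, fed the three
lines quoted above. The helper name is_allowed and the "Googlebot" agent
string are my own assumptions for illustration, and this only shows how
that one parser reads the file, not how Google's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted earlier in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /security/
Disallow:
"""


def is_allowed(path, agent="Googlebot"):
    """Hypothetical helper: may `agent` fetch `path` under ROBOTS_TXT?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://www.debian.org" + path)


if __name__ == "__main__":
    for path in ("/", "/intro/about", "/security/"):
        print(path, is_allowed(path))
```

With this parser, only paths under /security/ come back as disallowed;
the empty Disallow line matches everything else and permits it.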

> > I'm not sure why we ban /security/, but otherwise it should be perfectly
> > possible to crawl the remaining 773 MB of www.debian.org...
> 
> I wanted to look at the CVS log to find a possible reason but this file
> is not managed by CVS. :-|

Oh, indeed, we should probably commit it then.

> I really don't see why we keep that Disallow, it's not a dynamic site
> with infinite recursion or anything like that. Sure it changes often but
> that's not a big problem imho ...

Yeah. I'm trying to think of the deep subliminal reason why the security
exclusion is there... :)

-- 
     2. That which causes joy or happiness.


