Re: [OT] robots.txt creation script?



on Thu, Mar 11, 2004 at 10:29:24AM -0700, Monique Y. Herman (spam@bounceswoosh.org) wrote:
> Hi all!
> 
> I've recently developed an interest in preventing spiders from accessing
> certain areas of my site ... but as near as I can tell, robots.txt is
> pretty stupid.  It only lets you *disallow*, whereas it would be a lot
> more sensible for me to specify what I want to *allow*.
> 
> I was thinking I might hack up a little script to generate a robots.txt
> file that disallows everything except the files I've listed, but first,
> has anyone already done this or seen this done?  I'd hate to reinvent
> the wheel =)
> 
> In my ideal world, robots.txt wouldn't require you to call out all of
> the "hidden" directories on your site ... *sigh* ... 

RTFM WRT htaccess.
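
That said, the generator script asked about is not hard to sketch.  What
follows is a minimal, untested-in-production sketch, assuming crawlers
honor only Disallow lines matched as path prefixes, and assuming you can
produce a list of every URL path on the site.  The function names
(disallow_rules, robots_txt) and the sample paths are made up for
illustration.

```python
def disallow_rules(site_paths, allowed):
    """Return Disallow prefixes that block every site path not in `allowed`.

    Both arguments are URL paths starting with "/" (hypothetical inputs).
    """
    allowed = set(allowed)
    rules = []

    def visit(prefix, paths):
        # Group the paths under `prefix` by their next path component.
        children = {}
        for path in paths:
            head, sep, _ = path[len(prefix):].partition("/")
            children.setdefault(head + sep, []).append(path)
        for name in sorted(children):
            child = prefix + name
            group = children[name]
            if name.endswith("/"):
                if any(p in allowed for p in group):
                    visit(child, group)   # something inside is allowed: recurse
                else:
                    rules.append(child)   # nothing allowed: block the whole directory
            elif child not in allowed:
                rules.append(child)       # a single file that is not on the list

    visit("/", sorted(set(site_paths)))

    # Disallow is a prefix match, so "/a" would also block an allowed "/ab".
    for rule in rules:
        for path in allowed:
            if path.startswith(rule):
                raise ValueError("rule %r would also block %r" % (rule, path))
    return rules


def robots_txt(site_paths, allowed, user_agent="*"):
    """Render the rules as robots.txt text for one User-agent record."""
    rules = disallow_rules(site_paths, allowed)
    lines = ["User-agent: " + user_agent]
    # An empty Disallow value means nothing is blocked.
    lines += ["Disallow: " + rule for rule in rules] or ["Disallow:"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    paths = ["/index.html", "/docs/public.html", "/docs/draft.html",
             "/private/a.txt", "/private/b.txt"]
    print(robots_txt(paths, ["/index.html", "/docs/public.html"]), end="")
```

The sketch only knows about the paths you hand it, so it has to be rerun
whenever files are added; anything new is crawlable until then.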


As a note:  I found recently that a site I wanted to reference was no
longer online (actually, I'd known this for a while), but *also* had a
robots.txt prohibiting access.  Which meant that the Internet Archive
(http://www.archive.org/) didn't provide access to old views of the
site.

I asked the site owner if he'd modify the robots.txt to allow for
display.  Well, it turns out IA says that having a robots.txt will remove
the site from the archive....  But we went ahead anyway, and he removed
robots.txt from the site.

IA had the full archived site online the same day.


Short form of story:  don't trust robots.txt for keeping people out of
your site.  At best, it will restrict well-behaved robots from trawling
potentially large parts of your site (with associated bandwidth costs).
It's not going to ensure that _no_ robots crawl, or that they don't keep
copies.
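
To make the "well-behaved" part concrete: the check happens entirely in
the robot, not on your server.  A rough sketch using Python's standard
urllib.robotparser module follows; the helper name, the ExampleBot agent
string, and the example.org URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def polite_can_fetch(robots_txt, user_agent, url):
    """Return True if a rule-following robot would consider `url` fetchable."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    for url in ("http://example.org/index.html",
                "http://example.org/private/a.txt"):
        print(url, polite_can_fetch(rules, "ExampleBot", url))
```

A robot that never makes this call sees no difference at all; the
server will hand over /private/a.txt just the same.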

For that, you need access control.  And people you trust with access.

There's an old saying about how three people can keep a secret that I'd
tell you here, but both the other people who knew it are dead.


Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    Americans [...] need to watch what they say.
    -- Ari Fleischer, White House Press Secretary
       http://www.whitehouse.gov/news/releases/2001/09/20010926-5.html
