
sysinf0 - website indexation



Hello,

I am working on a website[1] whose purpose is to let the visitor browse
a _virtual_ filesystem made of all the files shipped in Debian
packages, and then view or compare the files.

The problem is that Google will never finish indexing the 10 million
pages (not on my home DSL, at least)...

My first plan is to track unstable, then provide a kind of news feed for
search engines. [my DebCamp8 plan]

The second improvement is to actually prevent Google from indexing useless
pages. The question is: which pages are useful, and which are useless?
My current (quick) list is:
 ^/etc/.*$
 ^/var/lib/dpkg/.*$
 ^/usr/share/doc/[^/]*/[^/]*$
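
To see what the three patterns above actually match, here is a small Python sketch. The function name and the sample paths are made up for illustration, and it only tests whether a path falls in the list; it is not how the site itself is implemented.

```python
import re

# The three patterns, copied verbatim from the list above.
PATTERNS = [
    r"^/etc/.*$",
    r"^/var/lib/dpkg/.*$",
    r"^/usr/share/doc/[^/]*/[^/]*$",
]
COMPILED = [re.compile(p) for p in PATTERNS]


def in_list(path):
    """Return True if the path matches at least one pattern (hypothetical helper)."""
    return any(rx.match(path) for rx in COMPILED)


if __name__ == "__main__":
    # Sample paths are illustrative only.
    for path in [
        "/etc/passwd",
        "/var/lib/dpkg/status",
        "/usr/share/doc/bash/README",
        "/usr/share/doc/bash/examples/startup",
        "/usr/bin/ls",
    ]:
        print(path, in_list(path))
```

Note that the third pattern only covers files directly inside a package's doc directory; anything in a deeper subdirectory is not matched.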

Any suggestions?

Franklin

[1] http://sysinf0.klabs.be/

