
Re: We need a Search Engine! Was: We need a FAQ.



On Tue, Aug 22, 2000 at 08:35:20PM -0600, montefin wrote:
[snip]
> When every other site worth a damn has a basic, simple,
> clear-up-the-obvious, Search Engine, http://www.debian.org/search
> complains it has not found a Search Engine worthy of itself.
[snip]
> Put an even so-so Search Engine at http://www.debian.org/search and you
> will see the traffic and inanity (including my own) on this list
> plummet!

yes! hear hear.  (i still think we need a newbie-centric FAQ which
contains mostly pointers to the existing documentation. help them
find what they're looking for!)

here's a post from the debian-www list from a month
ago; i'd like to see someone address this:

Erik Rossen wrote:
> On Tue, 25 Jul 2000, Craig Small wrote:
> > On Fri, Jul 21, 2000 at 09:31:56PM +0200, Erik Rossen wrote:
> > > entire website into a .deb package, searchable with htdig.  How many
> > > megabytes would that make?
> > Try Gig, like 4 Gig.
> 
> If I search on Altavista,
> 
> "url:www.debian.org" gives about 65,826 pages (say, 66,000 pages)
> 
> "url:www.debian.org AND NOT url:www.debian.org/Lists-Archives" gives about
> 9,334 (say, 9,300 pages)
> 
> Assuming that the 4GB number is due to the 66,000 pages, that makes an
> average of about 64kB per page.  This number seems a bit high to me
> - I suspect that Altavista has been obeying robots.txt and that in reality
> there are many more pages.
> 
> Anyhow, assuming that one were to use htDig and budget 12kB per page for
> word indices (so that the database could be built incrementally), one
> gets:
> 
> For everything that AV has seen so far: 66,000 x 12kB = 792,000kB = 773MB
> 
> Ditto, minus the mail archives: 9,300 x 12kB = 111,600kB = 109MB
> 
> Would someone with more experience than me tell us if these numbers pose
> any difficulties?  Unless there is a real need to keep all of the indices
> in RAM, shouldn't it be fairly cheap and easy to get this thing
> operational right now?  Even if the space required was one order of
> magnitude greater than what I've calculated?

> Erik Rossen                         ^
> rossen@freesurf.ch                 /e\
> http://www.multimania.com/rossen   ---   GPG key ID: 2935D0B9
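
for anyone who wants to redo erik's arithmetic with different numbers,
here is a rough python sketch. the page counts, the 4GB figure and the
12kB-per-page budget are taken from the quoted message; the function
names and the 1MB = 1024kB convention are my own assumptions, and none
of this says anything about what htdig really needs per page.

```python
# Back-of-the-envelope check of the index-size estimate quoted above.
# Inputs (66,000 / 9,300 pages, 4GB site, 12kB per page) come from the
# quoted message; the helper names are hypothetical, and 1MB = 1024kB
# is an assumed convention.

KB_PER_MB = 1024


def index_size_mb(pages, kb_per_page=12):
    """Estimated index size in MB for a given number of pages."""
    return pages * kb_per_page / KB_PER_MB


def avg_page_kb(site_gb, pages):
    """Average page size in kB if the whole site is site_gb gigabytes."""
    return site_gb * KB_PER_MB * KB_PER_MB / pages


if __name__ == "__main__":
    print(f"average page size: {avg_page_kb(4, 66_000):.0f}kB")
    for label, pages in (("everything", 66_000),
                         ("minus mail archives", 9_300)):
        size = index_size_mb(pages)
        # Also show the "one order of magnitude greater" worst case.
        print(f"{label}: {size:.0f}MB (x10: {size * 10:.0f}MB)")
```

changing kb_per_page or the page counts shows how quickly the estimate
moves if altavista has only seen part of the site.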


