
Re: Search Engine



I've added debian-www to the reply since this is where most
discussions of this nature take place.

On Thu, Sep 23, 1999 at 09:56:34AM -0700, Jake Sheridan wrote:
> Hi guys,
> 
> I've been waiting to use the search engine at debian.org for a long
> long time.....is it possible to speed up the process somehow.....thanx
> 
> Jake
> 
> P.S. I think you have a great site though....just missing the crucial
> element of a search engine....!!!
> 
The problem is that none of the search programs we've checked to date have
all the features we are looking for:

- Free (as in DFSG)
- Able to handle large data sets (> 1 GB)
- Able to keep separate indexes and merge them (so we don't have to reindex
  previous months' mail archives, but can simply merge their indexes with the
  current month's). Merging indexes should be fairly efficient (in most cases
  where merging is implemented, it is not).
- Able to search on specific parts of the data. For example, searching on
  subject or sender in mail. I don't care how this is implemented (through
  separate indices or through use of regex, e.g. /^Subject: .*How to get rich/)
  as long as it is possible.
- Able to index files locally, i.e. without going through a web server.
- Able to search using regex (optional). The next best thing would be
  searching for simple phrases. At a minimum, we need the ability to match
  arbitrary word endings (the equivalent of /^keyword\w*/).
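To make the merging and field-search requirements above concrete, here is a
minimal sketch in Python. It is not taken from any of the programs discussed
in this thread. The function names (build_index, merge_indexes, search), the
message ids and the example.com addresses are all hypothetical, and it
assumes each month's index fits in memory as a dictionary keyed on
(field, word).

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")

def build_index(messages):
    """Index one month of mail. `messages` maps a message id to a dict of fields."""
    index = defaultdict(set)
    for msg_id, fields in messages.items():
        for field, text in fields.items():
            for word in WORD.findall(text.lower()):
                index[(field, word)].add(msg_id)
    return {key: sorted(ids) for key, ids in index.items()}

def merge_indexes(*indexes):
    """Combine already-built indexes without re-reading any of the mail."""
    merged = defaultdict(set)
    for index in indexes:
        for key, ids in index.items():
            merged[key].update(ids)
    return {key: sorted(ids) for key, ids in merged.items()}

def search(index, field, prefix):
    """Return ids of messages whose `field` contains a word starting with `prefix`."""
    prefix = prefix.lower()
    hits = set()
    for (indexed_field, word), ids in index.items():
        if indexed_field == field and word.startswith(prefix):
            hits.update(ids)
    return sorted(hits)

# Hypothetical data: two months indexed separately, then merged.
august = build_index({
    "1999-08/1": {"subject": "How to get rich", "sender": "spammer@example.com"},
})
september = build_index({
    "1999-09/1": {"subject": "Search engine", "sender": "jake@example.com"},
    "1999-09/2": {"subject": "Re: Searching the archives", "sender": "jay@example.com"},
})
archive = merge_indexes(august, september)
print(search(archive, "subject", "search"))
```

The prefix lookup here scans every key, which is fine for illustration only.
At our data sizes, a real implementation would presumably keep the words
sorted on disk and merge the posting lists as a stream.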

Here is what I've looked at so far:
htdig - can't index locally; too slow
mg - still evaluating
namazu - haven't really looked at it yet
psearch - variant of isearch; promising, but still under development
swish++ - can't merge separate indices; great for straight HTML though
glimpse - non-free

Are we being unreasonable? I don't believe so, since all the big search
engines can obviously handle large data sets, merge information on the
fly (at least I hope they don't reindex for every new page :) and
handle searching on phrases (not necessarily regex though).
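For what it's worth, phrase searching does not need regex support. One
common approach is to store word positions in the index. The sketch below is
only an illustration of that idea under the assumption that everything fits
in memory; build_positions and phrase_search are hypothetical names, not
functions from any of the programs listed above.

```python
import re
from collections import defaultdict

def build_positions(docs):
    """Map each word to {doc_id: [positions]} so that word order is preserved."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_search(index, phrase):
    """Return ids of documents that contain the words of `phrase` consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return []
    # Start with every position of the first word, per document.
    candidates = {doc: set(pos) for doc, pos in index.get(words[0], {}).items()}
    for offset, word in enumerate(words[1:], start=1):
        postings = index.get(word, {})
        survivors = {}
        for doc, starts in candidates.items():
            positions = set(postings.get(doc, ()))
            # Keep a start position only if this word follows at the right offset.
            kept = {s for s in starts if s + offset in positions}
            if kept:
                survivors[doc] = kept
        candidates = survivors
    return sorted(candidates)

# Hypothetical documents for illustration.
docs = {"a": "How to get rich quick", "b": "Rich people get how to"}
index = build_positions(docs)
print(phrase_search(index, "get rich"))
```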

Hopefully putting more heads to this will help find a solution.

Jay Treacy

