
Re: Search engine for documentation indexing?



Ian Zimmerman wrote:
I'd like to build an index of the documentation in /usr/share/doc,
but I am quite unhappy with the options I have tried so far:

1. dwww has a built-in CGI for searching an index built by swish++.
Unfortunately swish++ indexing seems to take forever (it's described as
"lightning fast" on the upstream website, but I can't find the pictures
of flying pigs).  Also, using the built-in dwww integration has the
disadvantage that only documents registered in doc-base are indexed,
which misses a lot of them.  On top of this, swish++ shares the main
problem of

2. swish-e.  This looked very promising for a while, and I even wrote
a python module to wrap the API:
http://pypi.python.org/pypi?%3Aaction=search&term=pyswish&submit=search

... but it can't handle documents in encodings other than ASCII and
Latin-1 (in particular, it breaks on UTF-8 XHTML documents).  This is a
show-stopper.

3. xapian-omega.  This seems to be the one modern apps are migrating to;
I have heard of the Gnus mail/newsreader acquiring a Xapian-based search
function.  But out of the box it cannot index gzipped files (and most
documents in /usr/share/doc other than HTML pages are gzipped), and
there doesn't seem to be a way to add a user-defined filter to
compensate for this either (swish-e has user filters).
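For illustration only: the missing "user-defined filter" amounts to a preprocessing step that transparently gunzips *.gz files and decodes them, trying UTF-8 first. The sketch below is a minimal, hypothetical version of such a step in plain Python (the functions `read_doc` and `iter_docs` are invented names, not part of swish-e, swish++, dwww, or Omega). It assumes you would feed its output to whichever indexer you use; it does not hook into any of those tools directly.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple


def read_doc(path: Path) -> str:
    """Return a doc file's text, gunzipping *.gz files on the fly (hypothetical helper)."""
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as f:
            raw = f.read()
    else:
        raw = path.read_bytes()
    try:
        # Most modern docs (e.g. UTF-8 XHTML) should decode here.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback never fails.
        return raw.decode("latin-1")


def iter_docs(root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (path, text) for every regular file under root (hypothetical helper)."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            yield path, read_doc(path)
        except (OSError, EOFError):
            # Unreadable or corrupt/truncated gzip files are skipped.
            continue
```

Usage would be something like `for path, text in iter_docs(Path("/usr/share/doc")): ...`, handing each `text` to the indexer. Falling back to Latin-1 rather than raising keeps one oddly encoded file from aborting the whole indexing run.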

I can't be the only one looking for this, so what do other Debianists do?


I use recoll and dwww, but I rely on recoll more and more.

Wayne

