Bug#375733: O: htdig -- WWW search system for an intranet or small internet
Package: wnpp
Severity: normal
I intend to orphan the htdig package. My maintainership has not been a
promising one. I also lack the time to do what I would consider helpful
development work for the community.
Whoever picks up the package should be aware that it requires a lot of
work, even just to reach a usable state.
Upstream consists of a mailing list; the people on the list seem quite
willing to help, but face the same manpower problems that so many
projects do.
In the hope that this will be useful to the community,
Robert Ribnitz
The package description is:
The ht://Dig system is a complete world wide web indexing and searching
system for a small domain or intranet. This system is not meant to
replace the need for powerful internet-wide search systems like Lycos,
Infoseek, Webcrawler and AltaVista. Instead it is meant to cover the
search needs for a single company, campus, or even a particular sub
section of a web site.
.
As opposed to some WAIS-based or web-server based search engines,
ht://Dig can span several web servers at a site. The type of these different
web servers doesn't matter as long as they understand the HTTP 1.0
protocol.
.
Features:
* Intranet searching
* It is free
* Robot exclusion is supported
* Boolean expression searching
* Configurable search results
* Fuzzy searching
* Searching of HTML and text files
* Keywords can be added to HTML documents
* Email notification of expired documents
* A protected server can be indexed
* Searches on subsections of the database
* Full source code included
* The depth of the search can be limited
* Full support for the ISO-Latin-1 character set
.
Disk space requirements:
.
The search engine will require lots of disk space to store its
databases. Unfortunately, there is no exact formula to compute the
space requirements. It depends on the number of documents you are
going to index but also on the various options you use. To give you
an idea of the space requirements, here is what I have deduced from
our own database size at San Diego State University.
.
If you keep around the wordlist database (for update digging instead
of initial digging) I found that multiplying the number of documents
covered by 12,000 will come pretty close to the space required.
.
We have about 13,000 documents:
  150MB index size with a 'wordlist' database
   93MB index size without a 'wordlist' database
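As a rough illustration of the rule of thumb above, the estimate can be sketched in a few lines of Python. The description does not state the unit of the 12,000 multiplier; bytes are assumed here, and the function and constant names are made up for this sketch, not part of ht://Dig.

```python
# Rule-of-thumb multiplier taken from the description above.
# ASSUMPTION: the unit is bytes per document; the description does not say.
BYTES_PER_DOCUMENT = 12_000


def estimate_index_bytes(num_documents: int) -> int:
    """Estimate index size when the wordlist database is kept around."""
    if num_documents < 0:
        raise ValueError("num_documents must be non-negative")
    return num_documents * BYTES_PER_DOCUMENT


if __name__ == "__main__":
    # The 13,000-document example from the description.
    estimate = estimate_index_bytes(13_000)
    print(f"{estimate / 1_000_000:.0f} MB")
```

For 13,000 documents this gives 156,000,000, or about 156MB under the bytes assumption, which is in line with the 150MB figure reported with a 'wordlist' database.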
.
The package is available in two varieties, the 'stable', well-tested version
(this one) and a less tested version (as 'htdig3.2').
-- System Information:
Debian Release: testing/unstable
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.16-1-686-smp
Locale: LANG=de_AT@euro, LC_CTYPE=de_AT@euro (charmap=ISO-8859-15)