Re: [ITR] templates://htdig/{templates}
Christian Perrier wrote:
> The first step of the process is to review the debconf source
> template file(s) of htdig. This review will start on Friday, December 21, 2007, or
> as soon as you acknowledge this mail with an agreement for us to
> carry out this process.
I'll be away from keyboard for the next week, so I'll share my rough
notes in advance.
The package description needs quite a lot of rephrasing. For a
start, its short description:
-Description: WWW search system for an intranet or small internet
ht://Dig is precisely not a World Wide Web search engine - it's a
local website search engine. And what's a "small internet"?
+Description: web search engine for intranets
- The ht://Dig system is a complete World Wide Web indexing and searching
+ The ht://Dig system is a complete web indexing and searching
system for a small domain or intranet. This system is not meant to
replace the need for powerful internet-wide search systems like Lycos,
(Dated - these days Lycos is a portal rather than a search engine)
Google, or Yahoo!. Instead it is meant to cover the search needs of a
single company, campus, or even a particular subsection of a website.
.
As opposed to some WAIS-based or web-server based search engines,
ht://Dig isn't opposed to WAIS, and "-based" is just fog as usual.
I'd boil it down to "Unlike some WAIS or web search engines" - but
then I wonder about the claim it's leading into:
ht://Dig can span several web servers at a site. The type of these
different web servers doesn't matter as long as they understand the
HTTP 1.0 protocol.
Does ht://Dig really have rivals that can only index one server?
Are there web servers that still don't support HTTP 1.0? Perhaps
these "features" should be retired into the bulleted feature-list.
The list's bullet style should be standardised, but I'll take that
part for granted.
- * Intranet searching
- * It is free
- * Full source code included
- * Full support for the ISO-Latin-1 character set
Cut these non-features (what's htdig doing with ja.po and ru.po
files etcetera if it can't even handle š or €?). Perhaps replace
them with:
+ - indexing of any number of unrelated web servers;
Standardising on noun phrases:
- * Robot exclusion is supported
+ - robot exclusion support;
- * Keywords can be added to HTML documents
+ - keyword tagging of HTML documents;
- * A Protected server can be indexed
+ - indexing of protected servers;
- * The depth of the search can be limited
+ - configurable-depth searches;
Then the trailing caveat:
- Please note that ht://Dig is a resource-hog, with respect to processor usage,
- when indexing.
- .
- Disk space requirements:
- .
- 13.000 documents indexed: 150MB disk space with a 'wordlist database'
- 93MB disk space without a 'wordlist'
The first half is subtly bad en_US; the second half has a blatantly
wrong $LC_NUMERIC!
+ Please note that ht://Dig indexing is processor-intensive, and its disk
+ space requirements are approximately 12kB per document indexed (so e.g.
+ 13,000 documents indexed = 150MB with a wordlist database, 93MB without).
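For anyone wanting to check the "approximately 12kB" figure, here is a rough sanity check in Python. It is only a sketch: it assumes decimal units (1MB = 10^6 bytes), which the original description doesn't specify, and the helper name is made up for illustration.

```python
# Figures quoted in the existing package description.
DOCS = 13_000
MB = 1_000_000  # assumption: decimal megabytes; the description doesn't say


def kb_per_doc(total_mb: float, docs: int = DOCS) -> float:
    """Average index size per document, in kB (10^3 bytes)."""
    return total_mb * MB / docs / 1_000


with_wordlist = kb_per_doc(150)    # the case the "12kB" estimate rounds from
without_wordlist = kb_per_doc(93)  # noticeably smaller without the wordlist

# The "," format specifier gives the en_US-style thousands separator.
print(f"{DOCS:,} documents: {with_wordlist:.1f} kB each with a wordlist, "
      f"{without_wordlist:.1f} kB each without")
```

So the 12kB estimate matches the wordlist case; without a wordlist the per-document cost comes out nearer 7kB.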
--
JBR with qualifications in linguistics, experience as a Debian
sysadmin, and probably no clue about this particular package