
Re: [ITR] templates://htdig/{templates}



Christian Perrier wrote:
> The first step of the process is to review the debconf source
> template file(s) of htdig. This review will start on Friday, December 21, 2007, or
> as soon as you acknowledge this mail with an agreement for us to
> carry out this process.

I'll be away from keyboard for the next week, so I'll share my rough
notes in advance.                              

The package description needs quite a lot of rephrasing.  For a
start, its short description:

 -Description: WWW search system for an intranet or small internet   

ht://Dig is precisely not a World Wide Web search engine - it's a
local website search engine.  And what's a "small internet"? 

 +Description: web search engine for intranets
 - The ht://Dig system is a complete World Wide Web indexing and searching
 + The ht://Dig system is a complete web indexing and searching
   system for a small domain or intranet. This system is not meant to
   replace the need for powerful internet-wide search systems like Lycos,

(Dated - these days Lycos is a portal rather than a search engine)

   Google, or Yahoo!. Instead it is meant to cover the search needs of a
   single company, campus, or even a particular subsection of a website.
   .
   As opposed to some WAIS-based or web-server based search engines,

ht://Dig isn't opposed to WAIS, and "-based" is just fog as usual.
I'd boil it down to "Unlike some WAIS or web search engines" - but
then I wonder about the claim it's leading into:

   ht://Dig can span several web servers at a site. The type of these
   different web servers doesn't matter as long as they understand the
   HTTP 1.0 protocol.

Does ht://Dig really have rivals that can only index one server?
Are there web servers that still don't support HTTP 1.0?  Perhaps
these "features" should be retired into the bulleted feature-list.

The list's bullet style should be standardised, but I'll take that
part for granted.

 -    * Intranet searching
 -    * It is free

 -    * Full source code included

 -    * Full support for the ISO-Latin-1 character set

Cut these non-features (what's htdig doing with ja.po and ru.po
files etcetera if it can't even handle š or €?).  Perhaps replace
them with:

 + - indexing of any number of unrelated web servers;

Standardising on noun phrases:

 -    * Robot exclusion is supported         
 + - robot exclusion support;

 -    * Keywords can be added to HTML documents         
 + - keyword tagging of HTML documents;

 -    * A Protected server can be indexed         
 + - indexing of protected servers;

 -    * The depth of the search can be limited         
 + - configurable-depth searches;

Then the trailing caveat:

 - Please note that ht://Dig is a resource-hog, with respect to processor usage,
 - when indexing.         
 - .
 - Disk space requirements:         
 - .         
 - 13.000 documents indexed:      150MB disk space with a 'wordlist database'
 -                                93MB disk space without a 'wordlist'

The first half is subtly bad en_US; the second half has a blatantly
wrong $LC_NUMERIC! 

+ Please note that ht://Dig indexing is processor-intensive, and its disk
+ space requirements are approximately 12kB per document indexed with a
+ wordlist database, or about 7kB without (so e.g. 13,000 documents need
+ 150MB with a wordlist database, 93MB without).
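
The per-document figure can be checked against the numbers in the old
description. A rough sketch in Python, assuming decimal units (1MB =
1000kB); the function and variable names are only illustrative and
nothing here comes from the htdig package itself:

```python
# Figures quoted in the old package description.
# Assumption: decimal units, i.e. 1MB = 1000kB.
DOCUMENTS = 13_000
DISK_MB = {"with wordlist": 150, "without wordlist": 93}


def kb_per_document(total_mb, documents=DOCUMENTS):
    """Average disk space per indexed document, in kB."""
    return total_mb * 1000 / documents


for label, mb in DISK_MB.items():
    print(f"{label}: {kb_per_document(mb):.1f} kB per document")
```

That gives roughly 11.5kB per document with the wordlist database and
7.2kB without, so the "approximately 12kB" figure describes the
wordlist case.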

-- 
JBR	with qualifications in linguistics, experience as a Debian
	sysadmin, and probably no clue about this particular package

