
RFS: python-harvestman - a multithreaded web crawler



(Though I realize that Debian Mentors is the right place to look for
sponsors, I thought it wouldn't hurt to look for someone interested
on this list. Sorry if this is a mistake.)

I'd like you to take a look at HarvestMan (http://harvestman.freezope.org)
(ITP bug #352012):

Description:
<quote>
HarvestMan can be used to download files from websites, according to a
number of user-specified rules. The latest version of HarvestMan
supports as much as 60 plus customization options. HarvestMan is a
console (command-line) application.

HarvestMan is the only public-domain, multithreaded web-crawler
program written in the Python language. HarvestMan is released under
the GNU General Public License.
</quote>

The package is quite small and simple. The current tarball is
available at
http://download.berlios.de/harvestman/HarvestMan-1.4.6.tar.bz2
(< 100KB)

My diff is at:
http://www.ee.iitm.ac.in/~ee03b091/debpkgs/python-harvestman_1.4.6-1.diff.gz
and the other files are in the same directory:
http://www.ee.iitm.ac.in/~ee03b091/debpkgs/

Currently, I have a source package that generates
python2.3-harvestman, python2.4-harvestman and python-harvestman.
python-harvestman depends on the 2.3 version and ships a symbolic
link to the executable script in the site-packages directory. I have
also outlined the advantage of using the 2.4 package in README.Debian.

I have also spent a LOT of time writing a man page for the software,
based on the documentation available online as a Word document.

Now, the only issue that bothers me is that the man page is based on
the latest available docs, which are outdated. Although everything
else remains the same, the configuration is now an XML file, whereas
the documentation assumes a plain text file. One can adapt the
settings described in the manual, and the software is backward
compatible, but I just wanted to make sure. I have mentioned this in
README.Debian and suggested ways to work around it.
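To illustrate the kind of mapping a user has to do by hand, here is a
minimal Python sketch that turns a plain-text configuration into an
XML tree. It assumes, hypothetically, one whitespace-separated
`KEY value` pair per line with `#` comments; the `<config>` and
`<option>` element names are placeholders, not HarvestMan's actual XML
schema, so treat this as an illustration rather than a conversion tool.

```python
import xml.etree.ElementTree as ET


def text_config_to_xml(text):
    """Convert hypothetical 'KEY value' lines into a placeholder XML string.

    Assumptions (not HarvestMan's real formats): one option per line,
    key and value separated by whitespace, blank lines and '#' comments
    ignored. The element names <config> and <option> are illustrative only.
    """
    root = ET.Element("config")
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split into the key and the rest of the line (the value may be empty).
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        option = ET.SubElement(root, "option", name=key.lower())
        option.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = """# hypothetical old-style configuration
URL http://harvestman.freezope.org
PROJECT harvestman
"""
    print(text_config_to_xml(sample))
```

Keys are lowercased only to show that the two formats need not share
naming conventions; a real mapping would follow whatever the current
HarvestMan documentation specifies.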

If any other issues arise, please tell me, so that I can make
corrections as appropriate.

Thanks.

Kumar

-- 
Kumar Appaiah,
462, Jamuna Hostel,
Indian Institute of Technology Madras,
Chennai - 600 036


