Dear Mentors,

I'd like you to take a look at HarvestMan (http://harvestman.freezope.org) (ITP bug #352012).

Description:

<quote>
HarvestMan can be used to download files from websites, according to a number of user-specified rules. The latest version of HarvestMan supports more than 60 customization options. HarvestMan is a console (command-line) application. HarvestMan is the only free, multithreaded web-crawler program written in the Python language. HarvestMan is released under the GNU General Public License.
</quote>

The package is quite small and simple. The current tarball is available at http://download.berlios.de/harvestman/HarvestMan-1.4.6.tar.bz2 (< 100KB) and my diff is at: http://www.ee.iitm.ac.in/~ee03b091/debpkgs/python-harvestman_1.4.6-1.diff.gz

Currently, I have a source package that builds python2.3-harvestman, python2.4-harvestman and python-harvestman. python-harvestman depends on the 2.3 version and ships a symbolic link to the executable script in the site-packages directory. I have also outlined the advantages of using the 2.4 package in README.Debian.

I have also spent a lot of time writing a man page for the software, based on the documentation available online as a Word document. The one issue that bothers me is that the man page follows the latest available documentation, which is outdated. Although everything else remains the same, the configuration file is now in XML, while the documentation assumes a plain-text file. Users can still work out the required settings from the manual, and the software is backward compatible, but I wanted to raise this to be sure. I have mentioned this in README.Debian and suggested workarounds.

If any other issues come up, please tell me so that I can make the appropriate corrections.

Thanks.

Kumar

--
Kumar Appaiah,
462, Jamuna Hostel,
Indian Institute of Technology Madras,
Chennai - 600 036