RFP: r-cran-tm -- GNU R package for text mining applications (fwd)
Hi.
I submitted this RFP (see below) a short while ago, but I included incorrect
addresses for the X-Debbugs-CC. The message with the assigned number is:
http://bugs.debian.org/660304
It would be great to have this packaged in Debian. My apologies to those who
may receive this message more than once.
Thanks,
Rogério Brito.
----- Forwarded message from Rogério Brito <rbrito@ime.usp.br> -----
Date: Fri, 17 Feb 2012 23:57:17 -0200
From: Rogério Brito <rbrito@ime.usp.br>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: RFP: r-cran-tm -- GNU R package for text mining applications
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-ID: <20120218015715.GA13583@ime.usp.br>
Package: wnpp
Severity: wishlist
* Package name : r-cran-tm
Version : 0.5-7.1
Upstream Author : Ingo Feinerer <feinerer@logic.at>
* URL : http://tm.r-forge.r-project.org/
* License : GPL-3+
Programming Lang: R
Description : GNU R package for text mining
The tm package offers functionality for managing text documents, abstracts
the process of document manipulation and eases the usage of heterogeneous
text formats in R. The package has integrated database backend support to
minimize memory demands. Advanced metadata management is implemented for
collections of text documents to ease the use of large document sets
enriched with metadata.
.
The package ships with native support for handling the Reuters-21578 data
set, Gmane RSS feeds, e-mails, and several classic file formats (e.g. plain
text, CSV text, or PDFs).
.
The data structures and algorithms can be extended to fit custom demands,
since the package is designed in a modular way to enable easy integration of
new file formats, readers, transformations and filter operations.
.
tm provides easy access to preprocessing and manipulation mechanisms such as
whitespace removal, stemming, or conversion between file formats. Further, a
generic filter architecture is available in order to filter documents for
certain criteria, or perform full text search. The package supports the
export from document collections to term-document matrices, and string
kernels can be easily constructed from text documents.
---
I am in the process of reviewing O'Reilly's book "Machine Learning for
Email".
With the recent uploads of gglib2 and plyr, this is the last package that is
needed for all packages used by the book to be available officially on
Debian (and, I hope, soon on popular derivatives like Ubuntu and
Linux Mint).
Regards,
--
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br
----- End forwarded message -----