RFP: r-cran-tm -- GNU R package for text mining applications (fwd)



Hi.

I submitted this RFP (see below) a short while ago, but I included incorrect
addresses in the X-Debbugs-CC header. The bug report with its assigned number is:

    http://bugs.debian.org/660304

It would be great to have this packaged in Debian. My apologies to anyone who
receives this message more than once.


Thanks,

Rogério Brito.

----- Forwarded message from Rogério Brito <rbrito@ime.usp.br> -----

Date: Fri, 17 Feb 2012 23:57:17 -0200
From: Rogério Brito <rbrito@ime.usp.br>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: RFP: r-cran-tm -- GNU R package for text mining applications
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-ID: <20120218015715.GA13583@ime.usp.br>

Package: wnpp
Severity: wishlist

* Package name    : r-cran-tm
  Version         : 0.5-7.1
  Upstream Author : Ingo Feinerer <feinerer@logic.at>
* URL             : http://tm.r-forge.r-project.org/
* License         : GPL-3+
  Programming Lang: R
  Description     : GNU R package for text mining

 The tm package offers functionality for managing text documents, abstracts
 the process of document manipulation and eases the usage of heterogeneous
 text formats in R. The package has integrated database backend support to
 minimize memory demands. Advanced metadata management is implemented for
 collections of text documents to ease the use of large document sets
 enriched with metadata.
 .
 The package ships with native support for handling the Reuters-21578 data
 set, Gmane RSS feeds, e-mails, and several classic file formats (e.g. plain
 text, CSV text, or PDFs).
 .
 The data structures and algorithms can be extended to fit custom demands,
 since the package is designed in a modular way to enable easy integration of
 new file formats, readers, transformations and filter operations.
 .
 tm provides easy access to preprocessing and manipulation mechanisms such as
 whitespace removal, stemming, or conversion between file formats. A generic
 filter architecture is also available to filter documents by certain
 criteria or to perform full-text search. The package supports the
 export from document collections to term-document matrices, and string
 kernels can be easily constructed from text documents.

---

I am in the process of reviewing O'Reilly's book "Machine Learning for
Email".

With the recent uploads of ggplot2 and plyr, this is the last package needed
for everything the book uses to be officially available in Debian (and, I
hope, soon in popular derivatives such as Ubuntu and Linux Mint).


Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

----- End forwarded message -----

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

