
Bug#814250: ITP: colibri-core -- Colibri Core is a Natural Language Processing tool to quickly and efficiently count and extract patterns from large corpus data.



Package: wnpp
Severity: wishlist
Owner: proycon <proycon@anaproy.nl>

* Package name    : colibri-core
  Version         : 2.1.3
  Upstream Author : Maarten van Gompel <proycon@anaproy.nl>
* URL             : https://proycon.github.io/colibri-core/
* License         : GPL-3
  Programming Lang: C++, Python
  Description     : Colibri Core is a Natural Language Processing tool and library to quickly and efficiently count and extract patterns from large corpus data.

Colibri Core is software consisting of command line tools as well as
programming libraries for C++ and Python to quickly and efficiently count and
extract patterns from large corpus data, to extract various statistics on the
extracted patterns, and to compute relations between the extracted patterns.

The employed notion of pattern or construction encompasses n-grams, skipgrams,
and flexgrams. Though n-gram extraction may seem fairly trivial at first,
simple approaches place an unnecessarily high demand on memory resources, which
often becomes prohibitive when they are unleashed on large corpora. Colibri
Core tries to minimise these time and space requirements in several ways, and
provides a foundation for other tools to build on.
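
To illustrate the kind of "simple approach" meant above, here is a minimal
Python sketch that counts n-grams and a simplified form of skipgram by storing
every pattern as a tuple of token strings. It is not Colibri Core's API; the
function names and the "{*}" gap marker are made up for this example, and the
skipgram variant only covers a single one-token gap. Keeping every pattern in
memory as full strings is what makes this kind of counter costly on large
corpora.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count contiguous n-grams, keyed by tuples of token strings."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def count_skipgrams(tokens, n, gap="{*}"):
    """Count n-token windows with one interior token replaced by a gap marker.

    Simplifying assumption: exactly one gap of exactly one token.
    """
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # Only interior positions can be gaps; the first and last token stay.
        for j in range(1, n - 1):
            counts[tuple(window[:j] + [gap] + window[j + 1:])] += 1
    return counts


tokens = "to be or not to be".split()
bigrams = count_ngrams(tokens, 2)
skipgrams = count_skipgrams(tokens, 3)

print(bigrams.most_common(1))
print(sorted(skipgrams))
```

Every distinct pattern here holds its own tuple of Python strings, so the
number of stored objects grows with both the corpus size and the pattern
length.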

The package is to be maintained within the Debian Science packaging team,
hopefully sponsored by Joost van Baal-Ilić. Extra help is always welcome.

--

Maarten van Gompel
 Centre for Language Studies
 Radboud Universiteit Nijmegen

proycon@anaproy.nl
http://proycon.anaproy.nl
http://github.com/proycon

GnuPG key:  0x1A31555C  XMPP: proycon@anaproy.nl 
Telegram:   proycon     IRC: proycon (freenode)
Twitter:    https://twitter.com/proycon
Bitcoin:    1BRptZsKQtqRGSZ5qKbX2azbfiygHxJPsd 
