
Bug#837611: ITP: uctodata -- Data for ucto tokeniser



Package: wnpp
Severity: wishlist
Owner: Maarten van Gompel <proycon@anaproy.nl>

* Package name    : uctodata
  Upstream Author : Centre for Language and Speech Technology, Radboud University Nijmegen
* URL             : https://languagemachines.github.io/ucto
* License         : GPL-3
  Programming Lang: C++
  Description     : Data for Unicode Tokenizer

 Ucto can tokenize UTF-8 encoded text files (i.e. separate words from
 punctuation, split sentences, generate n-grams), and offers several other
 basic preprocessing steps that make your text suitable for further
 processing, such as indexing, part-of-speech tagging, or machine
 translation.

 This package provides the language-specific data files needed to run Ucto.

 Ucto was written by Maarten van Gompel and Ko van der Sloot.  Work on Ucto
 was funded by NWO, the Netherlands Organisation for Scientific Research,
 under the Implicit Linguistics project, the CLARIN-NL program, and the 
 CLARIAH project.

 Ucto is a product of the Centre for Language and Speech Technology (Radboud
 University Nijmegen), and previously the ILK Research Group (Tilburg
 University, The Netherlands).

----

This package is split off from the ucto package, which previously shipped the
data as well.

--

Maarten van Gompel
    Centre for Language Studies
    Radboud Universiteit Nijmegen

proycon@anaproy.nl
http://proycon.anaproy.nl
http://github.com/proycon

GnuPG key:  0x1A31555C  XMPP: proycon@anaproy.nl
Telegram:   proycon     IRC: proycon (freenode)
Twitter:    https://twitter.com/proycon
Bitcoin:    1BRptZsKQtqRGSZ5qKbX2azbfiygHxJPsd

