
Bug#1018100: ITP: liblanguage-detector-java -- Language Detection Library for Java



Package: wnpp
Severity: wishlist
Owner: Markus Koschany <apo@debian.org>
X-Debbugs-Cc: debian-devel@lists.debian.org, apo@debian.org, debian-java@lists.debian.org

* Package name    : liblanguage-detector-java
  Version         : 0.6
  Upstream Author : Nakatani Shuyo, Francois ROLAND, Fabian Kessler,
                    Nicole Torres, Robert Theis
* URL             : https://github.com/optimaize/language-detector
* License         : Apache-2.0
  Programming Lang: Java
  Description     : Language Detection Library for Java

This library detects the language of a text using language profiles
built from representative text in each language. N-grams (contiguous
sequences of n items from a text sample) were extracted from that text
and stored in the profiles. To identify the language of an input text,
the program repeats the same process: it extracts the same kind of
n-grams from the input, compares their relative frequencies against
each profile, and picks the language that matches best. Currently 71
languages are supported.
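
To illustrate the idea described above, here is a minimal sketch of
n-gram profile matching in plain Java. It is not the library's API:
the class NgramSketch, its methods, the tiny sample texts, and the
choice of cosine similarity as the comparison are all assumptions made
for this example. The real library ships prebuilt profiles and may
score candidates differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of language-detector.
public class NgramSketch {

    // Relative frequency of each character n-gram of length n in the text.
    static Map<String, Double> profile(String text, int n) {
        Map<String, Double> freq = new HashMap<>();
        String s = text.toLowerCase();
        int total = 0;
        for (int i = 0; i + n <= s.length(); i++) {
            freq.merge(s.substring(i, i + n), 1.0, Double::sum);
            total++;
        }
        // Turn raw counts into relative frequencies (no-op for empty input).
        for (Map.Entry<String, Double> e : freq.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return freq;
    }

    // Cosine similarity between two profiles (an assumed scoring choice).
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        for (double v : b.values()) {
            nb += v * v;
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Return the key of the profile that best matches the input text.
    static String detect(String input,
                         Map<String, Map<String, Double>> profiles, int n) {
        Map<String, Double> p = profile(input, n);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Double>> e : profiles.entrySet()) {
            double score = similarity(p, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy profiles built from one sentence each; real ones need far more text.
        Map<String, Map<String, Double>> profiles = new HashMap<>();
        profiles.put("en", profile(
            "the quick brown fox jumps over the lazy dog and then the cat", 3));
        profiles.put("de", profile(
            "der schnelle braune fuchs springt ueber den faulen hund und die katze", 3));
        System.out.println(detect("the dog and the fox", profiles, 3));
    }
}
```

With profiles this small the result is only indicative; accuracy in
practice depends on profiles built from much larger text samples.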

