Bug#879247: ITP: jieba -- Jieba Chinese text segmenter
Package: wnpp
Severity: wishlist
Owner: Yangfl <mmyangfl@gmail.com>
* Package name : jieba
Version : 0.39
Upstream Author : fxsjy
* URL : https://github.com/fxsjy/jieba
* License : MIT
Programming Lang: Python
Description : Jieba Chinese text segmenter
"Jieba" (Chinese for "to stutter") is a high-accuracy Chinese text segmenter
based on a hidden Markov model (HMM) and the Viterbi algorithm. It uses dynamic
programming to find the most probable combination based on word frequency.
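
As a rough illustration of the dynamic-programming step, here is a minimal
sketch, not Jieba's actual implementation. It assumes a simple unigram model,
and the dictionary FREQ and the function segment() are hypothetical names
invented for this example.

```python
import math

# Hypothetical toy dictionary: word -> frequency count (not Jieba's real data).
FREQ = {"研究": 5, "研究生": 3, "生命": 4, "命": 1, "生": 1, "起源": 3, "的": 10}
TOTAL = sum(FREQ.values())


def segment(sentence):
    """Return the most probable segmentation under a unigram frequency model."""
    n = len(sentence)
    # best[i] = (log-probability of the best split of sentence[i:],
    #            end index of the first word in that split)
    best = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, n + 1):
            word = sentence[i:j]
            # Unknown single characters get a count of 1 so a path always exists.
            count = FREQ.get(word, 1 if j == i + 1 else 0)
            if count:
                candidates.append((math.log(count / TOTAL) + best[j][0], j))
        best[i] = max(candidates)
    # Walk the stored end indices forward to recover the words.
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j])
        i = j
    return words


print(segment("研究生命的起源"))
```

With these toy counts, splitting after "研究" scores higher than taking the
longer word "研究生", because the frequent word "生命" outweighs the rare
single character "命".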
It supports three segmentation modes:
* Accurate Mode attempts to cut the sentence into the most accurate
segmentations, which is suitable for text analysis.
* Full Mode gets all the possible words from the sentence. It is fast but not
accurate.
* Search Engine Mode, based on the Accurate Mode, attempts to cut long words
into several short words, which can raise the recall rate. Suitable for
search engines.
Traditional Chinese and customized dictionaries are also supported.
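
To make the difference between Full Mode and Search Engine Mode concrete, here
is a small self-contained sketch. It only mimics the behaviour described above
and does not call Jieba itself. The word set WORDS and the functions
full_mode() and search_mode() are hypothetical names, and the Accurate Mode
result is assumed to be given as input.

```python
# Hypothetical toy dictionary (not Jieba's real data).
WORDS = {"中国", "科学", "学院", "科学院", "中国科学院"}


def full_mode(sentence):
    """List every dictionary word found anywhere in the sentence."""
    n = len(sentence)
    return [
        sentence[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if sentence[i:j] in WORDS
    ]


def search_mode(accurate_words, min_len=2):
    """Emit the shorter dictionary words inside each word, then the word itself."""
    out = []
    for word in accurate_words:
        for size in range(min_len, len(word)):
            for i in range(len(word) - size + 1):
                piece = word[i:i + size]
                if piece in WORDS:
                    out.append(piece)
        out.append(word)
    return out


print(full_mode("中国科学院"))
print(search_mode(["中国科学院"]))
```

In this sketch the long word is kept and the shorter words it contains are
added alongside it, which is how cutting long words can raise recall for a
search index.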