
Bug#1019435: ITP: rapidfuzz -- rapid fuzzy string matching



Package: wnpp
Severity: wishlist
Owner: Julian Gilbey <jdg@debian.org>
X-Debbugs-Cc: debian-devel@lists.debian.org, debian-python@lists.debian.org

* Package name    : rapidfuzz
  Version         : 2.6.1
  Upstream Author : Max Bachmann <pypi@maxbachmann.de>
* URL             : https://github.com/maxbachmann/RapidFuzz
* License         : MIT
  Programming Lang: Python
  Description     : rapid fuzzy string matching

RapidFuzz is a fast string matching library for Python and C++, which
uses the string similarity calculations from
[FuzzyWuzzy](https://github.com/seatgeek/fuzzywuzzy).  However, several
aspects set RapidFuzz apart from FuzzyWuzzy:
1) It is MIT licensed, so it can be used with whichever license you
   choose for your project, whereas using FuzzyWuzzy forces you to
   adopt the GPL.
2) It provides many string_metrics like hamming or jaro_winkler, which
   are not included in FuzzyWuzzy.
3) It is mostly written in C++ and adds many algorithmic improvements
   that make string matching even faster while still producing the
   same results. For detailed benchmarks, see the
   [documentation](https://maxbachmann.github.io/RapidFuzz/fuzz.html).
4) It fixes multiple bugs in the `partial_ratio` implementation.
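
To illustrate what a string metric such as hamming computes, here is a
minimal pure-Python sketch.  It is not RapidFuzz's implementation or
API: the function names `hamming_distance` and `hamming_similarity` and
the 0-100 scaling are assumptions chosen for illustration only.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption for this sketch: unequal lengths are rejected.
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def hamming_similarity(a: str, b: str) -> float:
    """Scale the distance to a 0-100 score, where 100 means identical."""
    if not a and not b:
        # Two empty strings are treated as identical.
        return 100.0
    return 100.0 * (1 - hamming_distance(a, b) / len(a))


if __name__ == "__main__":
    print(hamming_distance("karolin", "kathrin"))
    print(hamming_similarity("abcd", "abcf"))
```

A compiled library would be expected to compute the same kind of
quantity far faster than this character-by-character loop, which is
the point of the C++ implementation mentioned above.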

This is a dependency of the latest upstream release of
python3-textdistance.

There are also two C++ libraries contained within this package,
managed as separate GitHub subrepositories.  One is rapidfuzz-cpp, and
I am not sure whether to bundle it within this package or to package
it independently.  The other is taskflow
(https://github.com/taskflow/taskflow), a C++ header-only package; I
think this should probably be packaged separately.

This package will be maintained within the Python Packaging Team.

