Bug#1108022: ITP: librust-spm-precompiled-dev -- emulate https://github.com/google/sentencepiece Dart::DoubleArray struct and its Normalizer
Package: wnpp
Severity: wishlist
Owner: Kohei Sendai <kouhei.sendai@gmail.com>
X-Debbugs-Cc: debian-devel@lists.debian.org, kouhei.sendai@gmail.com
* Package name : librust-spm-precompiled-dev
Version : 0.1.4
Upstream Contact: Nicolas Patry <patry.nicolas@protomail.com>
* URL : https://github.com/huggingface/spm_precompiled
* License : Apache-2.0
Programming Lang: Rust
Description : emulate https://github.com/google/sentencepiece Dart::DoubleArray struct and its Normalizer
This crate aims to emulate the Dart::DoubleArray struct and its Normalizer from https://github.com/google/sentencepiece. Its main intent is to be used with tokenizers, a Rust library that provides facilities to tokenize strings for use with HuggingFace's transformers library.
This crate is highly specialized and not intended for general use.
The core of the algorithm is to read spm's binary precompiled_charsmap.
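For reviewers unfamiliar with the format, here is a minimal sketch of how such a blob might be split into its parts. It assumes the layout I understand sentencepiece to use: a little-endian u32 giving the trie size in bytes, then the double-array trie units as little-endian u32 values, then the normalized-string bytes. The function name split_charsmap is hypothetical and is not the crate's API.

```rust
/// Hypothetical helper (not part of spm_precompiled's public API).
/// Assumed layout of the blob:
///   [u32 LE: trie size in bytes][trie units, u32 LE each][normalized bytes]
/// Returns the trie units and the remaining normalized bytes,
/// or None if the blob is truncated or the trie size is not a multiple of 4.
fn split_charsmap(blob: &[u8]) -> Option<(Vec<u32>, &[u8])> {
    // Read the 4-byte header holding the trie size in bytes.
    let header = blob.get(..4)?;
    let trie_size =
        u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;

    // Each trie unit is one u32, so the size must be a multiple of 4.
    if trie_size % 4 != 0 {
        return None;
    }

    // Slice out the trie bytes, failing cleanly on truncated input.
    let end = 4usize.checked_add(trie_size)?;
    let trie_bytes = blob.get(4..end)?;

    // Decode the trie units as little-endian u32 values.
    let trie: Vec<u32> = trie_bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();

    // Everything after the trie is the normalized-string table.
    Some((trie, &blob[end..]))
}
```

Under these assumptions, the trie would be used to look up a prefix of the input text, and the matched value would index into the normalized bytes to find the replacement string.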
- This crate is closely related to tokenizers, which is important for
packaging vllm. I am now planning to package vllm.