Tokenizer and kitoken
The two libraries compete on overlapping tokenization functionality: both support BPE and SentencePiece. They differ in target use case. kitoken covers more algorithms (adding Unigram and WordPiece) and is available in JavaScript, Python and Rust, while OpenNMT/Tokenizer focuses on C++ performance.
Tokenizer scores
Maintenance: 10/25
Adoption: 10/25
Maturity: 16/25
Community: 23/25
kitoken scores
Maintenance: 13/25
Adoption: 17/25
Maturity: 25/25
Community: 3/25
Tokenizer stats
Stars: 330
Forks: 80
Downloads: —
Commits (30d): 0
Language: C++
License: MIT
kitoken stats
Stars: 46
Forks: 1
Downloads: 7,644
Commits (30d): 0
Language: Rust
License: BSD-2-Clause
Tokenizer: No Package, No Dependents
kitoken: No Dependents
About Tokenizer
OpenNMT/Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
About kitoken
Systemcluster/kitoken
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
Scores updated daily from GitHub, PyPI, and npm data.