Tokenizer and kitoken

These two libraries are competitors with overlapping functionality: both handle BPE tokenization and work with SentencePiece models. They differ in scope and target use. kitoken covers more algorithms (BPE, Unigram, and WordPiece) and is available in JavaScript, Python, and Rust, while OpenNMT/Tokenizer is a C++ library that emphasizes performance and customization.

                 Tokenizer                    kitoken
Score            59 (Established)             58 (Established)
Maintenance      10/25                        13/25
Adoption         10/25                        17/25
Maturity         16/25                        25/25
Community        23/25                        3/25
Stars            330                          46
Forks            80                           1
Downloads        -                            7,644
Commits (30d)    0                            0
Language         C++                          Rust
License          MIT                          BSD-2-Clause
Notes            No Package, No Dependents    No Dependents
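
The headline scores above appear to be the sum of the four 25-point category scores. This is an inference from the numbers shown on this page, not a documented formula. A minimal Python sketch that checks the arithmetic, using only the values listed above:

```python
# Category scores copied from the scorecards above (each out of 25).
# Assumption: the headline score is the plain sum of the four categories.
category_scores = {
    "Tokenizer": {"Maintenance": 10, "Adoption": 10, "Maturity": 16, "Community": 23},
    "kitoken": {"Maintenance": 13, "Adoption": 17, "Maturity": 25, "Community": 3},
}

# Sum the four categories for each project.
totals = {name: sum(parts.values()) for name, parts in category_scores.items()}

print(totals)  # {'Tokenizer': 59, 'kitoken': 58}
```

Read this way, the near-identical totals hide different profiles: Tokenizer scores highest on Community, while kitoken scores highest on Maturity and Adoption.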

About Tokenizer

OpenNMT/Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

About kitoken

Systemcluster/kitoken

Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
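
Both projects list BPE among their supported algorithms. For readers unfamiliar with it, the following is a toy Python sketch of how a BPE encoder applies a ranked merge table to a single word. It does not use either library's API; the function name `bpe_encode` and the merge table `TOY_MERGES` are hypothetical and exist only for illustration.

```python
def bpe_encode(word, merges):
    """Split `word` into characters, then repeatedly merge the adjacent
    pair with the highest priority (lowest index in `merges`)."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs missing from the table rank as infinity.
        candidates = [
            (ranks.get((left, right), float("inf")), index)
            for index, (left, right) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, index = min(candidates)
        if best_rank == float("inf"):
            break  # no mergeable pair is left
        # Replace the two symbols with their concatenation.
        symbols[index:index + 2] = [symbols[index] + symbols[index + 1]]
    return symbols


# Hypothetical merge table, highest priority first.
TOY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]

print(bpe_encode("lower", TOY_MERGES))  # ['low', 'er']
```

Real tokenizers learn the merge table from a corpus and add pre-tokenization, byte-level fallbacks, and vocabulary IDs on top of this loop; those details differ between libraries and are not shown here.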

Scores updated daily from GitHub, PyPI, and npm data.