Systemcluster/kitoken
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
46 stars and 7,644 monthly downloads. Available on PyPI and npm.
Stars: 46
Forks: 1
Language: Rust
License: BSD-2-Clause
Last pushed: Mar 10, 2026
Monthly downloads: 7,644
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Systemcluster/kitoken"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
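The same request can be made from a script. The following is a minimal sketch using only the Python standard library. The helper names `build_url` and `fetch_quality` are hypothetical, the category/owner/repository path pattern is inferred from the single curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumes the path is <category>/<owner>/<repo>, as in the curl example.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumes the response body is JSON; the schema is not documented here,
    so the parsed value is returned without further interpretation.
    """
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(build_url("nlp", "Systemcluster", "kitoken"))
```

Without a key, each call to `fetch_quality` counts against the 100 requests/day limit, so caching the result locally is advisable when polling several repositories.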
Related tools
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
soaxelbrooke/python-bpe
Byte Pair Encoding for Python!
OpenNMT/Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
taishi-i/toiro
A tool for comparing tokenizers