huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Implemented in Rust with Python, Node.js, and Ruby bindings, the library supports the BPE, WordPiece, and Unigram tokenization algorithms, with integrated normalization that tracks character-level alignment back to the original text. It handles the full preprocessing pipeline, including truncation, padding, and special-token injection, and supports both vocabulary training and inference through a unified, modular API.
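A minimal sketch of the Python binding, showing training a small BPE vocabulary and the character-offset alignment described above (the tiny two-sentence corpus is illustrative only):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build a BPE tokenizer with whitespace pre-tokenization.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train a vocabulary directly from an in-memory iterator (toy corpus).
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]"])
tokenizer.train_from_iterator(["hello world", "hello tokenizers"], trainer=trainer)

# Encode; `offsets` gives each token's character span in the original text.
enc = tokenizer.encode("hello world")
print(enc.tokens)
print(enc.offsets)
```

The same `Tokenizer` object handles both training and inference, which is the unified API the summary refers to.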
10,520 stars and 129,702,376 monthly downloads. Used by 122 other packages. Actively maintained with 33 commits in the last 30 days. Available on PyPI and npm.
Stars
10,520
Forks
1,051
Language
Rust
License
Apache-2.0
Category
Last pushed
Feb 28, 2026
Monthly downloads
129,702,376
Commits (30d)
33
Dependencies
14
Reverse dependents
122
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/huggingface/tokenizers"
Open to everyone: 100 requests/day with no key needed. A free API key raises the limit to 1,000 requests/day.
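The curl call above can also be made programmatically; a stdlib-only Python sketch, assuming the endpoint returns a JSON object (the response fields are not documented here, so none are named):

```python
import json
import urllib.request

# Same endpoint as the curl example; no API key needed at the free tier.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/huggingface/tokenizers"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

print(data)
```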
Related models
megagonlabs/ginza-transformers
Use custom tokenizers in spacy-transformers
Kaleidophon/token2index
A lightweight but powerful library to build token indices for NLP tasks, compatible with major...
Hugging-Face-Supporter/tftokenizers
Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels
NVIDIA/Cosmos-Tokenizer
A suite of image and video neural tokenizers
wangcongcong123/ttt
A package for fine-tuning Transformers with TPUs, written in Tensorflow2.0+