huggingface/tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Score: 90 / 100 (Verified)

huggingface/tokenizers is implemented in Rust, with bindings for Python, Node.js, and Ruby. It supports the BPE, WordPiece, and Unigram tokenization algorithms, and its integrated normalization tracks character-level alignment back to the original text. The library handles full preprocessing pipelines, including truncation, padding, and special-token injection, and exposes both vocabulary training and inference through a unified, modular API.
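As a rough illustration of the pipeline described above, the following minimal sketch uses the Python bindings to train a tiny BPE vocabulary, then encodes a string with lowercasing normalization, truncation, and padding enabled. The corpus, the vocabulary size, the fixed length of 8, and the `[UNK]`/`[PAD]` token names are assumptions chosen for the example, not values taken from this listing.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.normalizers import Lowercase
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Toy corpus, invented for this example.
corpus = ["hello world", "hello tokenizers", "world of tokens"]

# Assemble the pipeline: normalizer -> pre-tokenizer -> BPE model.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.normalizer = Lowercase()
tokenizer.pre_tokenizer = Whitespace()

# Train a small vocabulary in memory; the special tokens are illustrative.
trainer = BpeTrainer(vocab_size=200, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Truncate and pad every encoding to a fixed length of 8 tokens.
tokenizer.enable_truncation(max_length=8)
tokenizer.enable_padding(
    pad_id=tokenizer.token_to_id("[PAD]"), pad_token="[PAD]", length=8
)

text = "Hello World"
encoding = tokenizer.encode(text)

# Offsets index into the ORIGINAL text, even though tokens are lowercased.
for token, (start, end) in zip(encoding.tokens, encoding.offsets):
    print(f"{token!r:12} <- {text[start:end]!r}")
```

Each non-padding token prints next to the slice of the original, mixed-case input it came from, which is the alignment tracking mentioned above. Padding tokens carry empty offsets because they correspond to no input text.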

10,520 stars and 129,702,376 monthly downloads. Used by 122 other packages. Actively maintained with 33 commits in the last 30 days. Available on PyPI and npm.

Maintenance 20 / 25
Adoption 25 / 25
Maturity 25 / 25
Community 20 / 25

Stars: 10,520
Forks: 1,051
Language: Rust
License: Apache-2.0
Last pushed: Feb 28, 2026
Monthly downloads: 129,702,376
Commits (30d): 33
Dependencies: 14
Reverse dependents: 122

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/huggingface/tokenizers"

Open to everyone: 100 requests/day with no key, or 1,000 requests/day with a free key.
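For scripted access, the same request can be sketched in Python using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the meaning assigned to the three path segments (category, owner, repository) is inferred from the curl URL above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper; segment names are assumed)."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("transformers", "huggingface", "tokenizers")` would issue the same request as the curl command. Each call counts against the daily request limit, so caching the result locally is a reasonable choice for repeated lookups.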