tokenizers and gotokenizers

These are ecosystem siblings: gotokenizers is a pure-Go reimplementation of the tokenizer algorithms standardized and popularized by the Rust-based reference implementation, letting Go developers apply the same tokenization logic in production environments without foreign-language bindings.

|                | tokenizers    | gotokenizers                         |
|----------------|---------------|--------------------------------------|
| Score          | 90 (Verified) | 35 (Emerging)                        |
| Maintenance    | 20/25         | 0/25                                 |
| Adoption       | 25/25         | 8/25                                 |
| Maturity       | 25/25         | 16/25                                |
| Community      | 20/25         | 11/25                                |
| Stars          | 10,520        | 44                                   |
| Forks          | 1,051         | 5                                    |
| Downloads      | 129,702,376   |                                      |
| Commits (30d)  | 33            | 0                                    |
| Language       | Rust          | Go                                   |
| License        | Apache-2.0    | BSD-2-Clause                         |
| Risk flags     | None          | Stale 6m, No Package, No Dependents  |

About tokenizers

huggingface/tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Implemented in Rust with Python/Node.js/Ruby bindings, it supports the BPE, WordPiece, and Unigram tokenization algorithms, with integrated normalization that tracks character-level alignment back to the original text. The library handles the full preprocessing pipeline, including truncation, padding, and special-token injection, and exposes both vocabulary training and inference through a unified modular API.
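To make the BPE idea concrete, here is a minimal, stdlib-only Python sketch of a single BPE training step: count adjacent symbol pairs across a toy corpus and merge the most frequent pair. This is illustrative only; the function names are hypothetical and this is not the library's actual (Rust) implementation.

```python
# Toy sketch of one BPE merge step (hypothetical helper names, stdlib only).
from collections import Counter

def most_frequent_pair(words):
    """words: list of symbol lists, e.g. [['l','o','w']].
    Ties are broken by first occurrence (Counter preserves insertion order)."""
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with the concatenated symbol."""
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

corpus = [list("lower"), list("lowest"), list("low")]
pair = most_frequent_pair(corpus)        # ('l', 'o') occurs in all three words
corpus = merge_pair(corpus, pair)        # 'l'+'o' becomes the new symbol 'lo'
```

Repeating this step until a target vocabulary size is reached yields the learned merge table; the real library performs this training (and the corresponding inference-time merging) in Rust for speed.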

About gotokenizers

nlpodyssey/gotokenizers

Go implementation of today's most used tokenizers

Scores updated daily from GitHub, PyPI, and npm data.