tokenizers and gotokenizers
These are ecosystem siblings: gotokenizers is a Go reimplementation of the tokenization algorithms standardized and popularized by the Rust-based huggingface/tokenizers reference implementation, letting Go developers apply the same tokenization logic in their production environments.
About tokenizers
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Implemented in Rust with Python/Node.js/Ruby bindings, it supports the BPE, WordPiece, and Unigram tokenization algorithms, with integrated normalization that tracks character-level alignment back to the original text. The library handles the full preprocessing pipeline, including truncation, padding, and special-token injection, and supports both vocabulary training and inference through a unified, modular API.
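To make the BPE algorithm mentioned above concrete, here is a minimal pure-Python sketch of BPE training and encoding. This is a toy illustration of the technique, not the tokenizers library's API; the function names and sample corpus are invented for the example.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merge rules from a list of words (toy sketch)."""
    # Represent each word as a tuple of single characters.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        # Rewrite every word, fusing occurrences of the best pair.
        merged = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        words = merged
    return merges

def encode(word, merges):
    """Tokenize a word by replaying the learned merges in training order."""
    toks = list(word)
    for a, b in merges:
        i = 0
        while i < len(toks) - 1:
            if toks[i] == a and toks[i + 1] == b:
                toks[i:i + 2] = [a + b]
            else:
                i += 1
    return toks

merges = train_bpe(["low", "low", "lower", "lowest"], 3)
print(merges)                    # learned merge rules
print(encode("lowest", merges))  # subword tokens for an unseen word
```

The real library implements the same idea in Rust with far more machinery (byte-level alphabets, frequency-pruned vocabularies, parallel training), but the merge-and-replay loop above is the core of BPE.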
About gotokenizers
nlpodyssey/gotokenizers
Go implementation of today's most used tokenizers