tokenizers and tftokenizers
The first is a core tokenization library that the second wraps as TensorFlow SavedModels for serving, making them complements rather than competitors.
About tokenizers
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Implemented in Rust with Python/Node.js/Ruby bindings, it supports BPE, WordPiece, and Unigram tokenization algorithms with integrated normalization that tracks character-level alignment to original text. The library handles full preprocessing pipelines including truncation, padding, and special token injection, enabling both vocabulary training and inference through a unified modular API.
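The training-plus-inference pipeline described above can be sketched with the tokenizers Python bindings. This is a minimal illustration, not the library's canonical usage: the toy corpus and special-token list are assumptions, and a real setup would train on files or a large iterator.

```python
# Sketch: train a small BPE vocabulary in memory, then encode text
# while tracking character-level offsets back to the original string.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

corpus = ["low lower lowest", "new newer newest"]  # toy corpus (assumption)

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Special tokens here are illustrative; choose ones matching your model.
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

text = "lower newest"
enc = tokenizer.encode(text)
# enc.offsets holds (start, end) character spans into the input,
# so each token can be mapped back to the exact source substring.
for token, (start, end) in zip(enc.tokens, enc.offsets):
    print(token, "->", text[start:end])
```

Because no normalizer is configured in this sketch, each token's text equals the substring its offsets point at, which is the alignment guarantee the library exposes for downstream tasks like span labeling.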
About tftokenizers
Hugging-Face-Supporter/tftokenizers
Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels