rust-tokenizers and tokenizer
These are competitors offering overlapping functionality: both implement BPE tokenization in Rust, though rust-tokenizers provides a more comprehensive multi-algorithm tokenizer suite with significantly greater adoption and more active maintenance.
About rust-tokenizers
guillaume-be/rust-tokenizers
rust-tokenizers offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE), and Unigram (SentencePiece) models.
This is a high-performance library that helps developers prepare text for use with large language models, such as BERT, GPT, and RoBERTa. It takes raw text input and converts it into numerical tokens, which are then fed into machine learning models. The primary users are developers building applications that process natural language, such as chatbots, sentiment analysis tools, or machine translation systems.
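The text-to-token-ids pipeline described above can be sketched in plain Rust. This is a minimal illustration with a hypothetical toy vocabulary and whitespace splitting, not the rust-tokenizers API; real tokenizers load a pretrained vocabulary file and apply subword segmentation before lookup.

```rust
use std::collections::HashMap;

// Map each whitespace-separated word to an id from a toy vocabulary,
// falling back to an unknown-token id. Real libraries segment words
// into subword units (WordPiece/BPE) before this lookup step.
fn tokenize(text: &str, vocab: &HashMap<&str, u32>, unk_id: u32) -> Vec<u32> {
    text.split_whitespace()
        .map(|w| *vocab.get(w).unwrap_or(&unk_id))
        .collect()
}

fn main() {
    // Hypothetical vocabulary; a real one is shipped with each model.
    let vocab: HashMap<&str, u32> =
        [("hello", 7), ("world", 8)].into_iter().collect();
    let ids = tokenize("hello world unseen", &vocab, 0);
    println!("{:?}", ids); // → [7, 8, 0]
}
```

The resulting id sequence is what gets fed to the downstream model.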
About tokenizer
Usama3627/tokenizer
Implementation of BPE Tokenizer in Rust
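To make the BPE idea concrete, here is a minimal sketch of one training step in plain Rust, independent of this repository's actual code: count adjacent symbol pairs across a toy corpus and merge the most frequent pair. Real BPE implementations repeat this step until a target vocabulary size is reached.

```rust
use std::collections::HashMap;

// Count adjacent symbol pairs across all words and return the most
// frequent one (ties resolved arbitrarily).
fn most_frequent_pair(words: &[Vec<String>]) -> Option<(String, String)> {
    let mut counts: HashMap<(String, String), usize> = HashMap::new();
    for word in words {
        for pair in word.windows(2) {
            *counts.entry((pair[0].clone(), pair[1].clone())).or_insert(0) += 1;
        }
    }
    counts.into_iter().max_by_key(|(_, c)| *c).map(|(p, _)| p)
}

// Replace every occurrence of the pair with a single merged symbol.
fn merge_pair(words: &mut [Vec<String>], pair: &(String, String)) {
    for word in words.iter_mut() {
        let mut merged = Vec::new();
        let mut i = 0;
        while i < word.len() {
            if i + 1 < word.len() && word[i] == pair.0 && word[i + 1] == pair.1 {
                merged.push(format!("{}{}", pair.0, pair.1));
                i += 2;
            } else {
                merged.push(word[i].clone());
                i += 1;
            }
        }
        *word = merged;
    }
}

fn main() {
    // Toy corpus as character sequences; "ab" occurs most often as a pair.
    let mut words: Vec<Vec<String>> = ["ab", "ab", "ac"]
        .iter()
        .map(|w| w.chars().map(|c| c.to_string()).collect())
        .collect();
    if let Some(pair) = most_frequent_pair(&words) {
        merge_pair(&mut words, &pair);
    }
    println!("{:?}", words); // → [["ab"], ["ab"], ["a", "c"]]
}
```

Encoding new text then amounts to replaying the learned merges in order.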