Mecanik/Tiny-BPE-Trainer
Lightweight, header-only Byte Pair Encoding (BPE) trainer in modern C++17. Produces HuggingFace-compatible vocabularies for transformers and integrates with Modern Text Tokenizer.
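The sketch below illustrates what a BPE trainer does. It is a minimal, generic sketch and not Tiny-BPE-Trainer's actual API. It assumes whitespace pre-tokenization and character-level symbols; byte-level BPE would start from raw bytes instead. The names `Merge`, `split_chars` and `train_bpe` are hypothetical. The loop counts adjacent symbol pairs weighted by word frequency and merges the most frequent pair. It repeats this until it reaches the requested number of merges. It only learns the merge list and does not write a HuggingFace-compatible vocabulary file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: a merge rule fuses two adjacent symbols into one.
using Merge = std::pair<std::string, std::string>;

// Split a word into single-character symbols (character-level BPE is an
// assumption of this sketch; byte-level BPE would start from raw bytes).
std::vector<std::string> split_chars(const std::string& word) {
    std::vector<std::string> symbols;
    for (char c : word) symbols.emplace_back(1, c);
    return symbols;
}

// Hypothetical helper: learn up to `num_merges` merge rules from a
// whitespace-separated corpus.
std::vector<Merge> train_bpe(const std::string& corpus, std::size_t num_merges) {
    // 1. Pre-tokenize on whitespace and count word frequencies.
    std::map<std::string, int> word_freq;
    std::istringstream in(corpus);
    for (std::string w; in >> w;) ++word_freq[w];

    // 2. Represent each distinct word as a symbol sequence plus its count.
    std::vector<std::pair<std::vector<std::string>, int>> words;
    for (const auto& [w, f] : word_freq) words.emplace_back(split_chars(w), f);

    std::vector<Merge> merges;
    while (merges.size() < num_merges) {
        // 3. Count adjacent symbol pairs, weighted by word frequency.
        std::map<Merge, int> pair_freq;
        for (const auto& [syms, f] : words)
            for (std::size_t i = 0; i + 1 < syms.size(); ++i)
                pair_freq[{syms[i], syms[i + 1]}] += f;
        if (pair_freq.empty()) break;  // every word is already one symbol

        // 4. Pick the most frequent pair. Only a strictly higher count
        //    replaces the current best, so ties go to the pair that sorts
        //    first in std::map order.
        auto best = pair_freq.begin();
        for (auto it = pair_freq.begin(); it != pair_freq.end(); ++it)
            if (it->second > best->second) best = it;
        assert(best->second > 0);
        const Merge rule = best->first;
        merges.push_back(rule);

        // 5. Apply the new rule to every word, scanning left to right.
        for (auto& entry : words) {
            std::vector<std::string>& syms = entry.first;
            std::vector<std::string> out;
            for (std::size_t i = 0; i < syms.size(); ++i) {
                if (i + 1 < syms.size() && syms[i] == rule.first &&
                    syms[i + 1] == rule.second) {
                    out.push_back(rule.first + rule.second);
                    ++i;  // skip the second half of the merged pair
                } else {
                    out.push_back(syms[i]);
                }
            }
            syms = std::move(out);
        }
    }
    return merges;
}
```

Take the corpus `"low low low lower lowest"` with 3 merges. The sketch learns `("l","o")`, then `("lo","w")`, then `("low","e")`. It recounts every pair on each iteration for clarity. Production trainers typically update pair counts incrementally instead.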
No commits in the last 6 months.
Stars: 4
Forks: —
Language: C++
License: MIT
Category:
Last pushed: Aug 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Mecanik/Tiny-BPE-Trainer"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
megagonlabs/ginza-transformers
Use custom tokenizers in spacy-transformers
Kaleidophon/token2index
A lightweight but powerful library to build token indices for NLP tasks, compatible with major...
NVIDIA/Cosmos-Tokenizer
A suite of image and video neural tokenizers
Hugging-Face-Supporter/tftokenizers
Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels