BPE Tokenizers LLM Tools
Implementations and variants of Byte Pair Encoding (BPE) tokenizers across programming languages and scripts. Includes language-specific BPE tokenizers, optimized BPE libraries, and BPE algorithm improvements. Does NOT include: other tokenization methods (SentencePiece, WordPiece) unless BPE is the primary focus; general NLP pipelines; or LLM frameworks that merely use tokenizers.
There are 14 BPE tokenizer tools tracked. The highest-rated is eliben/go-sentencepiece, at 41/100 with 47 stars.
Get all 14 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=bpe-tokenizers&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
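For scripted access, the same request can be assembled in Python. This is a minimal sketch using only the standard library: the endpoint and the `domain`, `subcategory`, and `limit` parameters are taken from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented on this page, so the sketch returns the decoded payload without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="bpe-tokenizers", limit=20):
    """Build the query URL (hypothetical helper; parameter names come from the curl example)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch and decode the JSON payload.

    The response structure is an unknown here, so the decoded object
    is returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily request limit, so it may be worth caching the result locally and inspecting its structure before relying on particular keys.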
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | eliben/go-sentencepiece | Go implementation of the SentencePiece tokenizer | 41/100 | Emerging |
| 2 | sefineh-ai/Amharic-Tokenizer | Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast,... | | Emerging |
| 3 | mdabir1203/BPE_Tokenizer_Visualizer | A Visualizer to check how BPE Tokenizer in an LLM Works | | Experimental |
| 4 | U4RASD/r-bpe | R-BPE: Improving BPE-Tokenizers with Token Reuse | | Experimental |
| 5 | franciszekparma/GBPET | GPT-style language model with Byte Pair Encoding tokenizer, built from... | | Experimental |
| 6 | jmaczan/bpe-tokenizer | Byte-Pair Encoding tokenizer for training large language models on huge datasets | | Experimental |
| 7 | vforteli/WordPieceTokenizer | WordPiece tokenizer for dotnet (eg with ML.Net) | | Experimental |
| 8 | BobMcDear/minbpe-hs | Byte-level byte pair encoding (BPE) in Haskell | | Experimental |
| 9 | BlackNinjaKR/BPE_BytePairEncoding | An implementation of Byte Pair Encoding (BPE) | | Experimental |
| 10 | sajjadh47/bpe-encoder-php | BPE (Byte-Pair Encoding) Encoder Decoder for OpenAI's GPT-2 / GPT-3... | | Experimental |
| 11 | jmaczan/bpe.c | High performance Byte-Pair Encoding tokenizer for large language models | | Experimental |
| 12 | taabishhh/LLM_Preprocessing | This project implements a Byte Pair Encoding (BPE) tokenization approach... | | Experimental |
| 13 | ademyanchuk/minbpe | Educational reimplementation of Byte Pair Encoding (BPE) with regex... | | Experimental |
| 14 | anperrone/minbpe | This crate is a rust porting of Andrej Karpathy implementation of Byte Pair... | | Experimental |