BPE Tokenizers LLM Tools

Implementations and variants of Byte Pair Encoding (BPE) tokenizers across programming languages and scripts. Includes language-specific BPE tokenizers, optimized BPE libraries, and BPE algorithm improvements. Does NOT include other tokenization methods (SentencePiece, WordPiece) unless BPE is the primary focus, nor general NLP pipelines or LLM frameworks that merely use tokenizers.

There are 14 BPE tokenizer tools tracked. The highest-rated is eliben/go-sentencepiece, at 41/100 with 47 stars.

Get all 14 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=bpe-tokenizers&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
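The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and since the response schema is not described on this page, the code assumes only that the endpoint returns JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="bpe-tokenizers", limit=20):
    """Return the listing URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the decoded JSON payload.

    The payload layout is an unknown here, so it is returned as-is
    rather than unpacked into assumed fields.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the unauthenticated request, which counts against the 100 requests/day limit. How a key is supplied for the higher limit is not stated here, so the sketch leaves it out.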

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | eliben/go-sentencepiece | Go implementation of the SentencePiece tokenizer | 41 | Emerging |
| 2 | sefineh-ai/Amharic-Tokenizer | Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast,... | 39 | Emerging |
| 3 | mdabir1203/BPE_Tokenizer_Visualizer | A Visualizer to check how BPE Tokenizer in an LLM Works | 24 | Experimental |
| 4 | U4RASD/r-bpe | R-BPE: Improving BPE-Tokenizers with Token Reuse | 24 | Experimental |
| 5 | franciszekparma/GBPET | GPT-style language model with Byte Pair Encoding tokenizer, built from... | 23 | Experimental |
| 6 | jmaczan/bpe-tokenizer | Byte-Pair Encoding tokenizer for training large language models on huge datasets | 23 | Experimental |
| 7 | vforteli/WordPieceTokenizer | WordPiece tokenizer for dotnet (eg with ML.Net) | 22 | Experimental |
| 8 | BobMcDear/minbpe-hs | Byte-level byte pair encoding (BPE) in Haskell | 20 | Experimental |
| 9 | BlackNinjaKR/BPE_BytePairEncoding | An implementation of Byte Pair Encoding (BPE) | 16 | Experimental |
| 10 | sajjadh47/bpe-encoder-php | BPE (Byte-Pair Encoding) Encoder Decoder for OpenAI's GPT-2 / GPT-3... | 13 | Experimental |
| 11 | jmaczan/bpe.c | High performance Byte-Pair Encoding tokenizer for large language models | 12 | Experimental |
| 12 | taabishhh/LLM_Preprocessing | This project implements a Byte Pair Encoding (BPE) tokenization approach... | 12 | Experimental |
| 13 | ademyanchuk/minbpe | Educational reimplementation of Byte Pair Encoding (BPE) with regex... | 11 | Experimental |
| 14 | anperrone/minbpe | This crate is a rust porting of Andrej Karpathy implementation of Byte Pair... | 10 | Experimental |