Text Tokenization Libraries (ML Frameworks)
Language processing tools that convert text into tokens for NLP and ML models. Includes tokenizers across multiple programming languages and implementations. Does NOT include general text processing, speech tokenization, or vectorization/embedding systems.
20 text tokenization libraries are tracked. The highest-rated is SauravP97/hf-tokenizer-visualizer, which scores 35/100 and has 2 stars and 193 monthly downloads.
Get all 20 projects as JSON:
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=text-tokenization-libraries&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
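The curl request above can also be issued from a script. The following is a minimal Python sketch using only the standard library. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the sketch returns the decoded JSON as-is rather than assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="text-tokenization-libraries",
              limit=20):
    """Build the query URL with the same parameters as the curl command."""
    query = urlencode({"domain": domain,
                       "subcategory": subcategory,
                       "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the decoded JSON payload, whatever its shape.

    Assumption: the endpoint answers a plain GET with a JSON body, as the
    curl example suggests. No API key is sent.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the request and counts against the 100 requests/day keyless limit, so it is worth caching the result locally while exploring the payload.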
| # | Framework | Score | Tier |
|---|---|---|---|
| 1 | SauravP97/hf-tokenizer-visualizer: Visualize HuggingFace Byte-Pair Encoding (BPE) Tokenizer encoding process | 35/100 | Emerging |
| 2 | DePasqualeOrg/swift-tiktoken: A pure Swift implementation of OpenAI's tiktoken tokenizer | | Emerging |
| 3 | Usama3627/tokenizer: Implementation of BPE Tokenizer in Rust | | Experimental |
| 4 | andikaseptiadi/local-code-model: 🛠️ Build a pure Go GPT-style transformer from scratch to grasp the... | | Experimental |
| 5 | Scurrra/ubpe: Universal (general sequence) Byte-Pair Encoding | | Experimental |
| 6 | twinnydotdev/toxe: SentencePiece tokenizer for cross-encoders | | Experimental |
| 7 | jrajath94/bpe-tokenizer: BPE and WordPiece tokenization from scratch, clean implementations that... | | Experimental |
| 8 | Example69420/splintr: 🚀 Boost text processing speed with Splintr, a high-performance BPE tokenizer... | | Experimental |
| 9 | C4AI/token-counter: Python library + CLI to count dataset tokens with HF tokenizers and export... | | Experimental |
| 10 | Catmono/bpe-tokenizer-ts: 🧠 Build and explore a minimal Byte Pair Encoding tokenizer in TypeScript,... | | Experimental |
| 11 | DePasqualeOrg/swift-tokenizers: High-performance tokenizers | | Experimental |
| 12 | jawrainey/hfta: Reference implementation: run any huggingface tokenizer in Android (rust). | | Experimental |
| 13 | rjmacarthy/string-tokeniser: An implementation of Keras Tokenizer in JavaScript. | | Experimental |
| 14 | DHRUVCHARNE/bpe-tokenizer-ts: From-scratch Byte Pair Encoding (BPE) tokenizer in TypeScript using Bun | | Experimental |
| 15 | unixpickle/tweetenc: An auto-encoder for tweets | | Experimental |
| 16 | briesearch/token-masks: Masked language model with Positional & One-Hot encoding - built using Aurora | | Experimental |
| 17 | rekram1-node/tokenizer: Natural Language Processing (NLP) Tokenization Library designed for English.... | | Experimental |
| 18 | brightjonathan/BPE-Stanford: We train and implement a byte-level byte-pair encoding (BPE) tokenizer. In... | | Experimental |
| 19 | sumony2j/Simple-BPE-Tokenizer: A pure Python implementation of Byte Pair Encoding (BPE) tokenizer. Train on... | | Experimental |
| 20 | toprakdeviren/gpu-bpe: GPU-accelerated Byte Pair Encoding in the browser via WebGPU compute shaders | | Experimental |