Text Tokenization Libraries (ML Frameworks)

Language processing tools that convert text into tokens for NLP and ML models. Includes tokenizers across multiple programming languages and implementations. Does NOT include general text processing, speech tokenization, or vectorization/embedding systems.

There are 20 text tokenization libraries tracked in this category. The highest-rated is SauravP97/hf-tokenizer-visualizer at 35/100, with 2 stars and 193 monthly downloads.

Get all 20 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=text-tokenization-libraries&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
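The same request can be made from a script. The following is a minimal Python sketch, assuming only the endpoint and query parameters shown in the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the parsed payload is returned unchanged.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="text-tokenization-libraries",
              limit=20):
    """Build the dataset URL from the three query parameters in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10.0):
    """Fetch the URL and return the parsed JSON.

    Assumption: the response body is UTF-8 JSON. Its structure is not
    specified on this page, so no fields are accessed here.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily request limit; inspect the returned object before relying on any particular field.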

# | Framework | Description | Score | Tier
1 | SauravP97/hf-tokenizer-visualizer | Visualize HuggingFace Byte-Pair Encoding (BPE) Tokenizer encoding process | 35 | Emerging
2 | DePasqualeOrg/swift-tiktoken | A pure Swift implementation of OpenAI's tiktoken tokenizer | 30 | Emerging
3 | Usama3627/tokenizer | Implementation of BPE Tokenizer in Rust | 24 | Experimental
4 | andikaseptiadi/local-code-model | 🛠️ Build a pure Go GPT-style transformer from scratch to grasp the... | 24 | Experimental
5 | Scurrra/ubpe | Universal (general sequence) Byte-Pair Encoding | 23 | Experimental
6 | twinnydotdev/toxe | SentencePiece tokenizer for cross-encoders | 23 | Experimental
7 | jrajath94/bpe-tokenizer | BPE and WordPiece tokenization from scratch — clean implementations that... | 22 | Experimental
8 | Example69420/splintr | 🚀 Boost text processing speed with Splintr, a high-performance BPE tokenizer... | 22 | Experimental
9 | C4AI/token-counter | Python library + CLI to count dataset tokens with HF tokenizers and export... | 22 | Experimental
10 | Catmono/bpe-tokenizer-ts | 🧠 Build and explore a minimal Byte Pair Encoding tokenizer in TypeScript,... | 22 | Experimental
11 | DePasqualeOrg/swift-tokenizers | High-performance tokenizers | 22 | Experimental
12 | jawrainey/hfta | Reference implementation: run any huggingface tokenizer in Android (rust). | 22 | Experimental
13 | rjmacarthy/string-tokeniser | An implementation of Keras Tokenizer in JavaScript. | 21 | Experimental
14 | DHRUVCHARNE/bpe-tokenizer-ts | From-scratch Byte Pair Encoding (BPE) tokenizer in TypeScript using Bun | 19 | Experimental
15 | unixpickle/tweetenc | An auto-encoder for tweets | 14 | Experimental
16 | briesearch/token-masks | Masked language model with Positional & One-Hot encoding - built using Aurora | 12 | Experimental
17 | rekram1-node/tokenizer | Natural Language Processing (NLP) Tokenization Library designed for English.... | 12 | Experimental
18 | brightjonathan/BPE-Stanford | We train and implement a byte-level byte-pair encoding (BPE) tokenizer. In... | 11 | Experimental
19 | sumony2j/Simple-BPE-Tokenizer | A pure Python implementation of Byte Pair Encoding (BPE) tokenizer. Train on... | 11 | Experimental
20 | toprakdeviren/gpu-bpe | GPU-accelerated Byte Pair Encoding in the browser via WebGPU compute shaders | 11 | Experimental

Comparisons in this category