Tokenization Libraries LLM Tools
Libraries and tools for tokenizing text using OpenAI's tiktoken encoding across multiple programming languages and platforms. Does NOT include general text processing, language models themselves, or token estimation approximations without full tokenization.
45 tokenization library tools are tracked. The highest-rated is lenML/tokenizers at 49/100, with 32 stars and 92,127 monthly downloads.
Get all 45 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=tokenization-libraries&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
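The same request can be made from a script. The following is a minimal Python sketch, not an official client: the endpoint and the `domain`, `subcategory`, and `limit` query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented here, so the sketch returns the parsed payload unchanged. Whether `limit` accepts values above 20 is also an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    params = {
        "domain": "llm-tools",
        "subcategory": "tokenization-libraries",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the dataset and return the parsed JSON as-is.

    The response schema is not specified in this listing, so no fields
    are assumed; inspect the returned object before relying on its shape.
    """
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a network request, so it is left commented out):
# data = fetch_projects()
# print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it possible to check the query string without spending one of the 100 daily requests.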
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | lenML/tokenizers: a lightweight no-dependency fork from transformers.js (only tokenizers) | | Emerging |
| 2 | aiqinxuancai/TiktokenSharp: Token calculation for OpenAI models, using `o200k_base` `cl100k_base`... | | Emerging |
| 3 | pkoukk/tiktoken-go: go version of tiktoken | | Emerging |
| 4 | tryAGI/Tiktoken: This project implements token calculation for OpenAI's gpt-4 and... | | Emerging |
| 5 | dqbd/tiktokenizer: Online playground for OpenAPI tokenizers | | Emerging |
| 6 | microsoft/Tokenizer: Typescript and .NET implementation of BPE tokenizer for OpenAI LLMs. | | Emerging |
| 7 | samber/tiktoken-cli: 🧮 CLI for counting tokens in files and directories using tiktoken | | Emerging |
| 8 | aallam/ktoken: Kotlin multiplatform BPE tokenizer library for OpenAI models | | Emerging |
| 9 | MichaelCurrin/token-translator: Convert the token input limits of LLMs like ChatGPT into real-world measures... | | Emerging |
| 10 | geckse/n8n-nodes-gpt-tokenizer: n8n node for working with BPE Tokens with GPT in mind. | | Emerging |
| 11 | dmitry-brazhenko/SharpToken: SharpToken is a C# library for tokenizing natural language text. It's based... | | Emerging |
| 12 | AI21Labs/ai21-tokenizer: AI21's Jamba models tokenizers | | Emerging |
| 13 | botisan-ai/gpt3-tokenizer: Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI. | | Emerging |
| 14 | Thibault00/runtoken: A blazing-fast BPE tokenizer for LLMs. Drop-in tiktoken replacement, 20-80x faster. | | Experimental |
| 15 | zkry/tiktoken.el: tiktoken.el is an Emacs Lisp port of OpenAI's tiktoken library for BPE tokenization | | Experimental |
| 16 | coder/ai-tokenizer: A faster than tiktoken tokenizer with first-class support for Vercel's AI SDK. | | Experimental |
| 17 | unitythemaker/tokdu: tokdu (Token Disk Usage) is a terminal-based utility that helps you analyze... | | Experimental |
| 18 | oelmekki/tiktoken-cli: Simple wrapper around tiktoken to use it in your favorite language. | | Experimental |
| 19 | Darkatse/MikTik: A multi-model tokenizer, in Rust. | | Experimental |
| 20 | qbit-ai/tokenx-rs: Rust port of johannschopplich/tokenx - Fast token count estimation for LLMs... | | Experimental |
| 21 | CTCycle/TKBEN-tokenizers-benchmarker: Explore and benchmark public and custom tokenizers from HuggingFace using... | | Experimental |
| 22 | AndresEspin1993/b2t-tokenizer: B2T - Tokenizer for the AI Systems. | | Experimental |
| 23 | Dev-in-a-Box-Limited/TokenEvaluator.Net: TokenEvaluator.Net is a simple and useful library designed to measure and... | | Experimental |
| 24 | rodneylab/tokenator: Count the number of tokens in an LLM prompt | | Experimental |
| 25 | gemologic/carat: a quick cli tool to estimate token count | | Experimental |
| 26 | kgruiz/PyTokenCounter: A simple Python library for tokenizing text and counting tokens. While... | | Experimental |
| 27 | claylo/ah-ah-ah: VUN token! TWO tokens! Count all the beautiful tokens ... offline! Ah-ah-ah! | | Experimental |
| 28 | peterheb/gotoken: Gotoken is a pure-Go implementation of the Python library openai/tiktoken. | | Experimental |
| 29 | valmat/gpt-tokenator: GPT 3 tokens counter | | Experimental |
| 30 | ziliwang/gpt_tokenizer: cpp roberta tokenzier for deploy using | | Experimental |
| 31 | MrTechyWorker/chartokenizer: Chartokenizer is a Python package for basic character-level tokenization. It... | | Experimental |
| 32 | agentstation/tokenizer: High-performance tokenizer implementations in Go with unified CLI. Features... | | Experimental |
| 33 | n4ryn/genai-tokenizer: GenAi Tokenizer is an interactive tokenizer playground to explore how text... | | Experimental |
| 34 | Marcelleedit7272/genai-tokenizer: 🧠Explore tokenization with GenAi-Tokenizer, a user-friendly tool for... | | Experimental |
| 35 | kgruiz/token-counter: Rust CLI for counting or tokenizing text, files, or directories with OpenAI... | | Experimental |
| 36 | fengkx/tu: A du-like CLI for counting tokens | | Experimental |
| 37 | feralghost/token-counter: Free API to count tokens for GPT-4, Claude, Gemini, and more. No API key... | | Experimental |
| 38 | w95/tiktoken: The Tiktoken API is a tool that enables developers to calculate the token... | | Experimental |
| 39 | JacobLinCool/Tiktoken-Calculator: Calculate the token count for GPT-4, GPT-3.5, GPT-3, and GPT-2. | | Experimental |
| 40 | XucroYuri/Tokenlink: A Token-Based Semantic Association Mining Tool | | Experimental |
| 41 | Akibkhan/LLM-Tokenizer: The Llm-Tokenizer project is a lightweight, efficient tokenizer designed for... | | Experimental |
| 42 | dakofler/simple_tokenizers: Tokenizers is a collection of tokenization implementations focused on... | | Experimental |
| 43 | teasec4/gpt_tokenizer: GPT Tokenizer | | Experimental |
| 44 | hardesttype/switch-tokenizer: A multilingual tokenization approach that maps different language tokenizers... | | Experimental |
| 45 | WinPooh32/tokc: Token counting utility | | Experimental |