Tokenization Libraries: LLM Tools

Libraries and tools for tokenizing text with OpenAI's tiktoken encodings across multiple programming languages and platforms. Does NOT include general text processing, language models themselves, or token-count approximations that skip full tokenization.
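Most projects below port or wrap tiktoken, whose encodings are byte-level BPE: text is split into bytes, and adjacent parts are merged, lowest merge rank first, until no known pair remains. The sketch below shows only that merge loop, assuming a tiny hypothetical rank table (`TOY_RANKS`); real encodings such as `cl100k_base` ship on the order of 100k ranks, pre-split text with a regex, and handle special tokens, all omitted here.

```python
# Toy byte-level BPE encoder illustrating the tiktoken-style merge loop.
# TOY_RANKS is a hypothetical merge table. Every single byte that can appear
# must have a rank, as in real encodings; lower rank = merged earlier.
TOY_RANKS: dict[bytes, int] = {
    b"l": 0, b"o": 1, b"w": 2, b"e": 3, b"r": 4,  # single bytes
    b"lo": 5,
    b"low": 6,
    b"er": 7,
    b"lower": 8,
}


def bpe_encode(piece: bytes, ranks: dict[bytes, int]) -> list[int]:
    """Merge adjacent parts of `piece`, lowest rank first, then map parts to ids."""
    parts = [piece[i:i + 1] for i in range(len(piece))]  # start from single bytes
    while len(parts) > 1:
        # Find the adjacent pair whose concatenation has the lowest rank.
        best_rank, best_idx = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_idx = rank, i
        if best_idx is None:
            break  # no mergeable pair left
        # Replace the pair with its merged form.
        parts[best_idx:best_idx + 2] = [parts[best_idx] + parts[best_idx + 1]]
    return [ranks[p] for p in parts]
```

With this table, `b"lower"` merges `lo`, then `low`, then `er`, then `lower`, ending as the single id 8, while `b"lowe"` stops at `low` + `e` because `lowe` has no rank.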

45 tokenization library tools are tracked. The highest-rated is lenML/tokenizers, scoring 49/100, with 32 stars and 92,127 monthly downloads.

Get the projects as JSON (the request below sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=tokenization-libraries&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
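The same request can be made from Python's standard library. This is a sketch under assumptions: `limit` is taken to cap the number of returned rows (so `limit=45` would cover the full list), the body is assumed to be JSON with an undocumented schema, and the free API key is left out because how to pass it is not described here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the dataset query shown above; `limit` is assumed to cap the row count."""
    query = urllib.parse.urlencode({
        "domain": "llm-tools",
        "subcategory": "tokenization-libraries",
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch(limit: int = 20, timeout: float = 10.0):
    """GET the dataset and decode the JSON body (schema not documented here)."""
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Each `fetch` call counts against the 100 requests/day anonymous quota, so caching the decoded result locally is a reasonable choice when iterating.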

Rank  Tool  Score  Tier
1 lenML/tokenizers

A lightweight, dependency-free fork of transformers.js (tokenizers only)

49
Emerging
2 aiqinxuancai/TiktokenSharp

Token calculation for OpenAI models, using `o200k_base`, `cl100k_base`...

45
Emerging
3 pkoukk/tiktoken-go

Go version of tiktoken

41
Emerging
4 tryAGI/Tiktoken

This project implements token calculation for OpenAI's gpt-4 and...

41
Emerging
5 dqbd/tiktokenizer

Online playground for OpenAI tokenizers

41
Emerging
6 microsoft/Tokenizer

TypeScript and .NET implementation of a BPE tokenizer for OpenAI LLMs.

40
Emerging
7 samber/tiktoken-cli

🧮 CLI for counting tokens in files and directories using tiktoken

39
Emerging
8 aallam/ktoken

Kotlin multiplatform BPE tokenizer library for OpenAI models

37
Emerging
9 MichaelCurrin/token-translator

Convert the token input limits of LLMs like ChatGPT into real-world measures...

37
Emerging
10 geckse/n8n-nodes-gpt-tokenizer

n8n node for working with BPE tokens, built with GPT in mind.

35
Emerging
11 dmitry-brazhenko/SharpToken

SharpToken is a C# library for tokenizing natural language text. It's based...

34
Emerging
12 AI21Labs/ai21-tokenizer

Tokenizers for AI21's Jamba models

33
Emerging
13 botisan-ai/gpt3-tokenizer

Isomorphic JavaScript/TypeScript tokenizer for GPT-3 and Codex models by OpenAI.

32
Emerging
14 Thibault00/runtoken

A blazing-fast BPE tokenizer for LLMs. Drop-in tiktoken replacement, 20-80x faster.

29
Experimental
15 zkry/tiktoken.el

tiktoken.el is an Emacs Lisp port of OpenAI's tiktoken library for BPE tokenization

29
Experimental
16 coder/ai-tokenizer

A tokenizer faster than tiktoken, with first-class support for Vercel's AI SDK.

28
Experimental
17 unitythemaker/tokdu

tokdu (Token Disk Usage) is a terminal-based utility that helps you analyze...

27
Experimental
18 oelmekki/tiktoken-cli

Simple wrapper around tiktoken to use it in your favorite language.

27
Experimental
19 Darkatse/MikTik

A multi-model tokenizer, in Rust.

26
Experimental
20 qbit-ai/tokenx-rs

Rust port of johannschopplich/tokenx - Fast token count estimation for LLMs...

25
Experimental
21 CTCycle/TKBEN-tokenizers-benchmarker

Explore and benchmark public and custom tokenizers from HuggingFace using...

24
Experimental
22 AndresEspin1993/b2t-tokenizer

B2T: a tokenizer for AI systems.

24
Experimental
23 Dev-in-a-Box-Limited/TokenEvaluator.Net

TokenEvaluator.Net is a simple and useful library designed to measure and...

23
Experimental
24 rodneylab/tokenator

Count the number of tokens in an LLM prompt

22
Experimental
25 gemologic/carat

A quick CLI tool to estimate token counts

22
Experimental
26 kgruiz/PyTokenCounter

A simple Python library for tokenizing text and counting tokens. While...

22
Experimental
27 claylo/ah-ah-ah

VUN token! TWO tokens! Count all the beautiful tokens ... offline! Ah-ah-ah!

22
Experimental
28 peterheb/gotoken

Gotoken is a pure-Go implementation of the Python library openai/tiktoken.

22
Experimental
29 valmat/gpt-tokenator

GPT-3 token counter

21
Experimental
30 ziliwang/gpt_tokenizer

C++ RoBERTa tokenizer for deployment

21
Experimental
31 MrTechyWorker/chartokenizer

Chartokenizer is a Python package for basic character-level tokenization. It...

19
Experimental
32 agentstation/tokenizer

High-performance tokenizer implementations in Go with unified CLI. Features...

19
Experimental
33 n4ryn/genai-tokenizer

GenAi Tokenizer is an interactive tokenizer playground to explore how text...

17
Experimental
34 Marcelleedit7272/genai-tokenizer

🧠 Explore tokenization with GenAi-Tokenizer, a user-friendly tool for...

16
Experimental
35 kgruiz/token-counter

Rust CLI for counting or tokenizing text, files, or directories with OpenAI...

15
Experimental
36 fengkx/tu

A du-like CLI for counting tokens

14
Experimental
37 feralghost/token-counter

Free API to count tokens for GPT-4, Claude, Gemini, and more. No API key...

14
Experimental
38 w95/tiktoken

The Tiktoken API is a tool that enables developers to calculate the token...

13
Experimental
39 JacobLinCool/Tiktoken-Calculator

Calculate the token count for GPT-4, GPT-3.5, GPT-3, and GPT-2.

13
Experimental
40 XucroYuri/Tokenlink

A Token-Based Semantic Association Mining Tool

12
Experimental
41 Akibkhan/LLM-Tokenizer

The Llm-Tokenizer project is a lightweight, efficient tokenizer designed for...

11
Experimental
42 dakofler/simple_tokenizers

Tokenizers is a collection of tokenization implementations focused on...

11
Experimental
43 teasec4/gpt_tokenizer

GPT Tokenizer

11
Experimental
44 hardesttype/switch-tokenizer

A multilingual tokenization approach that maps different language tokenizers...

11
Experimental
45 WinPooh32/tokc

Token counting utility

11
Experimental
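Several entries (TiktokenSharp at rank 2, Tiktoken-Calculator at rank 39) count tokens per model, which first means picking the right encoding. A minimal lookup sketch, mirroring a small subset of the model-to-encoding table in OpenAI's Python `tiktoken` (exposed there as `tiktoken.encoding_for_model`); the table below is not exhaustive and may be outdated, and `encoding_name_for_model` is a hypothetical helper name.

```python
# Subset of tiktoken's model -> encoding table (not exhaustive, may be outdated).
MODEL_TO_ENCODING = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "davinci": "r50k_base",
    "gpt2": "gpt2",
}

# Versioned names such as "gpt-4o-mini" or "gpt-4-0613" are matched by prefix.
MODEL_PREFIX_TO_ENCODING = {
    "gpt-4o-": "o200k_base",
    "gpt-4-": "cl100k_base",
    "gpt-3.5-turbo-": "cl100k_base",
}


def encoding_name_for_model(model: str) -> str:
    """Return the encoding name for `model`, trying exact names before prefixes."""
    if model in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model]
    for prefix, encoding in MODEL_PREFIX_TO_ENCODING.items():
        if model.startswith(prefix):
            return encoding
    raise KeyError(f"no known encoding for model {model!r}")
```

Checking exact names first keeps `gpt-4o` from being treated as a `gpt-4` variant; the prefixes themselves end in `-`, so `gpt-4o-mini` never matches `gpt-4-`.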
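Converting a token limit into familiar units, as token-translator (rank 9) does, is plain arithmetic once a ratio is fixed. A sketch assuming OpenAI's rule of thumb for English text (about 4 characters, or 3/4 of a word, per token) and a hypothetical 500 words per page; it is not token-translator's own code, and real ratios vary by language and content.

```python
# Rough conversion of a token budget into English characters, words and pages.
# Ratios follow OpenAI's English rule of thumb; WORDS_PER_PAGE is an assumption.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500  # assumed: roughly one single-spaced page


def token_limit_to_measures(tokens: int) -> dict[str, float]:
    """Approximate the real-world size of a `tokens`-sized context window."""
    words = tokens * WORDS_PER_TOKEN
    return {
        "characters": tokens * CHARS_PER_TOKEN,
        "words": words,
        "pages": words / WORDS_PER_PAGE,
    }
```

Under these assumptions, an 8,000-token window comes to about 32,000 characters, 6,000 words, or 12 pages.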
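ziliwang/gpt_tokenizer (rank 30) targets RoBERTa, which reuses GPT-2's byte-level BPE. Its vocabulary stores tokens as printable strings, so each of the 256 byte values is first mapped to a visible Unicode character. A Python sketch of that mapping, following the `bytes_to_unicode` helper in OpenAI's GPT-2 `encoder.py`; the C++ project may structure it differently, and `to_visible` is a hypothetical helper.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map every byte 0-255 to a printable Unicode character (GPT-2 scheme)."""
    # Printable Latin-1 bytes map to the character with the same code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    # Remaining bytes (controls, space, etc.) are shifted to code points 256+.
    shift = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_ENCODER = bytes_to_unicode()


def to_visible(text: str) -> str:
    """Render the UTF-8 bytes of `text` the way GPT-2/RoBERTa vocab files do."""
    return "".join(BYTE_ENCODER[b] for b in text.encode("utf-8"))
```

This is why a leading space shows up as `Ġ` and a newline as `Ċ` in GPT-2 and RoBERTa vocabulary files.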
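Character-level tokenization, as in chartokenizer (rank 31), assigns one id per distinct character, with no merges at all. A minimal sketch; `CharTokenizer` and its methods are hypothetical names, not chartokenizer's API.

```python
class CharTokenizer:
    """Character-level tokenizer: one id per distinct character in the corpus."""

    def __init__(self, corpus: str) -> None:
        # Sorting makes ids deterministic for a given corpus.
        self.vocab = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(self.vocab)}
        self.itos = dict(enumerate(self.vocab))

    def encode(self, text: str) -> list[int]:
        # Raises KeyError for characters absent from the corpus (no UNK token).
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)
```

Compared with BPE, the vocabulary stays tiny but sequences are much longer: every character costs one token.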
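The du-style tools in this list (fengkx/tu, tokdu, samber/tiktoken-cli) walk a directory tree and roll per-file counts up to every parent directory. A sketch of that aggregation with a pluggable `count` function; the default `whitespace_count` is a hypothetical stand-in that counts words, not tokens, so pass a real encoder's count (for example `len(enc.encode(text))` with a tiktoken encoding) for actual token totals.

```python
import os
from collections import defaultdict
from typing import Callable


def whitespace_count(text: str) -> int:
    """Placeholder counter: whitespace-separated words, NOT real BPE tokens."""
    return len(text.split())


def token_du(root: str, count: Callable[[str], int] = whitespace_count) -> dict[str, int]:
    """Total counts per directory under `root`; totals include subdirectories, like du."""
    totals: dict[str, int] = defaultdict(int)
    root = os.path.abspath(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    n = count(f.read())
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            # Credit the file's count to its directory and every ancestor up to root.
            d = dirpath
            while True:
                totals[d] += n
                if d == root:
                    break
                d = os.path.dirname(d)
    return dict(totals)
```

Rolling counts up to ancestors is what makes the output read like `du`: each directory's figure covers everything beneath it.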