brightjonathan/BPE-Stanford
We implement and train a byte-level byte-pair encoding (BPE) tokenizer. In particular, we represent arbitrary (Unicode) strings as sequences of bytes and train the BPE tokenizer on those byte sequences. The trained tokenizer then encodes text (a string) into tokens (a sequence of integers).
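To make the idea concrete, here is a minimal sketch of byte-level BPE training and encoding. This is an illustrative toy, not this repository's implementation; the function names (`train_bpe`, `encode`) and the greedy most-frequent-pair merge loop are assumptions about the standard algorithm, and real trainers add pre-tokenization and special-token handling.

```python
from collections import Counter

def _merge(tokens, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules from raw UTF-8 bytes (ids 0-255)."""
    tokens = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        if pairs[best] < 2:
            break  # no pair repeats; further merges gain nothing
        merges[best] = next_id
        tokens = _merge(tokens, best, next_id)
        next_id += 1
    return merges

def encode(text, merges):
    """Encode a string by replaying learned merges in training order."""
    tokens = list(text.encode("utf-8"))
    for pair, new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        tokens = _merge(tokens, pair, new_id)
    return tokens
```

Because the base vocabulary is all 256 byte values, any Unicode string can be encoded with no out-of-vocabulary failures; merges only compress frequent byte sequences into single integer tokens.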
No commits in the last 6 months.
Language
Python
License
MIT
Last pushed
Aug 07, 2025
Commits (30d)
0
Higher-rated alternatives
SauravP97/hf-tokenizer-visualizer
Visualize HuggingFace Byte-Pair Encoding (BPE) Tokenizer encoding process
DePasqualeOrg/swift-tiktoken
A pure Swift implementation of OpenAI's tiktoken tokenizer
Usama3627/tokenizer
Implementation of BPE Tokenizer in Rust
andikaseptiadi/local-code-model
🛠️ Build a pure Go GPT-style transformer from scratch to grasp the fundamentals of large...
twinnydotdev/toxe
SentencePiece tokenizer for cross-encoders