rekram1-node/tokenizer
Natural Language Processing (NLP) tokenization library designed for English. Fast, lean, customizable. Tokenizes text, replaces abbreviations, replaces contractions, lowercases words, and can optionally remove stop words.
No commits in the last 6 months.
Stars: 3
Forks: -
Language: Go
License: MIT
Category:
Last pushed: Feb 07, 2023
Commits (30d): 0
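The steps named in the description (tokenize, replace abbreviations, replace contractions, lowercase, optionally remove stop words) could be sketched in Go roughly as follows. This is a minimal illustration only, not the API of rekram1-node/tokenizer: the `Tokenize` function, the `Options` struct, and the three lookup tables are hypothetical names and data made up for the example, and lowercasing is done first here purely to keep the lookups simple.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Hypothetical lookup tables for illustration; not taken from the library.
var abbreviations = map[string]string{
	"e.g.": "for example",
	"i.e.": "that is",
	"dr.":  "doctor",
}

var contractions = map[string]string{
	"can't": "cannot",
	"won't": "will not",
	"it's":  "it is",
}

var stopWords = map[string]bool{
	"the": true, "a": true, "is": true, "to": true, "it": true,
}

// Options is a hypothetical configuration struct.
type Options struct {
	RemoveStopWords bool
}

// Tokenize lowercases the text, splits it on whitespace, expands
// abbreviations and contractions, trims surrounding punctuation,
// and optionally drops stop words.
func Tokenize(text string, opts Options) []string {
	var tokens []string
	for _, field := range strings.Fields(strings.ToLower(text)) {
		// Abbreviations are matched before trimming because their dots matter.
		expanded, ok := abbreviations[field]
		if !ok {
			field = strings.TrimFunc(field, func(r rune) bool {
				return !unicode.IsLetter(r) && !unicode.IsDigit(r)
			})
			expanded, ok = contractions[field]
			if !ok {
				expanded = field
			}
		}
		// An expansion may yield several words, so split it again.
		for _, tok := range strings.Fields(expanded) {
			if opts.RemoveStopWords && stopWords[tok] {
				continue
			}
			tokens = append(tokens, tok)
		}
	}
	return tokens
}

func main() {
	out := Tokenize("Dr. Smith can't attend, i.e. it's cancelled.", Options{RemoveStopWords: true})
	fmt.Println(out) // prints [doctor smith cannot attend that cancelled]
}
```

Keeping stop-word removal behind an option mirrors the "optionally" in the description; with `RemoveStopWords` set to false the same input also keeps the words "is" and "it".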
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/rekram1-node/tokenizer"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
SauravP97/hf-tokenizer-visualizer
Visualize HuggingFace Byte-Pair Encoding (BPE) Tokenizer encoding process
DePasqualeOrg/swift-tiktoken
A pure Swift implementation of OpenAI's tiktoken tokenizer
Usama3627/tokenizer
Implementation of BPE Tokenizer in Rust
andikaseptiadi/local-code-model
🛠️ Build a pure Go GPT-style transformer from scratch to grasp the fundamentals of large...
twinnydotdev/toxe
SentencePiece tokenizer for cross-encoders