MrTechyWorker/chartokenizer
Chartokenizer is a Python package for basic character-level tokenization. It provides functionality to generate a character-to-index mapping for tokenizing strings at the character level. This can be useful in various natural language processing (NLP) tasks where text data needs to be preprocessed for analysis or modeling. 🚀
No commits in the last 6 months. Available on PyPI.
Stars
1
Forks
—
Language
Python
License
Apache-2.0
Category
Last pushed
Jun 11, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MrTechyWorker/chartokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
lenML/tokenizers
a lightweight no-dependency fork from transformers.js (only tokenizers)
aiqinxuancai/TiktokenSharp
Token calculation for OpenAI models, using `o200k_base` `cl100k_base` `p50k_base` encoding.
dqbd/tiktokenizer
Online playground for OpenAPI tokenizers
pkoukk/tiktoken-go
go version of tiktoken
tryAGI/Tiktoken
This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo model,...