openscilab/tocount

ToCount: Lightweight Token Estimator

Score: 42 / 100 (Emerging)

Offers several estimation strategies, including rule-based approaches and linear regression models fitted to tiktoken encodings (R50K, CL100K, O200K) as well as newer model tokenizers such as DeepSeek R1 and Llama 3.1. A unified `TextEstimator` interface exposes the pre-trained models, which are benchmarked against real-world chat datasets; reported accuracy ranges from R² 0.62 to 0.97 depending on the model selected and how language-specific it is. The library targets token budgeting for LLM applications and includes language-specific variants optimized for English text.
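To make the two families of strategies concrete, here is a minimal, self-contained sketch of a rule-based estimate and a linear-regression estimate. It does not use or reproduce ToCount's API or its trained coefficients: the function names (`rule_based_estimate`, `fit_linear`, `estimate_tokens`), the 4-characters-per-token rule of thumb, and the tiny synthetic training set are all assumptions chosen for illustration only.

```python
import math
from typing import Sequence, Tuple


def rule_based_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Rule of thumb: roughly one token per N characters (N=4 is an assumption)."""
    return math.ceil(len(text) / chars_per_token)


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def r_squared(xs: Sequence[float], ys: Sequence[float], a: float, b: float) -> float:
    """Coefficient of determination for the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def estimate_tokens(text: str, a: float, b: float) -> int:
    """Predict a token count from character length using the fitted line."""
    return max(0, round(a * len(text) + b))


# Synthetic, perfectly linear toy data (NOT real tokenizer output):
# character lengths and the "token counts" we pretend a tokenizer produced.
char_lengths = [4, 8, 12, 16]
token_counts = [1, 2, 3, 4]

a, b = fit_linear(char_lengths, token_counts)

print(rule_based_estimate("How are you?"))    # 3
print(estimate_tokens("How are you?", a, b))  # 3
print(r_squared(char_lengths, token_counts, a, b))
```

In practice the regression would be fitted against counts from a real tokenizer over a large corpus, and R² on held-out text is what tells you how far to trust the estimate when budgeting tokens.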

Available on PyPI.

No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 18 / 25
Community 4 / 25


Stars: 21
Forks: 1
Language: Python
License: MIT
Last pushed: Feb 14, 2026
Monthly downloads: 47
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/openscilab/tocount"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
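The same request can be made from Python with only the standard library. This is a sketch under assumptions: the URL pattern (`<category>/<owner>/<repo>`) is inferred from the single example above, the response is assumed to be JSON, and the helper names `build_quality_url` and `fetch_quality` are hypothetical. The response schema is not documented here, so the result is returned as parsed without assuming any field names.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the path layout is inferred from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and parse the quality record (assumes the body is JSON)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("nlp", "openscilab", "tocount"))
```

Because unauthenticated access is limited to 100 requests/day, caching the result locally is worth considering if the data is polled regularly.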