mirth/chonky
Fully neural approach for text chunking
50 / 100
Established
Uses fine-tuned transformer models (ModernBERT, mBERT) that learn semantic boundaries directly from training data, outperforming rule-based and embedding-similarity approaches on standard benchmarks. Integrates with RAG pipelines and supports markup removal across HTML, XML, and Markdown formats. Model variants range from 66M to 396M parameters, with multilingual options available on Hugging Face.
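As a rough illustration of what learning semantic boundaries means downstream, the sketch below assumes a token-classification model has already marked the character offsets at which a chunk should end, and shows how such predictions could be turned into chunks. The function name `split_at_boundaries` and the offset-list format are hypothetical and are not chonky's actual API.

```python
from typing import List


def split_at_boundaries(text: str, boundary_ends: List[int]) -> List[str]:
    """Cut `text` after each predicted boundary offset (hypothetical format).

    `boundary_ends` holds character offsets, each one the end position of a
    token that the model labelled as closing a chunk.
    """
    chunks = []
    start = 0
    for end in sorted(set(boundary_ends)):
        # Skip offsets that fall outside the text or would yield an empty span.
        if end <= start or end > len(text):
            continue
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    # Whatever follows the last boundary forms the final chunk.
    tail = text[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks


text = "Cats sleep a lot. They also purr. Rust is a systems language."
# Pretend the model predicted a boundary after the second sentence (offset 33).
print(split_at_boundaries(text, [33]))
```

In a real pipeline the offsets would come from the model's per-token predictions rather than being hard-coded. The splitting step itself stays this simple, which is part of the appeal of predicting boundaries directly instead of comparing sentence embeddings.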
407 stars and 312 monthly downloads. Available on PyPI.
Maintenance
6 / 25
Adoption
16 / 25
Maturity
18 / 25
Community
10 / 25
Stars
407
Forks
16
Language
Python
License
MIT
Category
Last pushed
Oct 23, 2025
Monthly downloads
312
Commits (30d)
0
Dependencies
1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/mirth/chonky"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
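The same request can be made from Python with the standard library. This is a minimal sketch: the path pattern `category/owner/repo` is inferred from the single URL shown above, and the assumption that the endpoint returns JSON is not confirmed by this page. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout inferred from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the URL needs no network access.
print(quality_url("nlp", "mirth", "chonky"))
```

Calling `fetch_quality("nlp", "mirth", "chonky")` performs the actual request. With the keyless limit of 100 requests/day, it is worth caching the result rather than fetching on every run.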