chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines
Provides pluggable chunking strategies—recursive, semantic, code-aware, and LLM-based—with composable pipeline workflows that chain multiple chunkers and refineries together. Integrates with 32+ tools across tokenizers (GPT-2, BPE), embeddings (OpenAI, Sentence Transformers), vector databases, and LLMs, while supporting 56 languages out-of-the-box through modular dependency installation.
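To make the "recursive" strategy named above concrete, here is a minimal pure-Python sketch of the general idea: split on the coarsest separator first, and fall back to finer separators only for pieces that are still too large. This is an illustration under stated assumptions, not Chonkie's API or implementation; the function name `recursive_chunk`, the `max_chars` parameter, and the separator list are all hypothetical, and a real chunker would typically measure size in tokens rather than characters.

```python
def recursive_chunk(text, max_chars=40, separators=("\n\n", ". ", " ")):
    """Illustrative recursive splitter (hypothetical, not Chonkie's code).

    Tries the coarsest separator first and recurses with finer separators
    on any piece that is still longer than max_chars.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            # The separator at a chunk boundary is dropped.
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is still too large: retry with the finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks
```

Every returned chunk is at most `max_chars` characters long, and short paragraphs survive intact because the paragraph separator is tried before sentence or word boundaries.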
3,829 stars. Used by 15 other packages. Actively maintained with 53 commits in the last 30 days. Available on PyPI.
Stars: 3,829
Forks: 256
Language: Python
License: MIT
Category:
Last pushed: Mar 12, 2026
Commits (30d): 53
Dependencies: 4
Reverse dependents: 15
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/chonkie-inc/chonkie"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
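The same request can be made from Python with only the standard library. This is a rough sketch: the URL pattern is taken from the curl command above, but the helper names `build_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body (the response schema is not described here). How an API key is supplied is also not stated, so the sketch covers only the keyless tier.

```python
import json
import urllib.request

# Base path copied from the curl example; "rag" is the category segment shown there.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_url(owner, repo):
    """Build the endpoint URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the data for one repository.

    Assumption: the response body is JSON. The keyless tier is limited
    to 100 requests/day, so callers should cache results.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("chonkie-inc", "chonkie")` issues the same GET request as the curl command.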
Related tools
speedyk-005/chunklet-py
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...
andreshere00/Splitter_MR
Chunk your data into markdown text blocks for your LLM applications
chonkie-inc/chonkiejs
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library
jchunk-io/jchunk
JChunk is a lightweight and flexible library designed to provide multiple strategies for text...
messkan/rag-chunk
A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.