chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

Provides pluggable chunking strategies (recursive, semantic, code-aware, and LLM-based) with composable pipeline workflows that chain multiple chunkers and refineries together. Integrates with 32+ tools across tokenizers (GPT-2, BPE), embeddings (OpenAI, Sentence Transformers), vector databases, and LLMs, and supports 56 languages out of the box, with optional dependencies installed modularly.
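
To make the idea of recursive chunking and chained pipeline steps concrete, here is a minimal, library-free sketch in plain Python. It does not use or reproduce Chonkie's API: the names `recursive_chunk`, `merge_small`, and `run_pipeline` are hypothetical, sizes are measured in characters rather than tokens, and the merge step is only a rough stand-in for what the description calls a refinery.

```python
from typing import Callable, List, Sequence, Tuple


def recursive_chunk(
    text: str,
    max_chars: int,
    separators: Tuple[str, ...] = ("\n\n", "\n", " "),
) -> List[str]:
    """Hypothetical helper: split on progressively finer separators
    until every piece is at most max_chars long."""
    if len(text) <= max_chars:
        # Piece already fits; drop it if it is only whitespace.
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    for piece in text.split(head):
        chunks.extend(recursive_chunk(piece, max_chars, rest))
    return chunks


def merge_small(chunks: List[str], max_chars: int) -> List[str]:
    """Hypothetical post-processing step: greedily join adjacent chunks
    with a space while the result stays within max_chars."""
    merged: List[str] = []
    for chunk in chunks:
        if merged and len(merged[-1]) + 1 + len(chunk) <= max_chars:
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)
    return merged


def run_pipeline(data, steps: Sequence[Callable]):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


doc = "First paragraph here.\n\nSecond paragraph is a bit longer than the first."
chunks = run_pipeline(
    doc,
    [lambda t: recursive_chunk(t, 30), lambda c: merge_small(c, 30)],
)
for chunk in chunks:
    print(repr(chunk))
```

Joining merged chunks with a single space discards the original separators, which keeps the sketch short but would not suit a use case that needs exact text reconstruction.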

3,829 stars. Used by 15 other packages. Actively maintained with 53 commits in the last 30 days. Available on PyPI.

Maintenance 25 / 25

Adoption 15 / 25

Maturity 25 / 25

Community 18 / 25

Stars: 3,829

Forks: 256

Language: Python

License: MIT

Related tools

speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...

andreshere00/Splitter_MR

Chunk your data into markdown text blocks for your LLM applications

chonkie-inc/chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

jchunk-io/jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text...

messkan/rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
