chonkie and rag-chunk

The two are complementary: Chonkie is a production-ready ingestion library for RAG pipelines, while rag-chunk is a benchmarking CLI for evaluating chunking strategies and selecting the best one before deployment.

Metric          chonkie          rag-chunk
Score           83 (Verified)    39 (Emerging)
Maintenance     25/25            10/25
Adoption        15/25            9/25
Maturity        25/25            13/25
Community       18/25            7/25
Stars           3,829            104
Forks           256              5
Downloads       —                —
Commits (30d)   53               0
Language        Python           Python
License         MIT              MIT
Risk flags      None             No Package, No Dependents

About chonkie

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

Provides pluggable chunking strategies (recursive, semantic, code-aware, and LLM-based) with composable pipeline workflows that chain multiple chunkers and refineries together. Integrates with 32+ tools across tokenizers (GPT-2, BPE), embeddings (OpenAI, Sentence Transformers), vector databases, and LLMs, and supports 56 languages out of the box through modular dependency installation.
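To illustrate the recursive strategy mentioned above, here is a conceptual sketch in plain Python. This is not Chonkie's implementation or API; the function name, separators, and size limit are illustrative assumptions. The idea is to split on the coarsest separator first and recurse into pieces that are still too large.

```python
def recursive_chunk(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Conceptual recursive chunker (illustrative, not Chonkie's code).

    Splits hierarchically: paragraphs first, then lines, sentences, and
    words, until every chunk fits within max_chars.
    """
    if len(text) <= max_chars or not separators:
        return [text] if text.strip() else []
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_chars:
            chunks.append(piece)
        else:
            # Piece is still too big: recurse with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, rest))
    return [c for c in chunks if c.strip()]
```

A library like Chonkie layers tokenizer-aware sizing, overlap, and refinement steps on top of this basic divide-and-conquer pattern.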

About rag-chunk

messkan/rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.

Implements six chunking strategies including header-aware and embedding-based semantic splitting, with token-accurate chunking via tiktoken for specific LLM models (GPT-3.5, GPT-4, etc.). Evaluates chunk quality through precision, recall, and F1-score metrics, and supports embedding-based semantic retrieval using sentence-transformers as an alternative to lexical matching. Exports results to JSON/CSV and integrates optional LangChain components for recursive character splitting.
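The precision/recall/F1 evaluation described above can be sketched as follows. This is a generic scoring sketch, not rag-chunk's actual code or API; the function name and the use of chunk IDs are illustrative assumptions about how retrieved chunks might be compared against a relevant set.

```python
def prf1(retrieved, relevant):
    """Precision, recall, and F1 over sets of chunk identifiers.

    retrieved: chunk ids returned for a query.
    relevant:  chunk ids known to contain the answer.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives: correct chunks retrieved
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return precision, recall, f1
```

Running this per query and averaging across a test set gives a single score per chunking strategy, which is the kind of comparison a benchmarking tool can export to JSON or CSV.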

Scores updated daily from GitHub, PyPI, and npm data.