messkan/rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.

/ 100

Emerging

Implements six chunking strategies, including header-aware and embedding-based semantic splitting, plus token-accurate chunking via tiktoken for specific LLMs (GPT-3.5, GPT-4, etc.). Evaluates chunk quality with precision, recall, and F1-score metrics, and supports embedding-based semantic retrieval with sentence-transformers as an alternative to lexical matching. Exports results to JSON/CSV and optionally integrates LangChain components for recursive character splitting.
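The description names the benchmark stages (chunking, retrieval, precision/recall/F1 scoring) but not how rag-chunk implements them. As a rough illustration only, here is a minimal Python sketch under these assumptions: a header-aware splitter that starts a new chunk at each ATX Markdown header, plain token-overlap retrieval standing in for lexical scoring or sentence-transformers embeddings, and a hand-labelled set of relevant chunk indices per query. The names `split_by_headers`, `retrieve`, and `precision_recall_f1` are hypothetical and are not rag-chunk's API.

```python
import re

# Matches ATX-style Markdown headers such as "# Title" or "### Section".
HEADER_RE = re.compile(r"^#{1,6}\s+\S")


def split_by_headers(markdown: str) -> list[str]:
    """Hypothetical header-aware splitter: start a new chunk at each ATX header,
    ignoring '#' lines inside backtick-fenced code blocks."""
    chunks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced code block
        if not in_fence and HEADER_RE.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; a crude stand-in for real lexical scoring."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int) -> list[int]:
    """Return indices of the top-k chunks ranked by query-token overlap.
    Python's sort is stable, so ties keep document order."""
    q = tokenize(query)
    ranked = sorted(range(len(chunks)),
                    key=lambda i: len(q & tokenize(chunks[i])),
                    reverse=True)
    return ranked[:k]


def precision_recall_f1(retrieved: list[int], relevant: set[int]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 of retrieved chunk indices."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = """# Install
Run pip install to set up the tool.

# Usage
Pass a folder of Markdown files and pick a chunking strategy.

## Metrics
Precision, recall, and F1 are reported per strategy.
"""
    chunks = split_by_headers(doc)
    top = retrieve("which metrics are reported", chunks, k=1)
    # Chunk 2 ("## Metrics ...") is hand-labelled as the relevant one.
    print(len(chunks), top, precision_recall_f1(top, {2}))  # → 3 [2] (1.0, 1.0, 1.0)
```

Swapping `retrieve` for an embedding-based ranker, or `split_by_headers` for a fixed-size or token-based splitter, and comparing the resulting F1 scores is the kind of strategy comparison the tool is described as automating.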

104 stars.

No package · No dependents

Maintenance 10 / 25

Adoption 9 / 25

Maturity 13 / 25

Community 7 / 25


Stars: 104
Language: Python
License: MIT

Compare: rag-chunk and chonkie · rag-chunk and adaptive-chunking

Higher-rated alternatives

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust...

speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...

andreshere00/Splitter_MR

Chunk your data into markdown text blocks for your LLM applications

chonkie-inc/chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

jchunk-io/jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text...
