Document Chunking RAG Tools
Tools for splitting, segmenting, and optimizing documents into chunks for RAG pipelines. Includes chunking strategies (fixed, semantic, adaptive), chunk visualization/validation, and parameter optimization. Does NOT include document parsing, extraction, embedding, or retrieval components.
There are 38 document chunking tools tracked. 1 scores above 70 (verified tier). The highest-rated is chonkie-inc/chonkie at 83/100 with 3,829 stars. 1 of the top 10 is actively maintained.
Get all 38 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=document-chunking&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
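The same request can be made from a script. The following is a minimal Python sketch that rebuilds the URL from the curl command above and parses the response as JSON. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the parsed payload without assuming any field names. The example URL uses `limit=20`; whether a higher limit or pagination is supported is not stated, so treat any other value as an assumption to check against the API.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="rag", subcategory="document-chunking", limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the decoded JSON payload.

    The response structure is an unknown here, so the payload is returned
    as-is rather than indexed by assumed keys.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs one request, which counts against the 100 requests/day keyless allowance, so it is worth caching the result locally when iterating.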
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | chonkie-inc/chonkie: 🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast,... | | Verified |
| 2 | andreshere00/Splitter_MR: Chunk your data into markdown text blocks for your LLM applications | | Emerging |
| 3 | speedyk-005/chunklet-py: One library to split them all: Sentence, Code, Docs. Chunk smarter, not... | | Emerging |
| 4 | jchunk-io/jchunk: JChunk is a lightweight and flexible library designed to provide multiple... | | Emerging |
| 5 | chonkie-inc/chonkiejs: 🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and... | | Emerging |
| 6 | thom-heinrich/chonkify: Extractive document compression for RAG and agent pipelines. +69% vs... | | Emerging |
| 7 | chonkie-inc/mtcb: 🤔 wondering if your chunks are good? 🦉 Judie is here to Judge and Evaluate... | | Emerging |
| 8 | messkan/rag-chunk: A Python CLI to test, benchmark, and find the best RAG chunking strategy for... | | Emerging |
| 9 | GiovanniPasq/chunky: Validate, visualize, edit, and export chunks for RAG pipelines. | | Emerging |
| 10 | ayush585/SmartChunk: SmartChunk is a lightweight, structure-aware semantic chunking toolkit... | | Emerging |
| 11 | ALucek/chunking-strategies: An Overview of the Latest Document Chunking Research | | Experimental |
| 12 | jackfsuia/bert-chunker: bert-chunker: efficient and trained chunking for unstructured documents. ... | | Experimental |
| 13 | bazilicum/axonode-chunker: Advanced semantic text chunking with custom structural markers, whole-text... | | Experimental |
| 14 | smart-models/Sentences-Chunker: Cutting-edge tool designed to intelligently segment text documents into... | | Experimental |
| 15 | AceAtDev/RAG-chunker: The easiest and most effective way tool to retrain a RAG LLM/GEN AI/Agent on... | | Experimental |
| 16 | ekimetrics/adaptive-chunking: Adaptive Chunking: automatically select the best chunking method per... | | Experimental |
| 17 | asukhodko/dify-markdown-chunker: Advanced Markdown text chunker tool plugin for Dify RAG / knowledge bases | | Experimental |
| 18 | mirpo/chopdoc: A tool to split documents into chunks for RAG and LLM applications | | Experimental |
| 19 | wevote-project/crystal-text-splitter: Intelligent text chunking for RAG (Retrieval-Augmented Generation) and LLM... | | Experimental |
| 20 | yuma-shintani/chunksize-checker: Calculate the number of total tokens, optimal chunk size and chunk overlap... | | Experimental |
| 21 | arclabs561/slabs: Text chunking for RAG: fixed, sentence, recursive, and semantic splitting | | Experimental |
| 22 | stranger00135/ragflow-optimizer: Automatically discover the best RAGFlow chunking parameters for each... | | Experimental |
| 23 | MukundaKatta/ChunkWise: ChunkWise — Intelligent Document Chunking. Smart document chunking for RAG pipelines | | Experimental |
| 24 | philip-zhan/semchunk.rb: Ruby port of https://github.com/isaacus-dev/semchunk | | Experimental |
| 25 | cwccie/ragchunk: Chunking library for technical documentation — domain-aware splitting for... | | Experimental |
| 26 | bgokden/fast-text-splitter: fast text splitter with onnx | | Experimental |
| 27 | pranavms13/deepcontext: A semantic chunking service for documents, GitHub repos, webpages, and... | | Experimental |
| 28 | zenwor/icm_rag: 🧩 Intelligent Chunking Methods for Code Documentation RAG | | Experimental |
| 29 | fujiba/pdf-chunker: LLM-friendly PDF splitter & image optimizer. Chunk PDFs by size and... | | Experimental |
| 30 | tainmou/SmartChunk: 🧩 Enhance RAG processes with SmartChunk, a Python package that creates... | | Experimental |
| 31 | sanbaiw/semtxtsplitter: A smol Go package for splitting text into chunks while preserving semantic meaning. | | Experimental |
| 32 | AleGallagher/ChunkingTechniques: 🚀 Comprehensive evaluation of chunking techniques for RAG pipelines. Compare... | | Experimental |
| 33 | Arnav-Ajay/rag-chunking-strategies: A controlled study showing how different chunking strategies change which... | | Experimental |
| 34 | hemantjuyal/Latent-Chunk-Lab: A hands-on playground to explore different chunking techniques for... | | Experimental |
| 35 | Leo310/rag-chunking-evaluation: Assess the effectiveness of chunking strategies in RAG systems via a custom... | | Experimental |
| 36 | OneOffTech/the-chunk-list: A comprehensive open-source database of document parsers, their pricing, and... | | Experimental |
| 37 | DTufail/rag-chunk-eval: Benchmarking harness for RAG chunking strategies — compares Fixed,... | | Experimental |
| 38 | Devparihar5/chunking-strategies-comparison: A deep dive into text chunking for Retrieval-Augmented Generation systems | | Experimental |