Document Chunking RAG Tools

Tools for splitting and segmenting documents into chunks for RAG pipelines. Covers chunking strategies (fixed, semantic, adaptive), chunk visualization and validation, and chunking-parameter optimization. Does not cover document parsing, extraction, embedding, or retrieval components.

38 document chunking tools are tracked. One scores above 70 (Verified tier): chonkie-inc/chonkie, the highest-rated at 83/100 with 3,829 stars. One of the top 10 is actively maintained.

Get all 38 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=document-chunking&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
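The same request can be issued from a script. The following is a minimal Python sketch, assuming only the endpoint and query parameters shown in the curl command; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response schema is not described on this page, so the parsed JSON is returned without interpretation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="rag", subcategory="document-chunking", limit=20):
    """Build the dataset URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=10):
    """Fetch the URL and decode the JSON body.

    The response structure is an unknown here, so the parsed value is
    returned as-is rather than unpacked into named fields.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_quality_url()
    print(url)
    # Uncomment to make the request (counts against the daily request limit):
    # data = fetch_quality(url)
    # print(json.dumps(data, indent=2)[:500])
```

The curl command passes `limit=20` although 38 projects are tracked. Whether a larger `limit` returns all of them in one response is not stated here, so calling `build_quality_url(limit=38)` is something to verify against the API rather than rely on.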

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | chonkie-inc/chonkie | 🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast,... | 83 | Verified |
| 2 | andreshere00/Splitter_MR | Chunk your data into markdown text blocks for your LLM applications | 47 | Emerging |
| 3 | speedyk-005/chunklet-py | One library to split them all: Sentence, Code, Docs. Chunk smarter, not... | 45 | Emerging |
| 4 | jchunk-io/jchunk | JChunk is a lightweight and flexible library designed to provide multiple... | 40 | Emerging |
| 5 | chonkie-inc/chonkiejs | 🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and... | 40 | Emerging |
| 6 | thom-heinrich/chonkify | Extractive document compression for RAG and agent pipelines. +69% vs... | 39 | Emerging |
| 7 | chonkie-inc/mtcb | 🤔 wondering if your chunks are good? 🦉 Judie is here to Judge and Evaluate... | 35 | Emerging |
| 8 | messkan/rag-chunk | A Python CLI to test, benchmark, and find the best RAG chunking strategy for... | 35 | Emerging |
| 9 | GiovanniPasq/chunky | Validate, visualize, edit, and export chunks for RAG pipelines. | 33 | Emerging |
| 10 | ayush585/SmartChunk | SmartChunk is a lightweight, structure-aware semantic chunking toolkit... | 31 | Emerging |
| 11 | ALucek/chunking-strategies | An Overview of the Latest Document Chunking Research | 29 | Experimental |
| 12 | jackfsuia/bert-chunker | bert-chunker: efficient and trained chunking for unstructured documents. ... | 29 | Experimental |
| 13 | bazilicum/axonode-chunker | Advanced semantic text chunking with custom structural markers, whole-text... | 27 | Experimental |
| 14 | smart-models/Sentences-Chunker | Cutting-edge tool designed to intelligently segment text documents into... | 26 | Experimental |
| 15 | AceAtDev/RAG-chunker | The easiest and most effective way tool to retrain a RAG LLM/GEN AI/Agent on... | 25 | Experimental |
| 16 | ekimetrics/adaptive-chunking | Adaptive Chunking: automatically select the best chunking method per... | 25 | Experimental |
| 17 | asukhodko/dify-markdown-chunker | Advanced Markdown text chunker tool plugin for Dify RAG / knowledge bases | 24 | Experimental |
| 18 | mirpo/chopdoc | A tool to split documents into chunks for RAG and LLM applications | 24 | Experimental |
| 19 | wevote-project/crystal-text-splitter | Intelligent text chunking for RAG (Retrieval-Augmented Generation) and LLM... | 23 | Experimental |
| 20 | yuma-shintani/chunksize-checker | Calculate the number of total tokens, optimal chunk size and chunk overlap... | 23 | Experimental |
| 21 | arclabs561/slabs | Text chunking for RAG: fixed, sentence, recursive, and semantic splitting | 23 | Experimental |
| 22 | stranger00135/ragflow-optimizer | Automatically discover the best RAGFlow chunking parameters for each... | 22 | Experimental |
| 23 | MukundaKatta/ChunkWise | ChunkWise — Intelligent Document Chunking. Smart document chunking for RAG pipelines | 22 | Experimental |
| 24 | philip-zhan/semchunk.rb | Ruby port of https://github.com/isaacus-dev/semchunk | 21 | Experimental |
| 25 | cwccie/ragchunk | Chunking library for technical documentation — domain-aware splitting for... | 19 | Experimental |
| 26 | bgokden/fast-text-splitter | fast text splitter with onnx | 19 | Experimental |
| 27 | pranavms13/deepcontext | A semantic chunking service for documents, GitHub repos, webpages, and... | 17 | Experimental |
| 28 | zenwor/icm_rag | 🧩 Intelligent Chunking Methods for Code Documentation RAG | 15 | Experimental |
| 29 | fujiba/pdf-chunker | LLM-friendly PDF splitter & image optimizer. Chunk PDFs by size and... | 15 | Experimental |
| 30 | tainmou/SmartChunk | 🧩 Enhance RAG processes with SmartChunk, a Python package that creates... | 14 | Experimental |
| 31 | sanbaiw/semtxtsplitter | A smol Go package for splitting text into chunks while preserving semantic meaning. | 12 | Experimental |
| 32 | AleGallagher/ChunkingTechniques | 🚀 Comprehensive evaluation of chunking techniques for RAG pipelines. Compare... | 11 | Experimental |
| 33 | Arnav-Ajay/rag-chunking-strategies | A controlled study showing how different chunking strategies change which... | 11 | Experimental |
| 34 | hemantjuyal/Latent-Chunk-Lab | A hands-on playground to explore different chunking techniques for... | 11 | Experimental |
| 35 | Leo310/rag-chunking-evaluation | Assess the effectiveness of chunking strategies in RAG systems via a custom... | 11 | Experimental |
| 36 | OneOffTech/the-chunk-list | A comprehensive open-source database of document parsers, their pricing, and... | 11 | Experimental |
| 37 | DTufail/rag-chunk-eval | Benchmarking harness for RAG chunking strategies — compares Fixed,... | 11 | Experimental |
| 38 | Devparihar5/chunking-strategies-comparison | A deep dive into text chunking for Retrieval-Augmented Generation systems | 11 | Experimental |