Semantic Chunking Embedding Tools
Tools for splitting text/code into semantically coherent chunks using embeddings, AST analysis, or similarity metrics for LLM processing. Does NOT include general tokenization, sentence splitting, or document parsing without semantic awareness.
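As a rough illustration of the embedding-similarity approach described above, the sketch below starts a new chunk whenever two adjacent sentences fall below a cosine-similarity threshold. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the function names and the `0.2` threshold are assumptions for illustration, not taken from any tool listed here.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences, threshold=0.2):
    """Group sentences, opening a new chunk when similarity to the
    previous sentence drops below the (hypothetical) threshold."""
    chunks = []
    for sentence in sentences:
        if chunks and cosine(embed(chunks[-1][-1]), embed(sentence)) >= threshold:
            chunks[-1].append(sentence)
        else:
            chunks.append([sentence])
    return chunks


sentences = [
    "Cats sleep most of the day.",
    "Cats also sleep at night.",
    "Python functions return values.",
    "Python functions can take arguments.",
]
chunks = semantic_chunks(sentences)
```

With these four sentences the two cat sentences end up in one chunk and the two Python sentences in another. Swapping `embed` for a real model would change the similarity values, so the threshold would likely need retuning.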
15 semantic chunking tools are tracked. One scores above 50, placing it in the Established tier: jparkerweb/semantic-chunking, the highest-rated at 61/100, with 134 stars and 5,194 monthly downloads.
Get all 15 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=semantic-chunking&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
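The same request can be made from a script. The sketch below builds the URL from the query parameters shown in the curl command and parses the body as JSON. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described here, the code returns the parsed JSON without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="semantic-chunking", limit=20):
    """Assemble the dataset URL from its query parameters."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Perform the GET request and return the parsed JSON body.

    The structure of the returned object is not documented on this page,
    so callers should inspect it before relying on specific keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_url()
```

Calling `fetch_projects(url)` performs the network request and counts against the daily quota, so it is left uncalled here.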
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | jparkerweb/semantic-chunking: 🍱 semantic-chunking ⇢ semantically create chunks from large document for... | 61 | Established |
| 2 | drittich/SemanticSlicer: 🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents. | | Emerging |
| 3 | ndgigliotti/afterthoughts: Sentence-aware embeddings using late chunking with transformers. | | Emerging |
| 4 | smart-models/Normalized-Semantic-Chunker: Cutting-edge tool that unlocks the full potential of semantic chunking | | Emerging |
| 5 | cspnms/MSchunker: Smart text chunker for LLM preprocessing (sections → paragraphs → sentences... | | Experimental |
| 6 | ReemHal/Semantic-Text-Segmentation-with-Embeddings: Uses GloVe embeddings and greedy sequence segmentation to semantically... | | Experimental |
| 7 | chu2bard/chunkflow: Document chunking pipeline for RAG applications | | Experimental |
| 8 | zoobz-io/chisel: AST-aware code chunking for semantic search and embeddings | | Experimental |
| 9 | danielefrisanco/semantic_chunker: A lightweight Ruby library for splitting text into topically coherent chunks... | | Experimental |
| 10 | njyeung/go-semantic-chunking: Semantic chunking algorithm in (mostly) Go | | Experimental |
| 11 | agamm/semantic-split: A Python library to chunk/group your texts based on semantic similarity. | | Experimental |
| 12 | IanSousa04/treechunk: treechunk is a TypeScript library for semantic segmentation of code... | | Experimental |
| 13 | SainathPattipati/advanced-chunking-strategies: Semantic, agentic, and contextual chunking strategies for RAG with... | | Experimental |
| 14 | geleto/semachunk: Lightweight Semantic Chunking Library. Plug any embedding provider/API.... | | Experimental |
| 15 | do-me/js-text-chunker: A simple vanilla JS text chunker for hierarchical semantic chunking | | Experimental |