Semantic Chunking Embedding Tools

Tools for splitting text or code into semantically coherent chunks for LLM processing, using embeddings, AST analysis, or similarity metrics. This category does not include general tokenization, sentence splitting, or document parsing without semantic awareness.

Fifteen semantic chunking tools are tracked. One scores above 50 (the Established tier). The highest-rated is jparkerweb/semantic-chunking at 61/100, with 134 stars and 5,194 monthly downloads.

Get all 15 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=semantic-chunking&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
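For use from a script rather than the command line, the same request can be sketched in Python with only the standard library. The helper names `build_url` and `fetch_projects` are illustrative, not part of the API, and the response schema is not described on this page, so the parsed JSON is returned as-is rather than unpacked into assumed fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="semantic-chunking", limit=20):
    """Assemble the same query string the curl command uses."""
    query = urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Download and decode the JSON body.

    The shape of the payload is an unknown here, so no keys are assumed:
    the caller receives whatever json.load produces (dict or list).
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects(build_url())` performs the keyless request, which counts against the 100 requests/day limit; inspect the returned value before relying on any particular field name.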

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | jparkerweb/semantic-chunking | 🍱 semantic-chunking ⇢ semantically create chunks from large document for... | 61 | Established |
| 2 | drittich/SemanticSlicer | 🧠✂️ SemanticSlicer: A smart text chunker for LLM-ready documents. | 38 | Emerging |
| 3 | ndgigliotti/afterthoughts | Sentence-aware embeddings using late chunking with transformers. | 34 | Emerging |
| 4 | smart-models/Normalized-Semantic-Chunker | Cutting-edge tool that unlocks the full potential of semantic chunking | 32 | Emerging |
| 5 | cspnms/MSchunker | Smart text chunker for LLM preprocessing (sections → paragraphs → sentences... | 29 | Experimental |
| 6 | ReemHal/Semantic-Text-Segmentation-with-Embeddings | Uses GloVe embeddings and greedy sequence segmentation to semantically... | 26 | Experimental |
| 7 | chu2bard/chunkflow | Document chunking pipeline for RAG applications | 24 | Experimental |
| 8 | zoobz-io/chisel | AST-aware code chunking for semantic search and embeddings | 24 | Experimental |
| 9 | danielefrisanco/semantic_chunker | A lightweight Ruby library for splitting text into topically coherent chunks... | 22 | Experimental |
| 10 | njyeung/go-semantic-chunking | Semantic chunking algorithm in (mostly) Go | 21 | Experimental |
| 11 | agamm/semantic-split | A Python library to chunk/group your texts based on semantic similarity. | 21 | Experimental |
| 12 | IanSousa04/treechunk | treechunk is a TypeScript library for semantic segmentation of code... | 20 | Experimental |
| 13 | SainathPattipati/advanced-chunking-strategies | Semantic, agentic, and contextual chunking strategies for RAG with... | 19 | Experimental |
| 14 | geleto/semachunk | Lightweight Semantic Chunking Library. Plug any embedding provider/API.... | 15 | Experimental |
| 15 | do-me/js-text-chunker | A simple vanilla JS text chunker for hierarchical semantic chunking | 10 | Experimental |
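The summary figures at the top of the page can be re-derived from the rows listed above. The sketch below hard-codes the repository, score, and tier of each entry exactly as shown and tallies them; the helper names `tier_counts` and `above` are illustrative, and no tier thresholds are assumed beyond the tier labels the page itself assigns.

```python
from collections import Counter

# (repository, score, tier) copied from the listing above.
TOOLS = [
    ("jparkerweb/semantic-chunking", 61, "Established"),
    ("drittich/SemanticSlicer", 38, "Emerging"),
    ("ndgigliotti/afterthoughts", 34, "Emerging"),
    ("smart-models/Normalized-Semantic-Chunker", 32, "Emerging"),
    ("cspnms/MSchunker", 29, "Experimental"),
    ("ReemHal/Semantic-Text-Segmentation-with-Embeddings", 26, "Experimental"),
    ("chu2bard/chunkflow", 24, "Experimental"),
    ("zoobz-io/chisel", 24, "Experimental"),
    ("danielefrisanco/semantic_chunker", 22, "Experimental"),
    ("njyeung/go-semantic-chunking", 21, "Experimental"),
    ("agamm/semantic-split", 21, "Experimental"),
    ("IanSousa04/treechunk", 20, "Experimental"),
    ("SainathPattipati/advanced-chunking-strategies", 19, "Experimental"),
    ("geleto/semachunk", 15, "Experimental"),
    ("do-me/js-text-chunker", 10, "Experimental"),
]


def tier_counts(tools):
    """Count how many tools carry each tier label."""
    return Counter(tier for _, _, tier in tools)


def above(tools, threshold):
    """Return the repositories whose score is strictly above the threshold."""
    return [name for name, score, _ in tools if score > threshold]


counts = tier_counts(TOOLS)
top_name, top_score, _ = max(TOOLS, key=lambda row: row[1])
print(f"{len(TOOLS)} tools tracked; tiers: {dict(counts)}")
print(f"above 50: {above(TOOLS, 50)}; highest: {top_name} at {top_score}/100")
```

The tally gives 1 Established, 3 Emerging, and 11 Experimental entries, which matches the stated total of 15 and the single tool scoring above 50.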