ThanhHung2112/Semantic_chunking
Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.
The library computes sentence embeddings with Sentence Transformers models, then groups consecutive sentences by cosine similarity to detect semantic boundaries, splitting where similarity drops below a configurable threshold. Tunable parameters for maximum chunk size and similarity threshold give fine-grained control over segmentation granularity for downstream NLP tasks such as retrieval-augmented generation or document analysis.
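The grouping step described above can be illustrated with a minimal sketch. This is not the library's actual API: the function names `cosine_similarity` and `chunk_by_similarity`, the parameters `threshold` and `max_sentences`, and the choice to measure chunk size in sentences are all assumptions made for illustration. Embeddings are passed in as plain lists of floats so the sketch runs without downloading a model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk_by_similarity(
    sentences: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.5,   # hypothetical default, not taken from the library
    max_sentences: int = 5,   # assumes chunk size is counted in sentences
) -> List[str]:
    """Group consecutive sentences, starting a new chunk when the similarity
    between neighbouring sentences falls below `threshold` or the current
    chunk already holds `max_sentences` sentences."""
    if len(sentences) != len(embeddings):
        raise ValueError("sentences and embeddings must have the same length")
    if not sentences:
        return []

    chunks: List[List[str]] = [[sentences[0]]]
    for i in range(1, len(sentences)):
        similarity = cosine_similarity(embeddings[i - 1], embeddings[i])
        if similarity < threshold or len(chunks[-1]) >= max_sentences:
            chunks.append([sentences[i]])      # semantic boundary or size cap
        else:
            chunks[-1].append(sentences[i])    # same topic, keep extending
    return [" ".join(chunk) for chunk in chunks]


# Toy 2-D vectors standing in for real sentence embeddings.
demo_sentences = [
    "Cats purr.",
    "Cats nap often.",
    "Stocks fell today.",
    "Markets closed lower.",
]
demo_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
demo_chunks = chunk_by_similarity(demo_sentences, demo_embeddings, threshold=0.5)
```

With real text, the embeddings could instead come from the `sentence-transformers` package, for example `SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)`; the model name here is an arbitrary choice, not a documented default of Semantic_chunking.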
Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Stars: 7
Forks: —
Language: Python
License: —
Category: —
Last pushed: Dec 15, 2024
Monthly downloads: 186
Commits (30d): 0
Dependencies: 2
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/ThanhHung2112/Semantic_chunking"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
danny-avila/rag_api
ID-based RAG FastAPI: Integration with Langchain and PostgreSQL/pgvector
mburaksayici/smallevals
smallevals — CPU-fast, GPU-blazing fast offline retrieval evaluation for RAG systems with tiny QA models.
naxoc/riffrag
A local RAG builder for code with a Claude Code skills creator
GoparapukethaN/rag-forge
Modular RAG framework with hybrid retrieval, intelligent chunking, and multi-provider LLM support
kxgst228/rag-forge
Modular RAG framework with hybrid retrieval, intelligent chunking, and multi-provider LLM support