ThanhHung2112/Semantic_chunking
Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.
The library computes sentence embeddings using Sentence Transformers models, then groups consecutive sentences by cosine similarity to detect semantic boundaries, splitting only when similarity drops below a configurable threshold. It provides tunable parameters for maximum chunk size and similarity threshold, giving fine-grained control over segmentation granularity for downstream NLP tasks such as retrieval-augmented generation or document analysis.
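The grouping step described above can be illustrated with a minimal, dependency-free sketch. It is not the library's API: `semantic_chunks`, its parameter names, and its defaults are hypothetical. Two further behaviors are assumptions. Each sentence is compared only with the sentence immediately before it, and `max_chunk_size` counts sentences rather than tokens or characters. Embeddings are passed in as plain lists so the example runs without downloading a model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(
    sentences: List[str],
    embeddings: List[Sequence[float]],
    threshold: float = 0.5,   # hypothetical default
    max_chunk_size: int = 5,  # assumption: measured in sentences
) -> List[List[str]]:
    """Group consecutive sentences into chunks.

    A new chunk starts when the similarity between a sentence and the one
    before it falls below `threshold`, or when the current chunk already
    holds `max_chunk_size` sentences.
    """
    if len(sentences) != len(embeddings):
        raise ValueError("need exactly one embedding per sentence")
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        sim = cosine_similarity(embeddings[i - 1], embeddings[i])
        if sim < threshold or len(chunks[-1]) >= max_chunk_size:
            chunks.append([sentences[i]])  # semantic boundary or size cap: start a new chunk
        else:
            chunks[-1].append(sentences[i])  # similar enough: extend the current chunk
    return chunks


if __name__ == "__main__":
    sentences = ["Cats purr.", "Kittens meow.", "Stocks fell today.", "Markets were volatile."]
    # Toy 2-D vectors standing in for real sentence embeddings.
    embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
    print(semantic_chunks(sentences, embeddings, threshold=0.5))
    # → [['Cats purr.', 'Kittens meow.'], ['Stocks fell today.', 'Markets were volatile.']]
```

With real data, the embeddings would come from a Sentence Transformers model, for example `SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)`. Raising `threshold` produces more, smaller chunks. Lowering `max_chunk_size` caps chunk length even when neighboring sentences remain similar.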
Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Stars: 7
Forks: not listed
Language: Python
License: not listed
Category: not listed
Last pushed: Dec 15, 2024
Monthly downloads: 186
Commits (30d): 0
Dependencies: 2
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ThanhHung2112/Semantic_chunking"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.