ThanhHung2112/Semantic_chunking

Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.

Score: 20 / 100 (Experimental)

The library computes sentence embeddings with Sentence Transformers models, then groups consecutive sentences by cosine similarity to detect semantic boundaries, splitting only when similarity drops below a configurable threshold. Tunable parameters for maximum chunk size and similarity threshold control segmentation granularity for downstream NLP tasks such as retrieval-augmented generation or document analysis.
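The grouping step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's actual API: the names `chunk_sentences`, `toy_embed`, `threshold`, and `max_sentences` are hypothetical, and the sketch assumes each sentence is compared with the one immediately before it.

```python
import math
from typing import Callable, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def chunk_sentences(
    sentences: List[str],
    embed: Callable[[List[str]], Sequence[Sequence[float]]],
    threshold: float = 0.5,   # hypothetical parameter name
    max_sentences: int = 5,   # hypothetical parameter name
) -> List[List[str]]:
    """Group consecutive sentences; start a new chunk when similarity to the
    previous sentence drops below `threshold` or the chunk is full."""
    if not sentences:
        return []
    vectors = embed(sentences)
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        similar = cosine(vectors[i - 1], vectors[i]) >= threshold
        if similar and len(chunks[-1]) < max_sentences:
            chunks[-1].append(sentences[i])
        else:
            chunks.append([sentences[i]])
    return chunks


def toy_embed(sentences: List[str]) -> List[List[float]]:
    """Stand-in for a real model: a 2-d 'topic' vector per sentence."""
    return [[1.0, 0.0] if "cat" in s else [0.0, 1.0] for s in sentences]


if __name__ == "__main__":
    text = ["The cat sat.", "A cat purred.", "Stocks fell.", "Markets dipped."]
    for chunk in chunk_sentences(text, toy_embed):
        print(chunk)
```

With a real model, `embed` could be the `encode` method of a `sentence_transformers.SentenceTransformer` instance, which returns one vector per input sentence. Passing the embedder in as a callable keeps the splitting logic testable without downloading a model.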

Used by 1 other package. No commits in the last 6 months. Available on PyPI.

No License | Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 10 / 25
Community 0 / 25

Stars: 7
Forks: (not shown)
Language: Python
License: none
Last pushed: Dec 15, 2024
Monthly downloads: 186
Commits (30d): 0
Dependencies: 2
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/ThanhHung2112/Semantic_chunking"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
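The same request can be made from Python using only the standard library. This is a minimal sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the three path segments are read here as category, owner, and repository, and the response is assumed to be JSON (the page does not document its schema).

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the body is JSON (not confirmed)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Counts against the 100 requests/day anonymous limit.
    print(fetch_quality("vector-db", "ThanhHung2112", "Semantic_chunking"))
```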