agamm/semantic-split

A Python library to chunk/group your texts based on semantic similarity.

Score: 28 / 100 (Experimental)

Leverages SentenceTransformers for semantic embeddings and spaCy for sentence tokenization to group semantically related sentences while preserving document structure. Designed specifically for RAG pipelines and vector database ingestion, enabling efficient retrieval of contextually relevant chunks for LLM prompts while reducing token costs. Supports pluggable similarity models and sentence splitters, with examples demonstrating integration into question-answering workflows over documents.
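The grouping idea described above can be illustrated with a minimal, dependency-free sketch. It is not the library's API: the names `split_sentences`, `bow_similarity`, `group_sentences` and the `threshold` parameter are hypothetical. A regex splitter stands in for spaCy, and a bag-of-words cosine similarity stands in for SentenceTransformers embeddings. Both are passed in as arguments to mirror the pluggable design the description mentions.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive regex splitter (assumption: stand-in for a spaCy tokenizer)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bow_similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts (assumption: stand-in for
    embedding similarity from a sentence-embedding model)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_sentences(
    text: str,
    similarity: Callable[[str, str], float] = bow_similarity,
    splitter: Callable[[str], List[str]] = split_sentences,
    threshold: float = 0.2,  # hypothetical cut-off chosen for this example
) -> List[List[str]]:
    """Keep adjacent sentences in one chunk while they stay similar;
    start a new chunk when similarity drops below the threshold."""
    chunks: List[List[str]] = []
    for sentence in splitter(text):
        if chunks and similarity(chunks[-1][-1], sentence) >= threshold:
            chunks[-1].append(sentence)
        else:
            chunks.append([sentence])
    return chunks


sample = (
    "Cats sleep all day. Cats hunt at night. "
    "Stock prices fell sharply. Stock markets recovered later."
)
chunks = group_sentences(sample)
for chunk in chunks:
    print(" ".join(chunk))
```

Because sentences are only ever compared with their immediate predecessor, their original order is preserved. In a RAG pipeline each resulting chunk would then be embedded and stored as one vector database entry.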

103 stars. No commits in the last 6 months.

No License | Stale 6m | No Package | No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 11 / 25


Stars: 103
Forks: 9
Language: Python
License: none
Last pushed: Jul 11, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/agamm/semantic-split"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.