ThanhHung2112/Semantic_chunking

Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.

Experimental

The library computes sentence embeddings with Sentence Transformers models, then groups consecutive sentences by cosine similarity to detect semantic boundaries, starting a new chunk where the similarity between neighbouring sentences drops below a configurable threshold. Tunable parameters for maximum chunk size and similarity threshold give fine-grained control over segmentation granularity for downstream NLP tasks such as retrieval-augmented generation or document analysis.
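The grouping step described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not this library's actual API: the names `semantic_chunks`, `toy_embed`, `threshold`, and `max_sentences` are hypothetical, the maximum chunk size is assumed to be counted in sentences, and a toy bag-of-words embedding stands in for a Sentence Transformers model so the example runs without downloads.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(
    sentences: List[str],
    embed: Callable[[List[str]], Sequence[Vector]],
    threshold: float = 0.5,   # hypothetical parameter name and default
    max_sentences: int = 5,   # assumption: chunk size measured in sentences
) -> List[List[str]]:
    """Group consecutive sentences, splitting at low-similarity boundaries."""
    if not sentences:
        return []
    vectors = embed(sentences)
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        similarity = cosine_similarity(vectors[i - 1], vectors[i])
        # Start a new chunk on a semantic boundary or when the chunk is full.
        if similarity < threshold or len(chunks[-1]) >= max_sentences:
            chunks.append([sentences[i]])
        else:
            chunks[-1].append(sentences[i])
    return chunks


def toy_embed(sentences: List[str]) -> List[List[float]]:
    """Stand-in embedding (word counts) used only to keep the sketch offline."""
    vocab = sorted({word for s in sentences for word in s.lower().split()})
    return [[float(s.lower().split().count(w)) for w in vocab] for s in sentences]


if __name__ == "__main__":
    text = [
        "cats chase mice",
        "cats chase birds",
        "stocks fell sharply",
        "stocks fell again",
    ]
    for chunk in semantic_chunks(text, toy_embed):
        print(chunk)
```

With a real model, the `embed` argument could plausibly be the `encode` method of a `sentence_transformers.SentenceTransformer` instance, which returns one vector per input sentence; the threshold would then need tuning, since dense embeddings of unrelated sentences rarely have a similarity of exactly zero.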

Used by 1 other package. No commits in the last 6 months. Available on PyPI.

No License

Stale 6m

Maintenance 0 / 25

Adoption 10 / 25

Maturity 10 / 25

Community 0 / 25

Stars: not shown
Forks: not shown
Language: Python
License: none

Higher-rated alternatives

danny-avila/rag_api

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL/pgvector

mburaksayici/smallevals

smallevals — CPU-fast, GPU-blazing fast offline retrieval evaluation for RAG systems with tiny QA models.

naxoc/riffrag

A local RAG builder for code with a Claude Code skills creator

GoparapukethaN/rag-forge

Modular RAG framework with hybrid retrieval, intelligent chunking, and multi-provider LLM support

kxgst228/rag-forge

Modular RAG framework with hybrid retrieval, intelligent chunking, and multi-provider LLM support
