ALucek/chunking-strategies

An Overview of the Latest Document Chunking Research

Score: 36 / 100 (Emerging)

Implements multiple chunking strategies (character/token-based, recursive, semantic, cluster semantic, and LLM-based) for splitting text ahead of RAG pipelines and vector database ingestion. Based on ChromaDB research comparing chunking methods, it evaluates empirically how different segmentation strategies affect downstream retrieval performance. It integrates with vector databases and embedding models to test end-to-end RAG workflows under various chunking configurations.
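
To make the simplest of the strategies listed above concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is not taken from the repository; the function name `chunk_by_characters` and its parameters `chunk_size` and `overlap` are hypothetical names chosen for illustration.

```python
def chunk_by_characters(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    Illustrative sketch only; names and defaults are assumptions, not the repo's API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is emitted that only repeats the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap trades index size for recall: neighbouring chunks repeat some text, so a sentence that straddles a boundary still appears whole in at least one chunk. The recursive and semantic strategies choose boundaries by separators or embedding similarity instead of a fixed step.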

No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 19 / 25

Stars: 85
Forks: 18
Language: Jupyter Notebook
License: None
Last pushed: Nov 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/ALucek/chunking-strategies"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
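
The same request can be made from Python with only the standard library. This is a sketch under two assumptions the page does not confirm: that the endpoint returns JSON, and that other repositories follow the same `/{owner}/{repo}` path pattern. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example; reuse for other repos is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a given owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record and parse it as JSON (assumed response format)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("ALucek", "chunking-strategies")
```

Since the response fields are not documented here, inspect the returned object before relying on any particular key.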