chonkie and chunking-strategies

chonkie, a production-ready chunking library, and chunking-strategies, a research overview repository, are **complements**: the research overview can inform design and benchmarking decisions for the library, while practitioners using chonkie might consult chunking-strategies to understand the algorithmic tradeoffs behind their chosen chunking strategy.

| | chonkie (Verified) | chunking-strategies (Emerging) |
|---|---|---|
| Maintenance | 25/25 | 0/25 |
| Adoption | 15/25 | 9/25 |
| Maturity | 25/25 | 8/25 |
| Community | 18/25 | 19/25 |
| Stars | 3,829 | 85 |
| Forks | 256 | 18 |
| Downloads | — | — |
| Commits (30d) | 53 | 0 |
| Language | Python | Jupyter Notebook |
| License | MIT | — |
| Risk flags | No risk flags | No License, Stale 6m, No Package, No Dependents |

About chonkie

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

Provides pluggable chunking strategies—recursive, semantic, code-aware, and LLM-based—with composable pipeline workflows that chain multiple chunkers and refineries together. Integrates with 32+ tools across tokenizers (GPT-2, BPE), embeddings (OpenAI, Sentence Transformers), vector databases, and LLMs, while supporting 56 languages out-of-the-box through modular dependency installation.
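To make the "recursive" strategy named above concrete, here is a minimal sketch of the general idea in plain Python. It is an illustration only and not Chonkie's implementation or API: the function name `recursive_chunk`, the `max_chars` limit, and the separator order are all assumptions chosen for this example, and it measures length in characters rather than tokens.

```python
from typing import List, Tuple

def recursive_chunk(
    text: str,
    max_chars: int = 40,
    separators: Tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split on the coarsest separator first and recurse to finer
    separators only for pieces that are still longer than max_chars.
    (Hypothetical helper for illustration, not a library function.)"""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard character cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    buffer = ""
    for piece in text.split(sep):
        # Greedily pack neighbouring pieces while they fit in one chunk.
        candidate = piece if not buffer else buffer + sep + piece
        if len(candidate) <= max_chars:
            buffer = candidate
            continue
        if buffer.strip():
            chunks.append(buffer)
        buffer = ""
        if len(piece) <= max_chars:
            buffer = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if buffer.strip():
        chunks.append(buffer)
    return chunks

sample = (
    "First paragraph. It has two sentences.\n\n"
    "Second paragraph is a bit longer and needs to be split into smaller pieces."
)
chunks = recursive_chunk(sample, max_chars=40)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The first paragraph fits within the limit and stays whole, while the second is broken at word boundaries. A production chunker would typically count tokens with a tokenizer instead of characters.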

About chunking-strategies

ALucek/chunking-strategies

An Overview of the Latest Document Chunking Research

Implements multiple chunking strategies—including character/token-based, recursive, semantic, cluster semantic, and LLM-based approaches—to optimize text splitting for RAG pipelines and vector database ingestion. Based on ChromaDB research comparing chunking methods, it provides empirical evaluation of how different segmentation strategies impact downstream retrieval performance. Integrates with vector databases and embedding models to test end-to-end RAG workflows with various chunking configurations.
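The kind of empirical comparison described above can be sketched with a toy example. The code below is an assumption-laden illustration, not the repository's evaluation code: it uses fixed-size character chunks, a word-overlap retriever in place of an embedding model and vector database, and character-level recall and intersection-over-union (IoU) as stand-in metrics. All function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive

def fixed_size_spans(text: str, size: int, overlap: int = 0) -> List[Span]:
    """Return fixed-size chunk spans, each sharing `overlap` chars with the next."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    return [(i, min(i + size, len(text))) for i in range(0, len(text), step)]

def recall_and_iou(retrieved: Span, relevant: Span) -> Tuple[float, float]:
    """Character-level recall and IoU of a retrieved span against the answer span."""
    inter = max(0, min(retrieved[1], relevant[1]) - max(retrieved[0], relevant[0]))
    union = (retrieved[1] - retrieved[0]) + (relevant[1] - relevant[0]) - inter
    return inter / (relevant[1] - relevant[0]), inter / union

def retrieve(text: str, spans: List[Span], query: str) -> Span:
    """Toy retriever: pick the chunk sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(
        spans,
        key=lambda s: len(query_words & set(text[s[0]:s[1]].lower().split())),
    )

document = (
    "Chunk size controls how much context each retrieved passage carries. "
    "Small chunks are precise but may cut an answer in half. "
    "Large chunks keep answers intact but add unrelated text. "
    "Overlap between neighbouring chunks reduces the risk of splitting an answer."
)
query = "what do large chunks add"
answer = "Large chunks keep answers intact but add unrelated text."
start = document.index(answer)
relevant = (start, start + len(answer))

# Compare a few (size, overlap) configurations on the same query.
for size, overlap in [(40, 0), (80, 20), (160, 40)]:
    spans = fixed_size_spans(document, size, overlap)
    best = retrieve(document, spans, query)
    recall, iou = recall_and_iou(best, relevant)
    print(f"size={size:3d} overlap={overlap:2d} recall={recall:.2f} iou={iou:.2f}")
```

Recall rewards chunks that contain the whole answer, while IoU also penalises chunks that carry a lot of unrelated text, so the two metrics together expose the size tradeoff between configurations.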

Related comparisons

chonkie and chunklet-py

chonkie and jchunk

chonkie and chonkiejs

chonkie and chonkify

chonkie and rag-chunk

chonkie and chunky

Scores updated daily from GitHub, PyPI, and npm data.