chonkie and chunklet-py

These are competitors: both are chunking libraries that split documents into semantically meaningful pieces for RAG pipelines. Chonkie offers more mature, production-tested functionality, while Chunklet-py provides a simpler, multi-format alternative.

                  chonkie      chunklet-py
Status            Verified     Established
Maintenance       25/25        13/25
Adoption          15/25        9/25
Maturity          25/25        24/25
Community         18/25        5/25
Stars             3,829        64
Forks             256          2
Downloads         n/a          n/a
Commits (30d)     53           0
Language          Python       Python
License           MIT          MIT

No risk flags

About chonkie

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

Provides pluggable chunking strategies (recursive, semantic, code-aware, and LLM-based) with composable pipeline workflows that chain multiple chunkers and refineries together. Integrates with 32+ tools across tokenizers (GPT-2, BPE), embeddings (OpenAI, Sentence Transformers), vector databases, and LLMs, and supports 56 languages out of the box while keeping dependencies modular, so only the extras you need are installed.
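
The recursive strategy and the chunker-plus-refinery composition described above can be illustrated with a small standard-library sketch. It does not use Chonkie's API: `recursive_chunk`, `overlap_refinery`, and `run_pipeline` are hypothetical names, the paragraph/sentence/word split order is an assumption, and whitespace word counts stand in for a real tokenizer such as GPT-2's.

```python
import re
from functools import partial
from typing import Callable

# Split levels tried from coarsest to finest: paragraphs, sentences, words.
# Each entry is (split function, separator used when re-joining pieces).
LEVELS: list[tuple[Callable[[str], list[str]], str]] = [
    (lambda t: t.split("\n\n"), "\n\n"),
    (lambda t: re.split(r"(?<=[.!?])\s+", t), " "),  # keeps sentence punctuation
    (str.split, " "),
]


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def recursive_chunk(text: str, max_tokens: int, level: int = 0) -> list[str]:
    """Greedily pack pieces at this level; recurse into any piece that is still too long."""
    if count_tokens(text) <= max_tokens or level >= len(LEVELS):
        return [text] if text.strip() else []
    split, joiner = LEVELS[level]
    chunks: list[str] = []
    current = ""
    for piece in split(text):
        candidate = piece if not current else current + joiner + piece
        if count_tokens(candidate) <= max_tokens:
            current = candidate  # still fits: keep packing
            continue
        if current.strip():
            chunks.append(current)  # flush what has been packed so far
        if count_tokens(piece) > max_tokens:
            chunks.extend(recursive_chunk(piece, max_tokens, level + 1))
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks


def overlap_refinery(chunks: list[str], overlap_words: int) -> list[str]:
    """Prefix each chunk with the last `overlap_words` words of the chunk before it."""
    if overlap_words <= 0:
        return list(chunks)
    refined = chunks[:1]
    for prev, chunk in zip(chunks, chunks[1:]):
        tail = " ".join(prev.split()[-overlap_words:])
        refined.append(f"{tail} {chunk}")
    return refined


def run_pipeline(
    text: str,
    chunker: Callable[[str], list[str]],
    refineries: list[Callable[[list[str]], list[str]]],
) -> list[str]:
    """Chunk once, then pass the chunks through each refinery in order."""
    chunks = chunker(text)
    for refine in refineries:
        chunks = refine(chunks)
    return chunks


doc = "Chonkie splits text.\n\nIt tries paragraphs first. Then sentences. Then words if needed."
base = recursive_chunk(doc, max_tokens=5)
print(base)
# ['Chonkie splits text.', 'It tries paragraphs first.', 'Then sentences.', 'Then words if needed.']
final = run_pipeline(
    doc,
    partial(recursive_chunk, max_tokens=5),
    [partial(overlap_refinery, overlap_words=1)],
)
print(final[1])
# text. It tries paragraphs first.
```

Treating chunkers and refineries as plain functions is what makes them composable here: a different chunker or an extra refinery slots into `run_pipeline` without changing the other steps.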

About chunklet-py

speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

Supports 50+ languages with automatic detection and offers composable constraints (sentences, tokens, sections, lines, functions) through a pluggable architecture with custom tokenizers and processors. Rich metadata annotations include source references, spans, and structural information (including AST details for code), making it well suited to RAG and LLM applications. Optional document-processing modules handle diverse formats (PDF, DOCX, EPUB, Markdown, HTML, LaTeX, CSV, Excel), and it can be used through a CLI, as a Python library, or via a web-based visualization interface.
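
To make "composable constraints" and span metadata concrete, here is a minimal plain-Python sketch. It is not chunklet-py's API: `max_sentences`, `max_tokens`, `chunk_text`, and the metadata keys are hypothetical, sentence detection is a naive regex, and tokens are approximated by whitespace-separated words.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# A constraint inspects the sentences of a candidate chunk and says whether it is acceptable.
Constraint = Callable[[list[str]], bool]


def max_sentences(n: int) -> Constraint:
    return lambda sents: len(sents) <= n


def max_tokens(n: int) -> Constraint:
    # Whitespace word count stands in for a real (pluggable) tokenizer.
    return lambda sents: sum(len(s.split()) for s in sents) <= n


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def sentence_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of sentences found by a naive regex."""
    spans = []
    for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]*", text):
        sentence = m.group().rstrip()
        spans.append((m.start(), m.start() + len(sentence)))
    return spans


def chunk_text(text: str, constraints: list[Constraint], source: str = "<string>") -> list[Chunk]:
    """Greedily add sentences to a chunk until the next one would break any constraint."""

    def make_chunk(group: list[tuple[int, int]]) -> Chunk:
        start, end = group[0][0], group[-1][1]
        meta = {"source": source, "span": (start, end), "sentences": len(group)}
        return Chunk(text[start:end], meta)

    chunks: list[Chunk] = []
    group: list[tuple[int, int]] = []
    for span in sentence_spans(text):
        trial = group + [span]
        sents = [text[s:e] for s, e in trial]
        if group and not all(ok(sents) for ok in constraints):
            chunks.append(make_chunk(group))  # close the current chunk
            group = [span]  # an oversized single sentence still becomes its own chunk
        else:
            group = trial
    if group:
        chunks.append(make_chunk(group))
    return chunks


text = "Alpha beta. Gamma delta epsilon. Zeta! Eta theta iota kappa?"
chunks = chunk_text(text, [max_sentences(2), max_tokens(4)], source="example.txt")
for c in chunks:
    print(c.metadata["span"], c.text)
# (0, 11) Alpha beta.
# (12, 38) Gamma delta epsilon. Zeta!
# (39, 60) Eta theta iota kappa?
```

Because each constraint is just a predicate, adding another limit (for example on lines or sections) means appending a function to the list rather than changing the chunking loop, and the recorded spans let each chunk be traced back to its exact position in the source.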

Related comparisons

chonkie and jchunk, chonkie and chonkiejs, chonkie and chonkify, chonkie and rag-chunk, chonkie and chunky, chonkie and SmartChunk

Scores updated daily from GitHub, PyPI, and npm data.