speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

/ 100

Established

Supports 50+ languages with automatic detection and offers composable constraints (sentences, tokens, sections, lines, functions) through a pluggable architecture with custom tokenizers and processors. Rich metadata annotations include source references, spans, and structural information—including AST details for code—making it well-suited for RAG and LLM applications. Handles diverse formats (PDF, DOCX, EPUB, Markdown, HTML, LaTeX, CSV, Excel) via optional document processing modules, with CLI, library, and web-based visualization interfaces.
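To make "composable constraints" concrete, here is a minimal sketch of the general idea: sentences are packed into a chunk until either a sentence limit or a token limit would be exceeded, and each chunk carries its character span as metadata. This is not chunklet-py's API. The names `chunk_text`, `count_words`, `max_sentences`, `max_tokens`, and `token_counter` are hypothetical, the sentence splitter is a naive regex, and the default token counter simply counts whitespace-separated words.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def count_words(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def chunk_text(
    text: str,
    max_sentences: Optional[int] = None,
    max_tokens: Optional[int] = None,
    token_counter: Callable[[str], int] = count_words,
) -> List[Dict]:
    """Group sentences into chunks that respect every limit that is set."""
    if max_sentences is None and max_tokens is None:
        raise ValueError("set at least one of max_sentences or max_tokens")

    # Naive sentence spans: start at a non-space character and run through
    # the next block of ., !, or ? (illustrative only, not language-aware).
    spans: List[Tuple[int, int]] = [
        (m.start(), m.end()) for m in re.finditer(r"\S[^.!?]*[.!?]*", text)
    ]

    def make_chunk(group: List[Tuple[int, int]]) -> Dict:
        start, end = group[0][0], group[-1][1]
        return {
            "content": text[start:end],
            "span": (start, end),          # character offsets into the source
            "sentence_count": len(group),
        }

    chunks: List[Dict] = []
    current: List[Tuple[int, int]] = []
    for span in spans:
        candidate = current + [span]
        candidate_text = text[candidate[0][0]:candidate[-1][1]]
        too_many = max_sentences is not None and len(candidate) > max_sentences
        too_long = (
            max_tokens is not None and token_counter(candidate_text) > max_tokens
        )
        # Close the current chunk when adding this sentence would break a limit.
        # A single oversized sentence still becomes its own chunk.
        if current and (too_many or too_long):
            chunks.append(make_chunk(current))
            current = [span]
        else:
            current = candidate
    if current:
        chunks.append(make_chunk(current))
    return chunks


if __name__ == "__main__":
    sample = "One. Two three. Four five six. Seven."
    for chunk in chunk_text(sample, max_sentences=2, max_tokens=4):
        print(chunk["span"], chunk["content"])
```

Passing a different `token_counter` (for example, one backed by a model tokenizer) changes how the token limit is measured without touching the grouping logic, which is the sense in which the constraints and the tokenizer are pluggable in this sketch.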

Used by 1 other package. Available on PyPI.

Maintenance 13 / 25

Adoption 9 / 25

Maturity 24 / 25

Community 5 / 25

Language

Python

License

MIT

Related tools

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust...

andreshere00/Splitter_MR

Chunk your data into markdown text blocks for your LLM applications

chonkie-inc/chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

jchunk-io/jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text...

messkan/rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
