speedyk-005/chunklet-py
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.
Supports 50+ languages with automatic detection and offers composable constraints (sentences, tokens, sections, lines, functions) through a pluggable architecture with custom tokenizers and processors. Rich metadata annotations include source references, spans, and structural information—including AST details for code—making it well-suited for RAG and LLM applications. Handles diverse formats (PDF, DOCX, EPUB, Markdown, HTML, LaTeX, CSV, Excel) via optional document processing modules, with CLI, library, and web-based visualization interfaces.
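To make "composable constraints" concrete, here is a minimal pure-Python sketch of the general idea: sentences are packed into a chunk until either a sentence limit or a token limit would be exceeded. It is not chunklet-py's API. The names `split_sentences`, `chunk_text`, `max_sentences`, `max_tokens`, and `count_tokens` are hypothetical, and the regex splitter is a naive stand-in for real multilingual sentence detection.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (illustrative only): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(
    text: str,
    max_sentences: int = 3,
    max_tokens: int = 50,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Pack sentences into chunks that respect both limits at once.

    count_tokens is pluggable; the default counts whitespace-separated
    words, which is an assumption and not a real model tokenizer.
    A single sentence longer than max_tokens is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0

    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would break a limit.
        if current and (
            len(current) >= max_sentences or current_tokens + n > max_tokens
        ):
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n

    if current:
        chunks.append(" ".join(current))
    return chunks
```

Passing a different `count_tokens` callable (for example `len` to count characters) changes the token constraint without touching the packing loop, which is the kind of pluggability the description refers to.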
Used by 1 other package. Available on PyPI.
Stars: 64
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Dependencies: 12
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/speedyk-005/chunklet-py"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
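The same request can be made from Python with only the standard library. This is a sketch under two assumptions that the page does not confirm: that the path follows a category/owner/repo pattern (inferred from the single URL above) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{API_BASE}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("rag", "speedyk-005", "chunklet-py")` issues the same GET request as the curl command. The shape of the returned data is not documented on this page, so inspect it before relying on specific keys.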
Related tools
chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust...
andreshere00/Splitter_MR
Chunk your data into markdown text blocks for your LLM applications
chonkie-inc/chonkiejs
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library
jchunk-io/jchunk
JChunk is a lightweight and flexible library designed to provide multiple strategies for text...
messkan/rag-chunk
A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.