andreshere00/Splitter_MR
Chunk your data into markdown text blocks for your LLM applications
Supports multiple document readers (Vanilla, MarkItDown, Docling) that handle formats ranging from PDFs and Office files to images and structured data, with built-in Markdown conversion. Provides 12+ splitting strategies, including character, word, sentence, token, semantic, JSON, HTML tag, and code-aware splitting, giving fine-grained control over chunk boundaries and overlap. Integrates with vision-language models (OpenAI, Claude, Gemini, HuggingFace) and embedding providers (OpenAI, Azure, Voyage) for multimodal processing and semantic-aware chunking.
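The chunk-size and overlap controls behind the simplest strategies (character and word splitting) can be sketched in plain Python. This is a hypothetical illustration of the general technique, not Splitter_MR's API: the names `sliding_windows`, `split_by_characters`, `split_by_words`, `chunk_size`, and `chunk_overlap` are assumptions chosen for the example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(items: Sequence[T], chunk_size: int, chunk_overlap: int = 0) -> List[Sequence[T]]:
    """Hypothetical helper: cut `items` into windows of `chunk_size`,
    where consecutive windows share `chunk_overlap` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    windows: List[Sequence[T]] = []
    for start in range(0, len(items), step):
        windows.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # this window already reaches the end; avoid a redundant tail
    return windows


def split_by_characters(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Character-based chunks (sizes counted in characters)."""
    return [str(w) for w in sliding_windows(text, chunk_size, chunk_overlap)]


def split_by_words(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Word-based chunks (sizes counted in whitespace-separated words)."""
    words = text.split()
    return [" ".join(w) for w in sliding_windows(words, chunk_size, chunk_overlap)]
```

For example, `split_by_characters("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`, and `split_by_words("the quick brown fox jumps over", 3, 1)` yields `["the quick brown", "brown fox jumps", "jumps over"]`. The overlap repeats a little context at each boundary so that a sentence or idea cut by one chunk still appears intact in its neighbour; sentence, token, or semantic splitters change what counts as a unit or a boundary, not this basic windowing idea.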
Available on PyPI.
Stars: 25
Forks: 2
Language: Jupyter Notebook
License: MIT
Last pushed: Jan 08, 2026
Monthly downloads: 121
Commits (last 30 days): 0
Dependencies: 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/andreshere00/Splitter_MR"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
Higher-rated alternatives
chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust...
speedyk-005/chunklet-py
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...
chonkie-inc/chonkiejs
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library
jchunk-io/jchunk
JChunk is a lightweight and flexible library designed to provide multiple strategies for text...
messkan/rag-chunk
A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.