andreshere00/Splitter_MR

Chunk your data into markdown text blocks for your LLM applications

/ 100

Emerging

Supports multiple document readers (Vanilla, MarkItDown, Docling) that handle diverse formats, from PDFs and Office files to images and structured data, with built-in markdown conversion. Provides 12+ splitting strategies, including character, word, sentence, token, semantic, JSON, HTML tag, and code-aware splitters, giving fine-grained control over chunk boundaries and overlap. Integrates with vision-language models (OpenAI, Claude, Gemini, HuggingFace) and embedding providers (OpenAI, Azure, Voyage) for multimodal processing and semantic-aware chunking.
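To make "chunk boundaries and overlap" concrete, here is a minimal sketch of character-based splitting in plain Python. It does not use Splitter_MR's own API: the function name `split_by_characters` and its parameters `chunk_size` and `overlap` are hypothetical, chosen only for illustration.

```python
def split_by_characters(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks.

    Hypothetical helper, not part of Splitter_MR. Consecutive chunks
    share `overlap` characters so context is not lost at boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_by_characters("abcdefghij", chunk_size=4, overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one; a larger `overlap` trades more duplicated text for more shared context between neighboring chunks.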

Available on PyPI.

Maintenance 10 / 25

Adoption 12 / 25

Maturity 18 / 25

Community 7 / 25

Language

Jupyter Notebook

License

MIT

Higher-rated alternatives

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust...

speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...

chonkie-inc/chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

jchunk-io/jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text...

messkan/rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
