andreshere00/Splitter_MR

Chunk your data into markdown text blocks for your LLM applications

Score: 47 / 100 (Emerging)

Supports multiple document readers (Vanilla, MarkItDown, Docling) handling diverse formats from PDFs and Office files to images and structured data, with built-in markdown conversion. Provides 12+ splitting strategies—character, word, sentence, token, semantic, JSON, HTML tag, and code-aware—enabling fine-grained control over chunk boundaries and overlap. Integrates with vision-language models (OpenAI, Claude, Gemini, HuggingFace) and embedding providers (OpenAI, Azure, Voyage) for multimodal processing and semantic-aware chunking.
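As a rough illustration of what chunk size and overlap mean for a character-based strategy, here is a minimal standalone sketch. It does not use the Splitter_MR API; the function name `split_by_characters` and its parameters are hypothetical and chosen only for this example.

```python
def split_by_characters(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters.

    Hypothetical helper for illustration only; not part of Splitter_MR.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk is made up purely of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(split_by_characters("abcdefghij", chunk_size=4, overlap=1))
    # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last `overlap` characters of the previous one, which is the usual way to preserve context across chunk boundaries. Word, sentence, or token strategies follow the same windowing idea with a different unit.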

Available on PyPI.

Maintenance 10 / 25
Adoption 12 / 25
Maturity 18 / 25
Community 7 / 25


Stars: 25
Forks: 2
Language: Jupyter Notebook
License: MIT
Last pushed: Jan 08, 2026
Monthly downloads: 121
Commits (30d): 0
Dependencies: 18

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/andreshere00/Splitter_MR"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
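The same request can be made from Python using only the standard library. This is a sketch under assumptions: the endpoint path is taken from the curl command above, but the response being JSON is assumed rather than documented here, and the helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for an owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record, assuming the endpoint returns a JSON body."""
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Performs a real network request; unauthenticated calls are rate limited.
    print(fetch_quality("andreshere00", "Splitter_MR"))
```

No API key is passed because the unauthenticated tier is sufficient for occasional lookups. How a key would be supplied is not stated above, so it is left out of this sketch.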