vola-trebla/rag-ingestion-toolkit
Clean, structured data pipeline for RAG systems. Converts raw HTML, PDF, and Markdown into chunked, embedding-ready output with metadata extraction.
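The Markdown side of a pipeline like this can be sketched in plain Python. The approach splits text into header-scoped chunks and attaches source and heading-path metadata to each one. This is a hypothetical illustration under stated assumptions, not the toolkit's actual API. `Chunk`, `chunk_markdown`, `max_chars`, and the metadata keys (`source`, `headings`, `chunk_index`) are invented names. HTML and PDF parsing are out of scope.

```python
import re
from dataclasses import dataclass, field

# Hypothetical chunk type: not taken from rag-ingestion-toolkit.
@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

# ATX-style Markdown headings: 1-6 '#' characters followed by whitespace.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_markdown(markdown: str, source: str, max_chars: int = 500) -> list[Chunk]:
    """Split Markdown into header-scoped chunks with heading-path metadata (sketch)."""
    chunks: list[Chunk] = []
    path: list[str] = []    # current heading hierarchy, e.g. ["Intro", "Setup"]
    buffer: list[str] = []  # body lines collected under the current heading

    def flush() -> None:
        text = "\n".join(buffer).strip()
        buffer.clear()
        if not text:
            return
        # Oversized sections are cut into fixed-size character windows.
        # A real pipeline would likely split on sentence or token boundaries.
        for start in range(0, len(text), max_chars):
            chunks.append(Chunk(
                text=text[start:start + max_chars],
                metadata={
                    "source": source,
                    "headings": list(path),
                    "chunk_index": len(chunks),
                },
            ))

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()  # close the previous section before the heading changes
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            buffer.append(line)
    flush()
    return chunks
```

For example, `chunk_markdown("# Intro\nHello world.\n## Setup\nRun pip install.\n", "README.md")` yields two chunks. The first is `"Hello world."` with headings `["Intro"]`. The second is `"Run pip install."` with headings `["Intro", "Setup"]`. Carrying the heading path in the metadata lets a retriever show where each chunk came from. One limitation is that `#` lines inside fenced code blocks would be misread as headings.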
Stars: n/a
Forks: n/a
Language: Python
License: n/a
Category: n/a
Last pushed: Mar 11, 2026
Commits (30d): 0
Get this data via the API:
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/vola-trebla/rag-ingestion-toolkit"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
kreuzberg-dev/kreuzberg-surrealdb
Extract, chunk, and embed documents from 88+ formats directly into SurrealDB.
LLM-Implementation/private-rag-embeddinggemma
🔒 100% Private RAG Stack with EmbeddingGemma, SQLite-vec & Ollama - Zero Cost, Offline Capable
sudhanshug16/chromadb-cli
CLI to interact with ChromaDB (https://github.com/chroma-core/chroma)
jmiba/zotero-redisearch-rag
An Obsidian plugin that synchronizes selected Zotero full-text items with your vault in...
sanketvagal/rag-notes
RAG system that lets you chat with your Obsidian/Markdown notes — chunks by headers, embeds with...