sakshamVerma08/MultiModal-RAG-Practice-
Multi-Modal RAG: Retrieval-Augmented Generation over Text and Visual PDFs
A multi-modal RAG system capable of understanding and reasoning over PDFs containing both text and images. It combines LangChain, CLIP, and FAISS to extract textual content, encode visual features, and enable unified semantic retrieval for context-aware responses.
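The idea of "unified semantic retrieval" can be illustrated with a small sketch. This is not the repository's code: it assumes text chunks and images have already been mapped into one shared vector space (the role CLIP would play), and it uses brute-force cosine similarity in place of a FAISS index. The `UnifiedIndex` class, the three-dimensional vectors, and the page labels are all hypothetical stand-ins chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class UnifiedIndex:
    """Hypothetical brute-force stand-in for a vector index (e.g. FAISS)
    that stores text and image embeddings side by side."""

    def __init__(self):
        self.items = []  # each entry: (modality, payload, vector)

    def add(self, modality, payload, vector):
        self.items.append((modality, payload, vector))

    def search(self, query_vector, k=2):
        # Score every stored item against the query, regardless of modality.
        scored = [
            (cosine(query_vector, vec), modality, payload)
            for modality, payload, vec in self.items
        ]
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]


# Assumed, hand-written embeddings; a real system would compute these
# with an encoder such as CLIP.
index = UnifiedIndex()
index.add("text", "page 1: quarterly revenue summary", [0.9, 0.1, 0.0])
index.add("image", "page 2: revenue bar chart", [0.8, 0.2, 0.1])
index.add("text", "page 3: office relocation notice", [0.0, 0.1, 0.9])

query = [1.0, 0.0, 0.0]  # stand-in for an embedded question about revenue
results = index.search(query, k=2)
for score, modality, payload in results:
    print(f"{score:.3f} {modality}: {payload}")
```

Because both modalities live in the same index, a single query here returns one text chunk and one image as its top two hits, and both could then be passed to the generator as context.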
Stars
—
Forks
—
Language
—
License
MIT
Category
Last pushed
Oct 27, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/sakshamVerma08/MultiModal-RAG-Practice-"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
AnswerDotAI/byaldi
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
illuin-tech/colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
jolibrain/colette
Multimodal RAG to search and interact locally with technical documents of any kind
nannib/nbmultirag
A framework in Italian and English that lets you chat with your own documents using RAG,...
OpenBMB/VisRAG
Parsing-free RAG supported by VLMs