Multimodal RAG Systems RAG Tools
Tools and frameworks for retrieval-augmented generation systems that process and integrate multiple data modalities (images, text, video, audio, tables) together. Does NOT include single-modality RAG, domain-specific RAG applications, or general multimodal AI without retrieval components.
98 multimodal RAG tools are tracked. Two score above 50 (Established tier). The highest-rated is AnswerDotAI/byaldi at 56/100, with 844 stars and 3,709 monthly downloads. One of the top 10 is actively maintained.
Get all 98 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=multimodal-rag-systems&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
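The same request can be made from Python. The sketch below is a minimal example, not an official client: it rebuilds the URL from the curl command using only the three query parameters shown there (`domain`, `subcategory`, `limit`). The helper names `build_quality_url` and `fetch_quality` are hypothetical, the shape of the returned JSON is not documented here, and whether a larger `limit` returns all 98 projects in one call is an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example; nothing else about the API is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="rag", subcategory="multimodal-rag-systems", limit=20):
    """Build the dataset URL using only the query parameters seen in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Fetch the URL and decode the body as JSON (hypothetical helper; response shape unknown)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Printing the URL needs no network access; call fetch_quality(url) to make the request.
    print(build_quality_url())
```

Each call to `fetch_quality` counts against the daily request quota, so it is worth caching the decoded result locally rather than re-fetching.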
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | AnswerDotAI/byaldi | Use late-interaction multi-modal models such as ColPali in just a few lines of code. | | Established |
| 2 | illuin-tech/colpali | The code used to train and run inference with the ColVision models, e.g.... | | Established |
| 3 | jolibrain/colette | Multimodal RAG to search and interact locally with technical documents of any kind | | Emerging |
| 4 | nannib/nbmultirag | A framework in Italian and English that lets you chat with your own... | | Emerging |
| 5 | OpenBMB/VisRAG | Parsing-free RAG supported by VLMs | | Emerging |
| 6 | chiang-yuan/llamp | [EMNLP '25] A web app and Python API for multi-modal RAG framework to ground... | | Emerging |
| 7 | cilabuniba/artseek | ArtSeek: Deep artwork understanding via multimodal in-context reasoning and... | | Emerging |
| 8 | Leon1207/Video-RAG-master | [NeurIPS 2025] This is the official implementation of our paper... | | Emerging |
| 9 | JuliaGenAI/ColBERT.jl | Efficient late-interaction retrieval systems in Julia! | | Emerging |
| 10 | tonywu71/colpali-cookbooks | Recipes for learning, fine-tuning, and adapting ColPali to your multimodal... | | Emerging |
| 11 | ACMarcone86/artseek | ArtSeek combines late-interaction retrieval over a 5M+ multimodal corpus... | | Emerging |
| 12 | llm-lab-org/Multimodal-RAG-Survey | A Survey on Multimodal Retrieval-Augmented Generation | | Emerging |
| 13 | deep-div/Multimodel-RAG | Multimodal RAG ingests PDFs and generates combined text and image outputs by... | | Emerging |
| 14 | wgcyeo/UniversalRAG | UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse... | | Emerging |
| 15 | adithya-s-k/VARAG | Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine | | Emerging |
| 16 | MohamedMostafa259/pif-multimodal-rag | A modular, multilingual, and multimodal Retrieval-Augmented Generation (RAG)... | | Emerging |
| 17 | chg0901/Honor_of_Kings_Multi-modal_Dataset | A Multi-modal RAG Project with Dataset from Honor of Kings, one of the most... | | Experimental |
| 18 | Ahmed-AI-01/Multimodal-RAG | An AI-powered chat application using text, audio, and images for... | | Experimental |
| 19 | pranshuchaurasia/image-indexing-and-retrival-with-qdrant | The repo provides the code for Qdrant for efficient image indexing and... | | Experimental |
| 20 | the-bird-F/GLM-Voice-RAG | [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end... | | Experimental |
| 21 | richard-peng-xia/RULE | [EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision... | | Experimental |
| 22 | joohyung00/lilac | This is the public repository for "LILaC: Late Interacting in Layered... | | Experimental |
| 23 | zhaosuifeng/FinRAGBench-V | FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the... | | Experimental |
| 24 | cany7/LumiCite | LumiCite is a multimodal RAG system for academic papers, designed for... | | Experimental |
| 25 | GenCEO/mm-rag-playbook | Lightweight multimodal RAG patterns for PDF-like documents | | Experimental |
| 26 | ChaoLinAViy/OMGM | OMGM: Orchestrate Multiple Granularities and Modalities for Efficient... | | Experimental |
| 27 | AhmedAl93/multimodal-semantic-RAG | A RAG system designed to process documents with multimodal content. It can... | | Experimental |
| 28 | Hoar012/RAP-MLLM | [CVPR 2025] RAP: Retrieval-Augmented Personalization | | Experimental |
| 29 | connectpool/multimodal-rag-lab | Compact multimodal RAG baseline with chunking, BM25 retrieval and prompt assembly. | | Experimental |
| 30 | dame-cell/VisionRAG | A new novel multi-modality (Vision) RAG architecture | | Experimental |
| 31 | DataFog/vlm-api | REST API for computing cross-modal similarity between images and text using... | | Experimental |
| 32 | RecSys-lab/RAG-VisualRec | A Resource for Multi-Modal Learning in Visual RAGs | | Experimental |
| 33 | ResearchAgents/multimodal-doc-rag | A lightweight pipeline for multimodal document retrieval and QA using... | | Experimental |
| 34 | medazizsaaadallah/Knowledge-Infused-Multimodal-Retrieval-A-RAG-Based-Approach-for-Context-Aware-Image-Understanding | Enhance image understanding through a RAG-based approach, combining... | | Experimental |
| 35 | Devanik21/xylia-vision | Vision transformer-powered knowledge extraction. Analyze any image:... | | Experimental |
| 36 | alilooop/AssetRetrieval3D | Retrieve 3D assets effortlessly using text or images with this multi-modal... | | Experimental |
| 37 | RodneyFinkel/groq_deepgram_agent | Multi Modal Agent using Deepgram and Groq LPU's and Sentence Transformers... | | Experimental |
| 38 | aimagelab/ReT-2 | Recurrence Meets Transformers for Universal Multimodal Retrieval | | Experimental |
| 39 | Rayen-Hamza/Klippy | A text-centric multimodal local first RAG system with knowledge graph... | | Experimental |
| 40 | santiago68310/RAG-based-multimodal-agent | A sophisticated Retrieval-Augmented Generation (RAG) system that combines... | | Experimental |
| 41 | SnowNation101/Nyx | Code for the paper "Towards Mixed-Modal Retrieval for Universal... | | Experimental |
| 42 | Azure-Samples/multimodal_rag_python | Python notebook for solving overlapping tables problem with Azure document... | | Experimental |
| 43 | DuhanJishnu/NeuraNexus | Offline Multimodal RAG System for Unified Retrieval from Text, Image, and Audio Data | | Experimental |
| 44 | seth-woo/mkrs-optional-memory | Multimodal Knowledge Retrieval System with Optional Memory (MKRS) | | Experimental |
| 45 | kyopark2014/llm-multimodal-and-rag | It shows how to use multimodal and RAG based on multi-region LLM. | | Experimental |
| 46 | aniketpoojari/Enterprise-AI-Assistant-MCP | Production-grade Multi-Modal RAG system for intelligent document Q&A with... | | Experimental |
| 47 | SainathPattipati/multi-modal-rag | RAG over images, PDFs, tables, and structured data - unified retrieval... | | Experimental |
| 48 | nicolas-len/gcp-multimodal-ai-rag | Multimodal AI knowledge base, RAG on GCP with Gemini parsing, BigQuery... | | Experimental |
| 49 | Alijanloo/MultiModalRag | A Multi-Modal Agentic RAG pipeline designed to handle unstructured documents... | | Experimental |
| 50 | THE-S0HAM/OmniWhale-RAG | Generalized, Offline-First Multimodal AI System | | Experimental |
| 51 | naimkatiman/Multi-Modal-RAG-Pipeline-on-Images-and-Text-Locally | My first Multi-Modal RAG pipeline....Dummy version | | Experimental |
| 52 | MMDocRAG/MMDocRAG | The code used to train and run inference with MMDocRAG | | Experimental |
| 53 | forfrt/vgsg_rag | Visual Grounded Story Generation with RAG | | Experimental |
| 54 | starsuzi/VideoRAG | VideoRAG: Retrieval-Augmented Generation over Video Corpus | | Experimental |
| 55 | RazerArdi/Knowledge-Infused-Multimodal-Retrieval-A-RAG-Based-Approach-for-Context-Aware-Image-Understanding | A modular RAG-based framework for image retrieval and context-aware... | | Experimental |
| 56 | Ashutosh-AIBOT/multimodal-rag-research-assistant | Multi-source RAG assistant - chat with PDFs, research YouTube channels,... | | Experimental |
| 57 | Ghost-141/Multi-Modal-Local-RAG | A Multi-Modal RAG Pipeline with Local LLMs | | Experimental |
| 58 | SungJuyeon/multimodal_RAG_System | A question-answering system for uploaded images and videos | | Experimental |
| 59 | TioeAre/BayesRAG | BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal... | | Experimental |
| 60 | AliHamzaAzam/multimodal-rag | Multimodal RAG system with CLIP embeddings, FAISS search, and MLX-powered Mistral LLM | | Experimental |
| 61 | muthusamir/GraphMultimodalRAG | Enhancing Vision-Language Retrieval with Graph-Based and Multimodal RAG Integration | | Experimental |
| 62 | Arnav000/Multimodal-RAG | This repository contains a full-stack Multimodal Retrieval-Augmented... | | Experimental |
| 63 | jiangnanboy/pdf_multimodal_rag | pdf multimodal rag (PDF multimodal RAG question answering) | | Experimental |
| 64 | sakshamVerma08/MultiModal-RAG-Practice- | Multi-Modal RAG: Retrieval-Augmented Generation over Text and Visual PDFs A... | | Experimental |
| 65 | rutvik29/multimodal-rag | Production multimodal RAG pipeline: ingests PDFs, images, and tables with... | | Experimental |
| 66 | jthiruveedula/multimodal-rag-pipeline | End-to-end Multimodal RAG pipeline ingesting PDFs, images, and audio using... | | Experimental |
| 67 | RitamPatra/rag-project | Multimodal RAG chatbot | | Experimental |
| 68 | AnithaKarre/multimodel_RAG | Multimodal RAG pipeline that ingests PDFs, Word docs, CSVs, Excel files, and... | | Experimental |
| 69 | sgxs2014/mmrag-toolkit | A minimal toolkit for Multimodal RAG - retrieve images and text, ground... | | Experimental |
| 70 | CKeibel/FHSWF-deep-learning | Multimodal RAG and comparisons between language models. (Project for Deep... | | Experimental |
| 71 | id4thomas/psi-king | Framework for building Multimodal Document Retrievers | | Experimental |
| 72 | jeswanthmukesh20/VocalText-Contrastive-Embedding | This repository features a CLIP-inspired contrastive model that aligns audio... | | Experimental |
| 73 | easy1ive/modality-router-kit | Lightweight modality-aware query router for multimodal RAG experiments | | Experimental |
| 74 | Schinkenwurst/lightmrag | Lightweight multimodal RAG baseline with late-fusion retrieval | | Experimental |
| 75 | SubhamIO/Multimodal-RAG-System | Handle mixture of content types, including text, tables and images using... | | Experimental |
| 76 | Bhavik-Ardeshna/Multimodal-VideoRAG | Multimodal-VideoRAG: Using BridgeTower Embeddings and Large Vision Language Models | | Experimental |
| 77 | simoncampos1022/RAG-System-arXivRAG-Multimodal-Conversational | A practical, multimodal-multilingual RAG chatbot application powered by... | | Experimental |
| 78 | DngBack/HPC-ColPali | Implementation of Hierarchical Patch Compression for ColPali: Efficient... | | Experimental |
| 79 | Nir0g0/Multimodal-RAG | This project is a multimodal Retrieval-Augmented Generation (RAG) system... | | Experimental |
| 80 | selvatharrun/Multimodal-RAG-Application | A comprehensive Multimodal Retrieval-Augmented Generation (RAG) application... | | Experimental |
| 81 | robustvisrag/RobustVisRAG | CVPR26 - RobustVisRAG: Causality-Aware Vision-Based Retrieval-Augmented... | | Experimental |
| 82 | Koushiki-Chakraborty/Multimodal-Question-Answering | Collaborative research exploring multimodal question answering using OCR,... | | Experimental |
| 83 | neha-nambiar/Retrieval-Augmented-Multimodal-AI-for-Engineering-Homework-Solving | Engineering Homework solver using ColPali PDF retrieval, Qwen2.5-VL... | | Experimental |
| 84 | dongxuecheng/SafetyVision-RAG | AI-Powered Safety Hazard Detection System using VLM and... | | Experimental |
| 85 | emrekuruu/local-multimodal-personal-knowledge-base | A multi-hop multimodal RAG system to chat with your PDFs locally, using... | | Experimental |
| 86 | adam-aimoscloud/MoleSearch | Multimodal data Retriever, including text, image, video, audio | | Experimental |
| 87 | tph-kds/TriModalRAG_System | Built upon the integration of text, image, and audio modalities, this... | | Experimental |
| 88 | anishalle/YOLO | You Only Look Once, fine-tuned LLM + scene graph reasoning used for... | | Experimental |
| 89 | prakhar175/multimodal-RAG-application | Multimodal pdf based RAG application where it scans the pdf for text and... | | Experimental |
| 90 | amitkumarj441/mRAG-gim | Code for CIKM'25 paper - Multimodal RAG Enhanced Visual Description | | Experimental |
| 91 | suncatchin/visual-rag | Lightweight multimodal RAG pipeline for image-and-text understanding - CLIP... | | Experimental |
| 92 | isatyamks/multimodal-rag | Multimodal RAG system for generating test cases and use cases from documents... | | Experimental |
| 93 | Viviviiii/jasp-multimodal-rag | A multimodal Retrieval-Augmented Generation (RAG) system for the JASP. | | Experimental |
| 94 | behradbina/ArtCognition | This repository provides the implementation of ArtCognition, a multimodal AI... | | Experimental |
| 95 | Moncef-Bj/cv-papers-rag | Multimodal RAG system for Computer Vision research papers with intelligent... | | Experimental |
| 96 | Shubin-vadim/Arxplover | Comprehensive multimodal system for analyzing documents with support for... | | Experimental |
| 97 | MMDocRAG/MMDocIR | The code used to train and run inference with MMDocIR | | Experimental |
| 98 | WizKnight/MultimodalMovieRAG | A multimodal movie search engine using RAG techniques. It allows users to... | | Experimental |