Multimodal RAG Systems RAG Tools

Tools and frameworks for retrieval-augmented generation systems that process and integrate multiple data modalities (images, text, video, audio, tables) together. Does NOT include single-modality RAG, domain-specific RAG applications, or general multimodal AI without retrieval components.

There are 98 multimodal RAG system tools tracked. Two score above 50 (Established tier). The highest-rated is AnswerDotAI/byaldi at 56/100 with 844 stars and 3,709 monthly downloads. One of the top 10 is actively maintained.

Get all 98 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=multimodal-rag-systems&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
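The same request can be issued from Python. The following is a minimal sketch using only the standard library; the function names `build_quality_url` and `fetch_quality` are hypothetical, and nothing is assumed about the shape of the JSON response beyond it being valid JSON. The query parameters (`domain`, `subcategory`, `limit`) are taken from the curl command above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="rag", subcategory="multimodal-rag-systems", limit=20):
    """Build the request URL from the three query parameters used in the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Fetch the URL and decode the body as JSON (hypothetical helper).

    The response schema is not documented here, so the decoded object
    is returned as-is without assuming any field names.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL does not touch the network.
print(build_quality_url())
```

Calling `fetch_quality(build_quality_url())` performs the actual request and counts against the daily quota mentioned above. How an API key is supplied is not stated here, so the sketch covers only the keyless case.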

# Tool Score Tier
1 AnswerDotAI/byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.

56
Established
2 illuin-tech/colpali

The code used to train and run inference with the ColVision models, e.g....

52
Established
3 jolibrain/colette

Multimodal RAG to search and interact locally with technical documents of any kind

44
Emerging
4 nannib/nbmultirag

A framework in Italian and English that lets you chat with your own...

43
Emerging
5 OpenBMB/VisRAG

Parsing-free RAG supported by VLMs

42
Emerging
6 chiang-yuan/llamp

[EMNLP '25] A web app and Python API for multi-modal RAG framework to ground...

40
Emerging
7 cilabuniba/artseek

ArtSeek: Deep artwork understanding via multimodal in-context reasoning and...

40
Emerging
8 Leon1207/Video-RAG-master

✨✨[NeurIPS 2025] This is the official implementation of our paper...

37
Emerging
9 JuliaGenAI/ColBERT.jl

Efficient late-interaction retrieval systems in Julia!

37
Emerging
10 tonywu71/colpali-cookbooks

Recipes for learning, fine-tuning, and adapting ColPali to your multimodal...

35
Emerging
11 ACMarcone86/artseek

ArtSeek combines late-interaction retrieval over a 5M+ multimodal corpus...

35
Emerging
12 llm-lab-org/Multimodal-RAG-Survey

A Survey on Multimodal Retrieval-Augmented Generation

34
Emerging
13 deep-div/Multimodel-RAG

Multimodal RAG ingests PDFs and generates combined text and image outputs by...

33
Emerging
14 wgcyeo/UniversalRAG

UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse...

31
Emerging
15 adithya-s-k/VARAG

Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine

30
Emerging
16 MohamedMostafa259/pif-multimodal-rag

A modular, multilingual, and multimodal Retrieval-Augmented Generation (RAG)...

30
Emerging
17 chg0901/Honor_of_Kings_Multi-modal_Dataset

A Multi-modal RAG Project with Dataset from Honor of Kings, one of the most...

28
Experimental
18 Ahmed-AI-01/Multimodal-RAG

An AI-powered chat application using text, audio, and images for...

27
Experimental
19 pranshuchaurasia/image-indexing-and-retrival-with-qdrant

The repo provides the code for Qdrant for efficient image indexing and...

27
Experimental
20 the-bird-F/GLM-Voice-RAG

[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end...

27
Experimental
21 richard-peng-xia/RULE

[EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision...

26
Experimental
22 joohyung00/lilac

This is the public repository for "LILaC: Late Interacting in Layered...

26
Experimental
23 zhaosuifeng/FinRAGBench-V

FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the...

24
Experimental
24 cany7/LumiCite

LumiCite is a multimodal RAG system for academic papers, designed for...

24
Experimental
25 GenCEO/mm-rag-playbook

Lightweight multimodal RAG patterns for PDF-like documents

23
Experimental
26 ChaoLinAViy/OMGM

OMGM: Orchestrate Multiple Granularities and Modalities for Efficient...

23
Experimental
27 AhmedAl93/multimodal-semantic-RAG

A RAG system designed to process documents with multimodal content. It can...

23
Experimental
28 Hoar012/RAP-MLLM

[CVPR 2025] RAP: Retrieval-Augmented Personalization

23
Experimental
29 connectpool/multimodal-rag-lab

Compact multimodal RAG baseline with chunking, BM25 retrieval and prompt assembly.

22
Experimental
30 dame-cell/VisionRAG

A novel multi-modality (Vision) RAG architecture

22
Experimental
31 DataFog/vlm-api

REST API for computing cross-modal similarity between images and text using...

22
Experimental
32 RecSys-lab/RAG-VisualRec

🧠 A Resource for Multi-Modal Learning in Visual RAGs

22
Experimental
33 ResearchAgents/multimodal-doc-rag

A lightweight pipeline for multimodal document retrieval and QA using...

22
Experimental
34 medazizsaaadallah/Knowledge-Infused-Multimodal-Retrieval-A-RAG-Based-Approach-for-Context-Aware-Image-Understanding

🌟 Enhance image understanding through a RAG-based approach, combining...

22
Experimental
35 Devanik21/xylia-vision

Vision transformer-powered knowledge extraction. Analyze any image:...

22
Experimental
36 alilooop/AssetRetrieval3D

Retrieve 3D assets effortlessly using text or images with this multi-modal...

22
Experimental
37 RodneyFinkel/groq_deepgram_agent

Multi Modal Agent using Deepgram and Groq LPU's and Sentence Transformers...

21
Experimental
38 aimagelab/ReT-2

Recurrence Meets Transformers for Universal Multimodal Retrieval

21
Experimental
39 Rayen-Hamza/Klippy

A text-centric multimodal local first RAG system with knowledge graph...

21
Experimental
40 santiago68310/RAG-based-multimodal-agent

A sophisticated Retrieval-Augmented Generation (RAG) system that combines...

21
Experimental
41 SnowNation101/Nyx

Code for the paper “Towards Mixed-Modal Retrieval for Universal...

21
Experimental
42 Azure-Samples/multimodal_rag_python

Python notebook for solving overlapping tables problem with Azure document...

20
Experimental
43 DuhanJishnu/NeuraNexus

Offline Multimodal RAG System for Unified Retrieval from Text, Image, and Audio Data

20
Experimental
44 seth-woo/mkrs-optional-memory

Multimodal Knowledge Retrieval System with Optional Memory (MKRS)

20
Experimental
45 kyopark2014/llm-multimodal-and-rag

It shows how to use multimodal and RAG based on multi-region LLM.

20
Experimental
46 aniketpoojari/Enterprise-AI-Assistant-MCP

Production-grade Multi-Modal RAG system for intelligent document Q&A with...

19
Experimental
47 SainathPattipati/multi-modal-rag

RAG over images, PDFs, tables, and structured data - unified retrieval...

19
Experimental
48 nicolas-len/gcp-multimodal-ai-rag

Multimodal AI knowledge base, RAG on GCP with Gemini parsing, BigQuery...

19
Experimental
49 Alijanloo/MultiModalRag

A Multi-Modal Agentic RAG pipeline designed to handle unstructured documents...

18
Experimental
50 THE-S0HAM/OmniWhale-RAG

Generalized, Offline-First Multimodal AI System

18
Experimental
51 naimkatiman/Multi-Modal-RAG-Pipeline-on-Images-and-Text-Locally

My first Multi-Modal RAG pipeline....Dummy version

18
Experimental
52 MMDocRAG/MMDocRAG

The code used to train and run inference with MMDocRAG

18
Experimental
53 forfrt/vgsg_rag

Visual Grounded Story Generation with RAG

17
Experimental
54 starsuzi/VideoRAG

VideoRAG: Retrieval-Augmented Generation over Video Corpus

16
Experimental
55 RazerArdi/Knowledge-Infused-Multimodal-Retrieval-A-RAG-Based-Approach-for-Context-Aware-Image-Understanding

A modular RAG-based framework for image retrieval and context-aware...

16
Experimental
56 Ashutosh-AIBOT/multimodal-rag-research-assistant

Multi-source RAG assistant - chat with PDFs, research YouTube channels,...

16
Experimental
57 Ghost-141/Multi-Modal-Local-RAG

A Multi-Modal RAG Pipeline with Local LLMs

15
Experimental
58 SungJuyeon/multimodal_RAG_System

A system that answers questions about uploaded images and videos

15
Experimental
59 TioeAre/BayesRAG

BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal...

15
Experimental
60 AliHamzaAzam/multimodal-rag

Multimodal RAG system with CLIP embeddings, FAISS search, and MLX-powered Mistral LLM

15
Experimental
61 muthusamir/GraphMultimodalRAG

Enhancing Vision-Language Retrieval with Graph-Based and Multimodal RAG Integration

15
Experimental
62 Arnav000/Multimodal-RAG

This repository contains a full-stack Multimodal Retrieval-Augmented...

15
Experimental
63 jiangnanboy/pdf_multimodal_rag

pdf multimodal rag [PDF multimodal RAG question answering]

15
Experimental
64 sakshamVerma08/MultiModal-RAG-Practice-

Multi-Modal RAG: Retrieval-Augmented Generation over Text and Visual PDFs A...

15
Experimental
65 rutvik29/multimodal-rag

Production multimodal RAG pipeline: ingests PDFs, images, and tables with...

14
Experimental
66 jthiruveedula/multimodal-rag-pipeline

End-to-end Multimodal RAG pipeline ingesting PDFs, images, and audio using...

14
Experimental
67 RitamPatra/rag-project

Multimodal RAG chatbot

14
Experimental
68 AnithaKarre/multimodel_RAG

Multimodal RAG pipeline that ingests PDFs, Word docs, CSVs, Excel files, and...

14
Experimental
69 sgxs2014/mmrag-toolkit

A minimal toolkit for Multimodal RAG - retrieve images and text, ground...

14
Experimental
70 CKeibel/FHSWF-deep-learning

Multimodal RAG and comparisons between language models. (Project for Deep...

14
Experimental
71 id4thomas/psi-king

Framework for building Multimodal Document Retrievers

14
Experimental
72 jeswanthmukesh20/VocalText-Contrastive-Embedding

This repository features a CLIP-inspired contrastive model that aligns audio...

14
Experimental
73 easy1ive/modality-router-kit

Lightweight modality-aware query router for multimodal RAG experiments

14
Experimental
74 Schinkenwurst/lightmrag

Lightweight multimodal RAG baseline with late-fusion retrieval

14
Experimental
75 SubhamIO/Multimodal-RAG-System

Handle mixture of content types, including text, tables and images using...

13
Experimental
76 Bhavik-Ardeshna/Multimodal-VideoRAG

Multimodal-VideoRAG: Using BridgeTower Embeddings and Large Vision Language Models

13
Experimental
77 simoncampos1022/RAG-System-arXivRAG-Multimodal-Conversational

A practical, multimodal-multilingual RAG chatbot application powered by...

13
Experimental
78 DngBack/HPC-ColPali

Implementation of Hierarchical Patch Compression for ColPali: Efficient...

13
Experimental
79 Nir0g0/Multimodal-RAG

This project is a multimodal Retrieval-Augmented Generation (RAG) system...

13
Experimental
80 selvatharrun/Multimodal-RAG-Application

A comprehensive Multimodal Retrieval-Augmented Generation (RAG) application...

12
Experimental
81 robustvisrag/RobustVisRAG

CVPR26 - RobustVisRAG: Causality-Aware Vision-Based Retrieval-Augmented...

12
Experimental
82 Koushiki-Chakraborty/Multimodal-Question-Answering

Collaborative research exploring multimodal question answering using OCR,...

12
Experimental
83 neha-nambiar/Retrieval-Augmented-Multimodal-AI-for-Engineering-Homework-Solving

Engineering Homework solver using ColPali PDF retrieval, Qwen2.5-VL...

12
Experimental
84 dongxuecheng/SafetyVision-RAG

AI-Powered Safety Hazard Detection System using VLM and...

12
Experimental
85 emrekuruu/local-multimodal-personal-knowledge-base

A multi-hop multimodal RAG system to chat with your PDFs locally, using...

12
Experimental
86 adam-aimoscloud/MoleSearch

Multimodal data Retriever, including text, image, video, audio

12
Experimental
87 tph-kds/TriModalRAG_System

*Built upon the integration of text, image, and audio modalities, this...

12
Experimental
88 anishalle/YOLO

You Only Look Once, fine-tuned LLM + scene graph reasoning used for...

11
Experimental
89 prakhar175/multimodal-RAG-application

Multimodal pdf based RAG application where it scans the pdf for text and...

11
Experimental
90 amitkumarj441/mRAG-gim

Code for CIKM'25 paper - Multimodal RAG Enhanced Visual Description

11
Experimental
91 suncatchin/visual-rag

Lightweight multimodal RAG pipeline for image-and-text understanding - CLIP...

11
Experimental
92 isatyamks/multimodal-rag

Multimodal RAG system for generating test cases and use cases from documents...

11
Experimental
93 Viviviiii/jasp-multimodal-rag

A multimodal Retrieval-Augmented Generation (RAG) system for the JASP.

11
Experimental
94 behradbina/ArtCognition

This repository provides the implementation of ArtCognition, a multimodal AI...

11
Experimental
95 Moncef-Bj/cv-papers-rag

Multimodal RAG system for Computer Vision research papers with intelligent...

11
Experimental
96 Shubin-vadim/Arxplover

Comprehensive multimodal system for analyzing documents with support for...

11
Experimental
97 MMDocRAG/MMDocIR

The code used to train and run inference with MMDocIR

10
Experimental
98 WizKnight/MultimodalMovieRAG

A multimodal movie search engine using RAG techniques. It allows users to...

10
Experimental