llm-lab-org/Multimodal-RAG-Survey

A Survey on Multimodal Retrieval-Augmented Generation

Score: 34 / 100 (Emerging)

This survey repository organizes and taxonomizes papers on multimodal RAG systems by retrieval strategy (text-, vision-, video-, and audio-centric), fusion mechanism, augmentation technique, and generation approach. It catalogs datasets and benchmarks spanning image-text, video, audio, medical, and fashion domains, along with evaluation metrics and training methodologies. The resource is continuously updated and tracks advances in cross-modal alignment, agentic interaction, and robustness for systems that ground LLM outputs in multimodal external knowledge bases.

487 stars.

No License, No Package, No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 1 / 25
Community 13 / 25


Stars: 487
Forks: 26
Language: (not listed)
License: None
Last pushed: Feb 20, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/llm-lab-org/Multimodal-RAG-Survey"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
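
The curl request above can also be issued from a script. The following is a minimal Python sketch, not an official client. It assumes the URL path follows the pattern category/owner/repo (inferred only from the single example URL above) and that the endpoint returns JSON, which this page does not confirm. The names `quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL on this page.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key.

    Assumption: the response body is JSON. If it is not, json.loads raises
    a ValueError that the caller should handle.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the same URL used in the curl example; no network call is made here.
    print(quality_url("rag", "llm-lab-org", "Multimodal-RAG-Survey"))
```

Calling `fetch_quality("rag", "llm-lab-org", "Multimodal-RAG-Survey")` performs the actual request and counts against the daily limit noted above.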