Leon1207/Video-RAG-master

✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"

37 / 100 (Emerging)

Combines retrieval-augmented generation with auxiliary text extracted from the video (OCR, ASR, and object detection, using open-source tools such as EasyOCR and spaCy) to improve long-video understanding in large video-language models (LVLMs). The pipeline is training-free and plug-and-play: it integrates with LLaVA-NeXT and other video-language models, and uses FAISS to retrieve the auxiliary text relevant to a query, injecting that visually-aligned context at inference time without relying on commercial APIs.
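The retrieval step described above can be illustrated with a minimal sketch. This is not the repository's code: the bag-of-words `embed` function and the brute-force cosine search are toy stand-ins for a real text encoder and a FAISS index, and the names `retrieve`, `build_prompt`, and the sample `aux_texts` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real text encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, aux_texts, k=2):
    """Return the k auxiliary texts most similar to the query.

    Brute-force search; a vector index such as FAISS would replace this
    loop when there are many auxiliary texts.
    """
    q = embed(query)
    ranked = sorted(aux_texts, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]


def build_prompt(query, aux_texts, k=2):
    """Prepend the retrieved auxiliary text to the question (hypothetical format)."""
    context = "\n".join(retrieve(query, aux_texts, k))
    return f"Auxiliary context:\n{context}\n\nQuestion: {query}"


# Hypothetical auxiliary texts, as OCR / ASR / detection might produce them.
aux_texts = [
    "OCR: the sign reads Central Station",
    "ASR: the speaker says the train departs at noon",
    "DET: two dogs and a red ball on the grass",
]
top = retrieve("When does the train leave?", aux_texts, k=1)
prompt = build_prompt("When does the train leave?", aux_texts, k=1)
```

The resulting `prompt` would be passed to the video-language model together with the sampled frames; only the retrieved snippets are injected, which keeps the added context short.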

404 stars.

No License, No Package, No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 1 / 25
Community 16 / 25


Stars: 404
Forks: 39
Language: Python
License: none
Last pushed: Jan 14, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/Leon1207/Video-RAG-master"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
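For scripted use, the same request can be made from Python with the standard library. This is a sketch under assumptions: the URL pattern (`/quality/<category>/<owner>/<repo>`) is inferred from the single curl example above, the helper names are hypothetical, and the response is returned as raw text because its schema is not documented here.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the category/repo layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, repo):
    """Build the quality-score URL for a repo given as 'owner/name'."""
    # quote() leaves "/" unescaped by default, so 'owner/name' stays a path.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category, repo, timeout=10):
    """Fetch the raw response body as text (no assumption about its format)."""
    with urllib.request.urlopen(build_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


url = build_url("rag", "Leon1207/Video-RAG-master")
```

Calling `fetch_quality("rag", "Leon1207/Video-RAG-master")` performs the network request; given the 100 requests/day keyless limit, caching the result locally is advisable.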