BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
A comprehensive curated repository of research papers, datasets, and benchmarks covering multimodal-LLM advances in instruction tuning, hallucination mitigation, and reasoning tasks. Hosts the maintainers' own evaluation benchmarks (MME, Video-MME, MME-RealWorld) and the VITA series of omni-modal models supporting real-time vision-speech interaction and embodied reasoning. Serves the broader MLLM research community with extensive documentation of 750+ references and curated resources for model development and evaluation.
17,448 stars. Actively maintained with 7 commits in the last 30 days.
Stars: 17,448
Forks: 1,112
Language: —
License: —
Category: —
Last pushed: Mar 12, 2026
Commits (30d): 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/BradyFU/Awesome-Multimodal-Large-Language-Models"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Related models
FoundationVision/Liquid
(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators
Paranioar/Awesome_Matching_Pretraining_Transfering
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification),...
Yangyi-Chen/Multimodal-AND-Large-Language-Models
Paper list about multimodal and large language models, only used to record papers I read in the...
thuml/AutoTimes
Official implementation for "AutoTimes: Autoregressive Time Series Forecasters via Large Language Models"
flixpar/med-ts-llm
MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis