Awesome-Multimodal-Large-Language-Models and Awesome-VLA
Awesome-Multimodal-Large-Language-Models is a comprehensive collection of resources on multimodal large language models, including Vision-Language-Action (VLA) models. Awesome-VLA, which focuses specifically on VLA advancements, is therefore a specialized subset of, or more focused alternative to, Awesome-Multimodal-Large-Language-Models within the broader multimodal LLM ecosystem.
About Awesome-Multimodal-Large-Language-Models
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
A comprehensive, curated repository of research papers, datasets, and benchmarks covering multimodal LLM advances across instruction tuning, hallucination mitigation, and reasoning tasks. It features the maintainers' own evaluation benchmarks (MME, Video-MME, MME-RealWorld) and the VITA series of omni-modal models, which support real-time vision-speech interaction and embodied reasoning. The repository targets the broader MLLM research ecosystem, documenting 750+ references and curated resources for model development and evaluation.
About Awesome-VLA
Orlando-CS/Awesome-VLA
✨✨ Latest advancements in VLA (Vision-Language-Action) models