Multimodal Vision Language Models LLM Tools
Comprehensive surveys, benchmarks, and research collections on vision-language models, multimodal learning architectures, and their domain-specific applications (remote sensing, transportation, urban computing, weather). Does NOT include individual model implementations, fine-tuning techniques, or tools for building applications with these models.
There are 43 multimodal vision language model tools tracked. One scores 50 or above (Established tier). The highest-rated is hijkzzz/Awesome-LLM-Strawberry at 50/100 with 6,896 stars. One of the top 10 is actively maintained.
Get all 43 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=multimodal-vision-language-models&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
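The same request can be made from a script. The sketch below rebuilds the query string from the curl command above using only the Python standard library. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented on this page, so the payload is decoded and returned as-is. Whether the endpoint accepts a `limit` larger than the 20 shown in the curl example is an assumption.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Rebuild the query URL from the curl command with a chosen limit."""
    params = {
        "domain": "llm-tools",
        "subcategory": "multimodal-vision-language-models",
        "limit": limit,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON payload without assuming its structure."""
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out).
# Assumption: the API accepts limit=43; the page only demonstrates limit=20.
# data = fetch_projects(limit=43)
# print(json.dumps(data, indent=2))
```

With the default `limit=20`, `build_url()` reproduces the URL from the curl command exactly. Unauthenticated calls count toward the 100 requests/day quota.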
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | hijkzzz/Awesome-LLM-Strawberry<br>A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓... | | Established |
| 2 | chrisliu298/awesome-llm-unlearning<br>A resource repository for machine unlearning in large language models | | Emerging |
| 3 | worldbench/awesome-spatial-intelligence<br>🌐 Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training... | | Emerging |
| 4 | worldbench/awesome-vla-for-ad<br>🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future | | Emerging |
| 5 | zjukg/KG-MM-Survey<br>Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey | | Emerging |
| 6 | sou350121/VLA-Handbook<br>This project aims to give algorithm engineers who want to enter the VLA (Vision-Language-Action) field an all-Chinese, practice-oriented study and interview handbook. Unlike general-purpose... | | Emerging |
| 7 | RManLuo/Awesome-LLM-KG<br>Awesome papers about unifying LLMs and KGs | | Emerging |
| 8 | worldbench/DriveBench<br>[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from... | | Emerging |
| 9 | PeterGriffinJin/Awesome-Language-Model-on-Graphs<br>A curated list of papers and resources based on "Large Language Models on... | | Emerging |
| 10 | he-h/rhythm<br>[NeurIPS 2025] RHYTHM: Reasoning with Hierarchical Temporal Tokenization for... | | Emerging |
| 11 | EmulationAI/awesome-large-audio-models<br>Collection of resources on the applications of Large Language Models (LLMs)... | | Emerging |
| 12 | MIT-SPARK/LP2<br>Long-term Human Trajectory Prediction using 3D DSGs | | Emerging |
| 13 | llmbev/talk2bev<br>Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24) | | Emerging |
| 14 | PJLab-ADG/awesome-knowledge-driven-AD<br>A curated list of awesome knowledge-driven autonomous driving (continually updated) | | Emerging |
| 15 | THUMNLab/awesome-large-graph-model<br>Papers about large graph models. | | Emerging |
| 16 | WLiK/LLM4Rec-Awesome-Papers<br>A list of awesome papers and resources of recommender system on large... | | Emerging |
| 17 | SuperBruceJia/Awesome-Large-Vision-Language-Model<br>Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Model | | Experimental |
| 18 | LJungang/Awesome-Video-Reasoning-Landscape<br>🔥An open-source survey of the latest video reasoning tasks, paradigms, and... | | Experimental |
| 19 | tongnie/awesome-llm4tr<br>Exploring the Roles of Large Language Models in Reshaping Transportation... | | Experimental |
| 20 | NotYuSheng/Multimodal-Large-Language-Model<br>Localized Multimodal Large Language Model (MLLM) integrated with Streamlit... | | Experimental |
| 21 | vincentlux/Awesome-Multimodal-LLM<br>Reading list for Multimodal Large Language Models | | Experimental |
| 22 | basiclab/TTSG<br>Traffic Scene Generation from Natural Language Description for Autonomous... | | Experimental |
| 23 | Xiaohao-Liu/Awesome-Multi-Token-Prediction<br>A curated list of papers, tools, and resources on Multi-Token Prediction... | | Experimental |
| 24 | cocacola-lab/Awesome-Transformer-in-Transportation<br>Papers & resources linked to Transformer-based research mainly for... | | Experimental |
| 25 | archersama/awesome-recommend-system-pretraining-papers<br>Paper List for Recommend-system PreTrained Models | | Experimental |
| 26 | OpenTSLab/TimeOmni<br>[ICLR 2026] Official implementation of SciTS: Scientific Time Series... | | Experimental |
| 27 | Atomic-man007/Awesome_Multimodel_LLM<br>Awesome_Multimodel is a curated GitHub repository that provides a... | | Experimental |
| 28 | YingqingHe/Awesome-LLMs-meet-Multimodal-Generation<br>🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image,... | | Experimental |
| 29 | westlake-repl/MicroLens<br>A Large Short-video Recommendation Dataset with Raw Text/Audio/Image/Videos... | | Experimental |
| 30 | Nehs6xy3hgdguzjs/Awesome-Video-Reasoning<br>🎥 Explore cutting-edge research focused on reasoning with video models,... | | Experimental |
| 31 | edujbarrios/awesome-vision-ai-stack<br>A curated, builder-first list of Vision Language Models (VLMs), local... | | Experimental |
| 32 | davendw49/Awesome-Long-Context-Language-Modeling<br>Papers of Long Context Language Model | | Experimental |
| 33 | tongnie/IMPEL<br>TRE'25: Joint Estimation and Prediction of City-wide Delivery Demand: A... | | Experimental |
| 34 | AdityaLab/MM4TSA<br>A professional list on Multi-Modalities For Time Series Analysis (MM4TSA)... | | Experimental |
| 35 | thetuantrinh/Radar-Language-Models-Survey<br>Survey of Radar–Language Models for semantic radar perception and reasoning. | | Experimental |
| 36 | ThomasVonWu/Awesome-VLMs-Strawberry<br>A collection of VLMs papers, blogs, and projects, with a focus on VLMs in... | | Experimental |
| 37 | Tangkfan/Awesome-Temporal-Video-Grounding<br>paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding... | | Experimental |
| 38 | bailynlove/Awesome-OCR-Vision-Based-Context-Compression<br>Awesome list of paper on vision-based context compression | | Experimental |
| 39 | showlab/Awesome-Long-Context<br>A curated list of resources about long-context in large-language models and... | | Experimental |
| 40 | leo038/robot_manipulation_survey<br>A survey summarizing work on robotic arm grasping. | | Experimental |
| 41 | chrisliu298/awesome-sparse-autoencoders<br>A resource repository of sparse autoencoders for large language models | | Experimental |
| 42 | xiexukang/awesome-speech-resources<br>Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis,... | | Experimental |
| 43 | HKUDS/Awesome-LLM4Urban-Papers<br>[ACM TIST] "LLM4Urban: Urban Computing in the Era of Large Language Models" | | Experimental |