pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
Curated collection of 200+ research papers organized across 18 core technical areas (multimodal fusion, alignment, pretraining, translation, retrieval) plus applications spanning vision-language, audio, robotics, and autonomous driving. Includes benchmark datasets, tutorial slides/videos from CMU courses, and links to open-source implementations for foundational models like CLIP and ViLBERT. Maintained by CMU's MultiComp Lab with community contributions.
6,835 stars. No commits in the last 6 months.
Stars: 6,835
Forks: 897
Language: (none)
License: MIT
Category:
Last pushed: Aug 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/pliang279/awesome-multimodal-ml"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
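The same request can be sketched in Python using only the standard library. This is a minimal sketch, not a documented client: the helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body, which the listing does not state. How an API key is passed is not described here, so the sketch covers only the keyless request.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for one repository given as 'owner/name'.

    Hypothetical helper: the path layout is inferred from the single
    curl example, not from API documentation.
    """
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(name)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError.
    """
    request = urllib.request.Request(
        quality_url(category, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "pliang279/awesome-multimodal-ml")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a reasonable choice for repeated lookups.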
Higher-rated alternatives
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch