facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Built on PyTorch with distributed training support, MMF provides reference implementations of state-of-the-art vision-language models and ships pre-built datasets for the VQA, TextVQA, TextCaps, and Hateful Memes challenges. The framework emphasizes modularity and scalability, letting researchers bootstrap multimodal projects from composable model and dataset components. It serves as both a research platform and the official starter codebase for several vision-language benchmarks.
5,622 stars. Actively maintained with 2 commits in the last 30 days.
Stars: 5,622
Forks: 944
Language: Python
License: —
Category:
Last pushed: Jan 12, 2026
Commits (30d): 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/facebookresearch/mmf"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
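For programmatic access, the curl call above can be reproduced with the Python standard library. Only the endpoint URL comes from this page; the helper names and the assumption that the response is JSON are illustrative.

```python
import json
import urllib.request

# Base endpoint taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch the repo's quality data; assumes a JSON response body.

    Unauthenticated requests are limited to 100/day.
    """
    with urllib.request.urlopen(build_url(category, owner, repo)) as resp:
        return json.load(resp)


# The same repo shown on this card:
url = build_url("ml-frameworks", "facebookresearch", "mmf")
```

Calling `fetch_quality("ml-frameworks", "facebookresearch", "mmf")` performs the same request as the curl command, returning the stats shown above as a dictionary.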
Related frameworks
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.