Multimodal Fusion Transformers: Transformer Models

Tools for combining multiple input modalities (text, image, audio, video, tabular data) using transformer architectures to perform unified tasks. Does NOT include single-modality models, recommendation systems, or domain-specific applications like robotics/translation unless multimodal fusion is the primary focus.

There are 37 multimodal fusion transformer models tracked. The highest-rated is rkansal47/MPGAN at 41/100 with 13 stars.

Get all 37 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=multimodal-fusion-transformers&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
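The same request can be made from Python. The following is a minimal sketch using only the standard library: the endpoint and query parameters are taken from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not documented on this page, so the sketch returns the decoded JSON unchanged. Whether the API accepts a `limit` larger than the 20 shown in the curl example is an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL; limit=20 mirrors the curl example."""
    params = {
        "domain": "transformers",
        "subcategory": "multimodal-fusion-transformers",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=10):
    """Fetch and decode the JSON response (hypothetical helper).

    The response schema is not documented here, so the decoded
    object is returned unchanged for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Printing the URL makes no network call; call fetch_projects()
    # explicitly to issue the request.
    print(build_url())
```

Calling `fetch_projects()` issues one request, which counts against the daily quota mentioned above.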

#	Model	Score	Tier	Stars	Language
1	rkansal47/MPGAN The message passing GAN https://arxiv.org/abs/2106.11535 and generative...	41	Emerging	13	Python
2	dorarad/gansformer Generative Adversarial Transformers	40	Emerging	1,346	Python
3	j-min/VL-T5 PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)	39	Emerging	374	Python
4	invictus717/MetaTransformer Meta-Transformer for Unified Multimodal Learning	37	Emerging	1,654	Python
5	devdhananjay14/multim 🔍 Experiment with neural networks for binary classification on multimodal...	35	Emerging	1	Python
6	Yachay-AI/byt5-geotagging Confidence and Byt5 - based geotagging model predicting coordinates from text alone.	35	Emerging	160	Python
7	kyegomez/VortexFusion Transformers + Mambas + LSTMs All in One Model	33	Emerging	14	Python
8	sisinflab/Ducho Ducho is a Python framework aimed to extract multimodal features used in...	33	Emerging	26	Python
9	albrateanu/ModalFormer [2025] ModalFormer: Multimodal Transformer for Low-Light Image Enhancement	32	Emerging	25	Python
10	zinengtang/TVLT PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)	32	Emerging	126	Jupyter Notebook
11	OFA-Sys/OFASys OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models	31	Emerging	151	Python
12	GT-RIPL/robo-vln PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics...	30	Emerging	88	Python
13	Shanghai-Digital-Brain-Laboratory/BDM-DB1 A large-scale multi-modal pre-trained model	30	Emerging	134	Python
14	GiorgiaAuroraAdorni/gansformer-reproducibility-challenge Replication of the novel Generative Adversarial Transformer.	28	Experimental	3	Dockerfile
15	Jathurshan0330/Cross-Modal-Transformer Official repository of cross-modal transformer for interpretable automatic...	28	Experimental	75	Jupyter Notebook
16	DunnBC22/Vision_Audio_and_Multimodal_Projects This repository includes all computer vision, audio, document AI, and...	27	Experimental	51	Jupyter Notebook
17	aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai This open-source project delivers a complete pipeline for converting...	27	Experimental	15	Jupyter Notebook
18	KhoiDOO/vitvqganvae Benchmark for Evaluating Data Reconstruction using Vector Quantization	26	Experimental	2	Python
19	chasemetoyer/visual-internal-reasoning Investigates causal visual reasoning in transformers by integrating discrete...	25	Experimental	4	Python
20	wangxiao5791509/MultiModal_BigModels_Survey [MIR-2023-Survey] A continuously updated paper list for multi-modal...	25	Experimental	291	—
21	AILab-CVC/M2PT [CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data...	25	Experimental	101	Python
22	PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo Physical AI models understand physical common sense and generate appropriate...	25	Experimental	2	Python
23	kyegomez/primus A multimodal foundation model for humanoid robotics that integrates multiple...	24	Experimental	3	—
24	andreaceto/multimodal-crisis-classification Multimodal Classification of Crisis-related social media contents.	24	Experimental	1	Jupyter Notebook
25	sergio-sanz-rodriguez/torchsuite PyTorch Deep Learning Framework for Multimedia	22	Experimental	4	Python
26	IsaacRodgz/multimodal-transformers-movies Experiments with multimodal deep learning models based on transformers	21	Experimental	11	Jupyter Notebook
27	kyegomez/Multi-Model-Training An experimental repository on research for training multiple models all at...	21	Experimental	2	Python
28	mosh98/MMBT Multimodal BiTransformer [Reimplementation] in PyTorch That Actually Works!	17	Experimental	5	Jupyter Notebook
29	5seoyoung/lightweight-multimodal-healthcare-ai [Research] Efficient multimodal transformers for clinical decision support...	16	Experimental	1	Python
30	Tonks684/flow_matching_designs Flow Matching Designs for Conditional Image Generation	15	Experimental	—	Python
31	Kind-Unes/MultiModal-Model This project is a multi-modal model that works with multiple models combined...	14	Experimental	9	Python
32	jianzhnie/MultimodalTookit Incorporate Image, Text and Tabular Data with HuggingFace Transformers	14	Experimental	12	Python
33	Shreya831/multimodal-ai-visual-analyzer Multimodal AI system that detects objects in images and answers questions...	14	Experimental	—	Jupyter Notebook
34	muanderson/Multimodal-transformer-product-matching Repo for multimodal transformer model to product match on the Shopee Product...	11	Experimental	—	Jupyter Notebook
35	Manu-Fraile/Multimodal-Human-Robot-Feedback A novel approach of Transformers and CNNs for Human Feedback classification	11	Experimental	2	Python
36	ToshikiNakamura0412/docker_lightglue Docker image for LightGlue	10	Experimental	1	Python
37	ines312692/VoiceGan_Project A voice conversion project using deep neural networks (CNN + Transformer +...	10	Experimental	3	Jupyter Notebook