Multimodal Fusion Transformers (Transformer Models)
Tools for combining multiple input modalities (text, image, audio, video, tabular data) using transformer architectures to perform unified tasks. Does NOT include single-modality models, recommendation systems, or domain-specific applications like robotics/translation unless multimodal fusion is the primary focus.
There are 37 multimodal fusion transformer models tracked. The highest-rated is rkansal47/MPGAN, at 41/100 with 13 stars.
Get all 37 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=multimodal-fusion-transformers&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
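The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and the response is assumed to be a JSON document whose exact shape is not described here. Note that the sample URL passes `limit=20`, so it may return fewer than the 37 tracked projects; whether the endpoint accepts a larger `limit` is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters are taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL (hypothetical helper; parameters mirror the curl call)."""
    params = {
        "domain": "transformers",
        "subcategory": "multimodal-fusion-transformers",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON response.

    Assumption: the body is valid JSON; its structure is not documented here,
    so the decoded object is returned without further interpretation.
    """
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects(37)` would request all tracked projects in one call if the endpoint honors that limit. Each call counts against the 100 requests/day allowance for keyless access.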
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | rkansal47/MPGAN<br>The message passing GAN https://arxiv.org/abs/2106.11535 and generative... | | Emerging |
| 2 | dorarad/gansformer<br>Generative Adversarial Transformers | | Emerging |
| 3 | j-min/VL-T5<br>PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021) | | Emerging |
| 4 | invictus717/MetaTransformer<br>Meta-Transformer for Unified Multimodal Learning | | Emerging |
| 5 | devdhananjay14/multim<br>🔍 Experiment with neural networks for binary classification on multimodal... | | Emerging |
| 6 | Yachay-AI/byt5-geotagging<br>Confidence and Byt5 - based geotagging model predicting coordinates from text alone. | | Emerging |
| 7 | kyegomez/VortexFusion<br>Transformers + Mambas + LSTMS All in One Model | | Emerging |
| 8 | sisinflab/Ducho<br>Ducho is a Python framework aimed to extract multimodal features used in... | | Emerging |
| 9 | albrateanu/ModalFormer<br>[2025] ModalFormer: Multimodal Transformer for Low-Light Image Enhancement | | Emerging |
| 10 | zinengtang/TVLT<br>PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral) | | Emerging |
| 11 | OFA-Sys/OFASys<br>OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models | | Emerging |
| 12 | GT-RIPL/robo-vln<br>Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics... | | Emerging |
| 13 | Shanghai-Digital-Brain-Laboratory/BDM-DB1<br>A large-scale multi-modal pre-trained model | | Emerging |
| 14 | GiorgiaAuroraAdorni/gansformer-reproducibility-challenge<br>Replication of the novel Generative Adversarial Transformer. | | Experimental |
| 15 | Jathurshan0330/Cross-Modal-Transformer<br>Official repository of cross-modal transformer for interpretable automatic... | | Experimental |
| 16 | DunnBC22/Vision_Audio_and_Multimodal_Projects<br>This repository includes all computer vision, audio, document AI, and... | | Experimental |
| 17 | aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai<br>This open-source project delivers a complete pipeline for converting... | | Experimental |
| 18 | KhoiDOO/vitvqganvae<br>Benchmark for Evaluating Data Reconstruction using Vector Quantization | | Experimental |
| 19 | chasemetoyer/visual-internal-reasoning<br>Investigates causal visual reasoning in transformers by integrating discrete... | | Experimental |
| 20 | wangxiao5791509/MultiModal_BigModels_Survey<br>[MIR-2023-Survey] A continuously updated paper list for multi-modal... | | Experimental |
| 21 | AILab-CVC/M2PT<br>[CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data... | | Experimental |
| 22 | PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo<br>Physical AI models understand physical common sense and generate appropriate... | | Experimental |
| 23 | kyegomez/primus<br>A multimodal foundation model for humanoid robotics that integrates multiple... | | Experimental |
| 24 | andreaceto/multimodal-crisis-classification<br>Multimodal Classification of Crisis-related social media contents. | | Experimental |
| 25 | sergio-sanz-rodriguez/torchsuite<br>PyTorch Deep Learning Framework for Multimedia | | Experimental |
| 26 | IsaacRodgz/multimodal-transformers-movies<br>Experiments with multimodal deep learning models based on transformers | | Experimental |
| 27 | kyegomez/Multi-Model-Training<br>An experimental repository on research for training multiple models all at... | | Experimental |
| 28 | mosh98/MMBT<br>Multi modal BiTransformer [ Reimplementation ] in Pytorch That Acutally Works ! | | Experimental |
| 29 | 5seoyoung/lightweight-multimodal-healthcare-ai<br>[Research] Efficient multimodal transformers for clinical decision support... | | Experimental |
| 30 | Tonks684/flow_matching_designs<br>Flow Matching Designs for Conditional Image Generation | | Experimental |
| 31 | Kind-Unes/MultiModal-Model<br>This project is a multi-modal model that works with multiple models combined... | | Experimental |
| 32 | jianzhnie/MultimodalTookit<br>Incorporate Image, Text and Tabular Data with HuggingFace Transformers | | Experimental |
| 33 | Shreya831/multimodal-ai-visual-analyzer<br>Multimodal AI system that detects objects in images and answers questions... | | Experimental |
| 34 | muanderson/Multimodal-transformer-product-matching<br>Repo for multimodal transformer model to product match on the Shopee Product... | | Experimental |
| 35 | Manu-Fraile/Multimodal-Human-Robot-Feedback<br>A novel approach of Transformers and CNNs for Human Feedback classification | | Experimental |
| 36 | ToshikiNakamura0412/docker_lightglue<br>Docker image for LightGlue | | Experimental |
| 37 | ines312692/VoiceGan_Project<br>A voice conversion project using deep neural networks (CNN + Transformer +... | | Experimental |