Multimodal Fusion Transformers (Transformer Models)

Tools for combining multiple input modalities (text, image, audio, video, tabular data) using transformer architectures to perform unified tasks. Does NOT include single-modality models, recommendation systems, or domain-specific applications like robotics/translation unless multimodal fusion is the primary focus.

There are 37 multimodal fusion transformer models tracked. The highest-rated is rkansal47/MPGAN, at 41/100 with 13 stars.

Get all 37 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=multimodal-fusion-transformers&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
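The same request can be made from a script. The following is a minimal Python sketch, assuming only what the curl example shows: the endpoint URL and the `domain`, `subcategory`, and `limit` query parameters. The helper names `build_url` and `fetch_projects` are hypothetical, the shape of the returned JSON is not documented here, and whether the API accepts a `limit` above 20 is unverified.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL with the query parameters from the curl example."""
    params = {
        "domain": "transformers",
        "subcategory": "multimodal-fusion-transformers",
        "limit": limit,  # 20 mirrors the example; larger values are untested
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Send the GET request and return the decoded JSON body as-is.

    The response schema is an unknown here, so nothing is assumed
    about its keys or nesting.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request, counts against the daily quota):
#   data = fetch_projects()
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to inspect the exact request before spending one of the 100 unauthenticated requests per day.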

# | Model | Description | Score | Tier
--- | --- | --- | --- | ---
1 | rkansal47/MPGAN | The message passing GAN https://arxiv.org/abs/2106.11535 and generative... | 41 | Emerging
2 | dorarad/gansformer | Generative Adversarial Transformers | 40 | Emerging
3 | j-min/VL-T5 | PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021) | 39 | Emerging
4 | invictus717/MetaTransformer | Meta-Transformer for Unified Multimodal Learning | 37 | Emerging
5 | devdhananjay14/multim | 🔍 Experiment with neural networks for binary classification on multimodal... | 35 | Emerging
6 | Yachay-AI/byt5-geotagging | Confidence and Byt5 - based geotagging model predicting coordinates from text alone. | 35 | Emerging
7 | kyegomez/VortexFusion | Transformers + Mambas + LSTMS All in One Model | 33 | Emerging
8 | sisinflab/Ducho | Ducho is a Python framework aimed to extract multimodal features used in... | 33 | Emerging
9 | albrateanu/ModalFormer | [2025] ModalFormer: Multimodal Transformer for Low-Light Image Enhancement | 32 | Emerging
10 | zinengtang/TVLT | PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral) | 32 | Emerging
11 | OFA-Sys/OFASys | OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models | 31 | Emerging
12 | GT-RIPL/robo-vln | Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics... | 30 | Emerging
13 | Shanghai-Digital-Brain-Laboratory/BDM-DB1 | A large-scale multi-modal pre-trained model | 30 | Emerging
14 | GiorgiaAuroraAdorni/gansformer-reproducibility-challenge | Replication of the novel Generative Adversarial Transformer. | 28 | Experimental
15 | Jathurshan0330/Cross-Modal-Transformer | Official repository of cross-modal transformer for interpretable automatic... | 28 | Experimental
16 | DunnBC22/Vision_Audio_and_Multimodal_Projects | This repository includes all computer vision, audio, document AI, and... | 27 | Experimental
17 | aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai | This open-source project delivers a complete pipeline for converting... | 27 | Experimental
18 | KhoiDOO/vitvqganvae | Benchmark for Evaluating Data Reconstruction using Vector Quantization | 26 | Experimental
19 | chasemetoyer/visual-internal-reasoning | Investigates causal visual reasoning in transformers by integrating discrete... | 25 | Experimental
20 | wangxiao5791509/MultiModal_BigModels_Survey | [MIR-2023-Survey] A continuously updated paper list for multi-modal... | 25 | Experimental
21 | AILab-CVC/M2PT | [CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data... | 25 | Experimental
22 | PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo | Physical AI models understand physical common sense and generate appropriate... | 25 | Experimental
23 | kyegomez/primus | A multimodal foundation model for humanoid robotics that integrates multiple... | 24 | Experimental
24 | andreaceto/multimodal-crisis-classification | Multimodal Classification of Crisis-related social media contents. | 24 | Experimental
25 | sergio-sanz-rodriguez/torchsuite | PyTorch Deep Learning Framework for Multimedia | 22 | Experimental
26 | IsaacRodgz/multimodal-transformers-movies | Experiments with multimodal deep learning models based on transformers | 21 | Experimental
27 | kyegomez/Multi-Model-Training | An experimental repository on research for training multiple models all at... | 21 | Experimental
28 | mosh98/MMBT | Multi modal BiTransformer [ Reimplementation ] in Pytorch That Acutally Works ! | 17 | Experimental
29 | 5seoyoung/lightweight-multimodal-healthcare-ai | [Research] Efficient multimodal transformers for clinical decision support... | 16 | Experimental
30 | Tonks684/flow_matching_designs | Flow Matching Designs for Conditional Image Generation | 15 | Experimental
31 | Kind-Unes/MultiModal-Model | This project is a multi-modal model that works with multiple models combined... | 14 | Experimental
32 | jianzhnie/MultimodalTookit | Incorporate Image, Text and Tabular Data with HuggingFace Transformers | 14 | Experimental
33 | Shreya831/multimodal-ai-visual-analyzer | Multimodal AI system that detects objects in images and answers questions... | 14 | Experimental
34 | muanderson/Multimodal-transformer-product-matching | Repo for multimodal transformer model to product match on the Shopee Product... | 11 | Experimental
35 | Manu-Fraile/Multimodal-Human-Robot-Feedback | A novel approach of Transformers and CNNs for Human Feedback classification | 11 | Experimental
36 | ToshikiNakamura0412/docker_lightglue | Docker image for LightGlue | 10 | Experimental
37 | ines312692/VoiceGan_Project | A voice conversion project using deep neural networks (CNN + Transformer +... | 10 | Experimental