invictus717/MetaTransformer
Meta-Transformer for Unified Multimodal Learning
Implements a shared-encoder architecture with modality-agnostic "Data-to-Sequence" tokenization that unifies 12 diverse data types (text, images, point clouds, audio, video, medical/hyperspectral/infrared imagery, graphs, tabular, time-series, IMU) into a single transformer backbone. Supports unpaired multimodal training and downstream task-specific heads for classification, detection, and segmentation, with pretrained weights available on LAION-2B and compatible with Hugging Face and OpenXLab ecosystems.
1,654 stars. No commits in the last 6 months.
Stars: 1,654
Forks: 117
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 05, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/invictus717/MetaTransformer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
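Beyond a one-off curl call, the same endpoint can be reached from a script. A minimal Python sketch, assuming the URL pattern generalizes as `/quality/<category>/<owner>/<repo>` based on the single example above (the response format is not documented here, so the actual fetch is left as a comment):

```python
# Sketch of building the quality-API URL from Python.
# The /quality/<category>/<owner>/<repo> pattern is inferred from the one
# example above; "transformers" as the category comes from that same URL.
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a repo (pattern assumed, not documented)."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "invictus717", "MetaTransformer")
print(url)

# To actually fetch (no key needed for up to 100 requests/day):
# import json, urllib.request
# data = json.load(urllib.request.urlopen(url))
```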
Higher-rated alternatives
dorarad/gansformer
Generative Adversarial Transformers
j-min/VL-T5
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
Yachay-AI/byt5-geotagging
A ByT5-based geotagging model with confidence estimation that predicts coordinates from text alone.
zinengtang/TVLT
PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)
OFA-Sys/OFASys
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models