invictus717/MetaTransformer
Meta-Transformer for Unified Multimodal Learning
Implements a shared-encoder architecture with modality-agnostic "Data-to-Sequence" tokenization that unifies 12 diverse data types (text, images, point clouds, audio, video, medical/hyperspectral/infrared imagery, graphs, tabular, time-series, IMU) into a single transformer backbone. Supports unpaired multimodal training and downstream task-specific heads for classification, detection, and segmentation, with pretrained weights available on LAION-2B and compatible with Hugging Face and OpenXLab ecosystems.
1,654 stars. No commits in the last 6 months.
Stars: 1,654
Forks: 117
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 05, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/invictus717/MetaTransformer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
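Beyond a one-off curl call, the same endpoint can be reached from a script. A minimal Python sketch, assuming the URL pattern generalizes as `/quality/<category>/<owner>/<repo>` based on the single example above (the response format is not documented here, so the actual fetch is left as a comment):

```python
# Sketch of building the quality-API URL from Python.
# The /quality/<category>/<owner>/<repo> pattern is inferred from the one
# example above; "transformers" as the category comes from that same URL.
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a repo (pattern assumed, not documented)."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "invictus717", "MetaTransformer")
print(url)

# To actually fetch (no key needed for up to 100 requests/day):
# import json, urllib.request
# data = json.load(urllib.request.urlopen(url))
```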
Higher-rated alternatives
dorarad/gansformer
Generative Adversarial Transformers
j-min/VL-T5
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
Yachay-AI/byt5-geotagging
A ByT5-based geotagging model with confidence estimation that predicts coordinates from text alone.
zinengtang/TVLT
PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)
OFA-Sys/OFASys
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models