NVlabs/MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Combines State Space Models (SSMs) with self-attention in a hierarchical architecture, redesigning the Mamba mixer with a symmetric branch without SSM to improve global context modeling. Supports arbitrary input resolutions and provides multi-scale hierarchical features across four stages for downstream tasks such as detection and segmentation. Integrates with the Hugging Face and timm ecosystems, and is available as a pip package with pretrained weights for ImageNet-1K and ImageNet-21K.
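The hybrid layout can be sketched schematically. The split below (Mamba-style mixer blocks in the first half of a stage, self-attention blocks in the second half, with attention only in the later stages) reflects the design idea described above; the stage depths and the `stage_block_layout` helper are illustrative assumptions, not the repository's actual configuration.

```python
def stage_block_layout(depth: int, use_attention: bool) -> list:
    """Schematic MambaVision-style stage layout (a sketch, not the repo's config):
    in attention-enabled stages, the first half of the blocks are Mamba-style
    mixers and the second half are self-attention blocks; earlier stages use
    mixer blocks only."""
    if not use_attention:
        return ["mixer"] * depth
    half = depth // 2
    return ["mixer"] * (depth - half) + ["attention"] * half

# Illustrative 4-stage hierarchy; depths here are assumptions for demonstration.
layout = {
    f"stage{i + 1}": stage_block_layout(d, use_attention=(i >= 2))
    for i, d in enumerate([1, 3, 8, 4])
}
```

Each stage downsamples its input, so the four stages yield the multi-scale feature pyramid that detection and segmentation heads consume.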
2,060 stars. Actively maintained with 4 commits in the last 30 days. Available on PyPI.
Stars: 2,060
Forks: 129
Language: Python
License: —
Category: —
Last pushed: Mar 11, 2026
Commits (30d): 4
Dependencies: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NVlabs/MambaVision"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
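The curl command above maps directly onto a small Python helper. The URL pattern comes from the example; the `Authorization: Bearer` header for keyed requests and the JSON response shape are assumptions, so inspect the returned dict before relying on specific fields.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository (pattern from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, api_key: str = None) -> dict:
    """Fetch quality data for a repo. A key raises the limit from 100 to 1,000/day."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        # Header name is an assumption; the page does not document how keys are sent.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        # Response is assumed to be JSON.
        return json.load(resp)


# Anonymous example, within the 100 requests/day limit:
# data = fetch_quality("transformers", "NVlabs", "MambaVision")
```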
Related models
sign-language-translator/sign-language-translator
Python library & framework to build custom translators for the hearing-impaired and translate...
kyegomez/Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
fashn-AI/fashn-human-parser
Human parsing model for fashion and virtual try-on applications
autonomousvision/transfuser
[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving;...
kyegomez/MultiModalMamba
A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance...