3D Vision Transformer Models
Tools for 3D computer vision tasks using transformers, including depth estimation, multi-view geometry, structure-from-motion, point cloud processing, 3D pose estimation, and novel view synthesis. Does NOT include general 2D vision tasks, 2D pose estimation, or 3D shape generation without vision inputs.
83 3D vision transformer models are tracked. 5 score above 50 (the Established tier). The highest-rated is NVlabs/MambaVision at 69/100 with 2,060 stars. Only 1 of the top 10 is actively maintained.
Get all 83 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=3d-vision-transformers&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
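The same request can be made from a script. The following is a minimal Python sketch, not an official client: the function names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not described on this page, so the code only decodes it without assuming any fields. The command above requests `limit=20`; whether a larger `limit` returns all 83 entries in one call is an assumption that would need checking against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="3d-vision-transformers", limit=20):
    """Build the request URL with the same query parameters as the curl command."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Download and decode the JSON body.

    The response layout is not documented here, so the decoded object
    is returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily quota; printing the result with `json.dumps(data, indent=2)` is a quick way to inspect the structure before relying on any field names.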
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | NVlabs/MambaVision: [CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid... | | Established |
| 2 | sign-language-translator/sign-language-translator: Python library & framework to build custom translators for the... | | Established |
| 3 | kyegomez/Jamba: PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" | | Established |
| 4 | fashn-AI/fashn-human-parser: Human parsing model for fashion and virtual try-on applications | | Established |
| 5 | autonomousvision/transfuser: [PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for... | | Established |
| 6 | kyegomez/MultiModalMamba: A novel implementation of fusing ViT with Mamba into a fast, agile, and high... | | Emerging |
| 7 | dali92002/DocEnTR: DocEnTr: An end-to-end document image enhancement transformer - ICPR 2022 | | Emerging |
| 8 | buaacyw/MeshAnything: [ICLR 2025] From anything to mesh like human artists. Official impl. of... | | Emerging |
| 9 | buaacyw/MeshAnythingV2: [ICCV 2025] From anything to mesh like human artists. Official impl. of... | | Emerging |
| 10 | linjieli222/HERO: Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for... | | Emerging |
| 11 | wgcban/HyperTransformer: [CVPR'22] HyperTransformer: A Textural and Spectral Feature Fusion... | | Emerging |
| 12 | PediaMedAI/AggPose: [IJCAI 2022] Official PyTorch implementation of AggPose: Deep Aggregation... | | Emerging |
| 13 | AllenXiangX/SnowflakeNet: (TPAMI 2023) Snowflake Point Deconvolution for Point Cloud Completion and... | | Emerging |
| 14 | padeler/PE-former: 2D Human Pose estimation using transformers. Implementation in Pytorch | | Emerging |
| 15 | AyushExel/trolo: An SDK for Transformers + YOLO and other SSD family models | | Emerging |
| 16 | ChenRocks/UNITER: Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt... | | Emerging |
| 17 | xingyizhou/GTR: Global Tracking Transformers, CVPR 2022 | | Emerging |
| 18 | hasanirtiza/PedesFormer-Transformer-Networks-For-Pedestrian-Detection: Transformer Networks for Pedestrian Detection | | Emerging |
| 19 | icon-lab/SLATER: Official implementation of the paper: Unsupervised MRI Reconstruction via... | | Emerging |
| 20 | jhcho99/CoFormer: [CVPR'22] Official PyTorch Implementation of "Collaborative Transformers for... | | Emerging |
| 21 | desaixie/zeroverse: Official code for NeurIPS 2024 paper LRM-Zero: Training Large Reconstruction... | | Emerging |
| 22 | csiro-robotics/HOTFormerLoc: [IEEE/CVF CVPR 2025] Hierarchical Octree Transformer for Versatile Lidar... | | Emerging |
| 23 | cgtuebingen/ua3dscancomp: Latent Uncertainty-Aware Multi-View SDF Scan Completion | | Emerging |
| 24 | yihongXU/TransCenter: This is the official implementation of TransCenter (TPAMI). The code and... | | Emerging |
| 25 | snktshrma/ngps_flight: Global vision positioning system for UAVs in outdoor GNSS-denied environments | | Emerging |
| 26 | jhcho99/GSRTR: [BMVC'21] Official PyTorch Implementation of "Grounded Situation Recognition... | | Emerging |
| 27 | kyegomez/AudioMamba: Implementation of the paper: "Audio Mamba: Bidirectional State Space Model... | | Emerging |
| 28 | XunshanMan/MVGFormer: This is the official implementation of the work presented at CVPR 2024,... | | Emerging |
| 29 | zubair-irshad/NeRF-MAE: [ECCV 2024] Pytorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders... | | Emerging |
| 30 | xmartlabs/spoter-embeddings: Create embeddings from sign pose videos using Transformers | | Emerging |
| 31 | kyegomez/MambaDecoderBlock: MambaDecoderBlock is a novel decoder architecture that replaces traditional... | | Emerging |
| 32 | VachanVY/Transfusion.torch: PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse... | | Emerging |
| 33 | hukenovs/slovo: Slovo: Russian Sign Language Dataset and Models | | Emerging |
| 34 | eslambakr/LAR-Look-Around-and-Refer: This is the official implementation for our paper;"LAR:Look Around and Refer". | | Experimental |
| 35 | tthinking/MATR: [IEEE TIP 2022] Official implementation of MATR: Multimodal Medical Image... | | Experimental |
| 36 | sauradip/STALE: [ECCV 2022] Official Pytorch Implementation of the paper : " Zero-Shot... | | Experimental |
| 37 | Warren-SJ/SLAM3R: A study of the research paper SLAM3R:Real-Time Dense Scene Reconstruction... | | Experimental |
| 38 | DEV-D-GR8/SignSense: This repository contains a transformer-based model for real-time American... | | Experimental |
| 39 | sam575/axial-gan: Code for "Simultaneous Face Hallucination and Translation for Thermal to... | | Experimental |
| 40 | kyegomez/VLM-Mamba: We introduce VLM-Mamba, the first Vision-Language Model built entirely on... | | Experimental |
| 41 | kyegomez/SimpleMamba: Implementation of a modular, high-performance, and simplistic mamba for... | | Experimental |
| 42 | AndrewBoessen/PerfectRep: PerfectRep is a 3D pose estimation model tailored specifically for... | | Experimental |
| 43 | ShengcaiLiao/TransMatcher: [NeurIPS 2021] TransMatcher: Deep Image Matching Through Transformers for... | | Experimental |
| 44 | Suvroneel/ToyKing: A Python prototype that converts 2D photos or text prompts into 3D models... | | Experimental |
| 45 | bhanuprathap2000/sign-language-recognition: This repo contains the code for sign-language-recognition as part of our... | | Experimental |
| 46 | Merterm/Modeling-Intensification-for-SLG: Public repo for the paper: "Modeling Intensification for Sign Language... | | Experimental |
| 47 | NeurAI-Lab/MT-SfMLearner: Official code for 'Transformers in Unsupervised Structure-from-Motion' and... | | Experimental |
| 48 | GregorKobsik/ImageTransformer: This notebook shows a basic implementation of a transformer (decoder)... | | Experimental |
| 49 | kyegomez/Simba: A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified... | | Experimental |
| 50 | xiuqhou/DAPE: [AAAI2026] Official implementation of the paper "DAPE: Harmonizing... | | Experimental |
| 51 | lamm-mit/FieldCompleter: GAN/convolutional and Transformer models to predict missing mechanical... | | Experimental |
| 52 | loubnabnl/Sign-Segmentation-with-Transformers: Detection of temporal boundaries in sign language videos, as part of the... | | Experimental |
| 53 | anupvna/street-view-geolocation: Multi-view Deep Learning pipeline using PyTorch to predict global... | | Experimental |
| 54 | HowieMa/PPT: [ECCV 2022] "PPT: token-Pruned Pose Transformer for monocular and multi-view... | | Experimental |
| 55 | LookUpMark/dylem-grid: DYLEM-GRID is a deep learning project for dynamic hand gesture recognition... | | Experimental |
| 56 | sauradip/fewshotQAT: [BMVC 2021]: Official PyTorch implementation of : "Few Shot Temporal Action... | | Experimental |
| 57 | arafathosense/Real-Time-Face-Glitch-Effect-Controlled-by-Hand-Gestures: A real-time interactive computer vision art project using OpenCV. Control a... | | Experimental |
| 58 | Abdullah-Shah-26/Sign-Cast: Real-time AI-powered voice-to-sign language translator. Converts speech to... | | Experimental |
| 59 | freddxvill/Proyecto_Traductor_de_la_LSB: Traductor de Lengua de Señas Boliviana (LSB) a texto utilizando redes... | | Experimental |
| 60 | exitudio/GaitMixer: Official repository for "GaitMixer: Skeleton-based Gait Representation... | | Experimental |
| 61 | icon-lab/TranSMS: Official Implementation of Transformers for System Matrix Super-resolution (TranSMS) | | Experimental |
| 62 | albrateanu/KANT: [Sensors 2025] Enhancing Low-Light Images with Kolmogorov–Arnold Networks in... | | Experimental |
| 63 | musialski-lab/LayoutEnhancer: Source code for the Paper: Layout Enahancer | | Experimental |
| 64 | mabdn/feasible-interpretable-trajectory-prediction: A Transformer neural network for autonomous driving to predict the future... | | Experimental |
| 65 | artem-gorodetskii/TransPix2Pix: Rethinking the Pix2Pix architecture with attention mechanisms and transformers. | | Experimental |
| 66 | AshutoshKulkarni4998/AIDTransformer: Inference code for "Aerial Image Dehazing with Attentive Deformable... | | Experimental |
| 67 | rukmini-17/scalable-sequence-modeling: Comparative analysis of Mamba vs. Transformers trained from scratch.... | | Experimental |
| 68 | mustafa1728/Person-Re-ID: Experiments on some existing Re-ID methods on a different dataset with... | | Experimental |
| 69 | Suvroneel/Forma-3D-Vision-Engine: Converts 2D photos into 3D meshes using monocular depth estimation and... | | Experimental |
| 70 | RisabBiswas/T2T-BinFormer: SOTA Document Image Enhancement - T2T-BinFormer: Effective Document Image... | | Experimental |
| 71 | fabiosilva781/top-cvpr-2025-papers: 🌟 Discover top CVPR 2025 papers for insightful research in computer vision,... | | Experimental |
| 72 | Microsatellites-and-Space-Microsystems/pose_estimation_domain_gap: Two methods for solving domain gap in satellite pose estimation in space... | | Experimental |
| 73 | gmongaras/2Mamba2Furious: Code for the paper "2Mamba2Furious: Linear in complexity, competitive in accuracy" | | Experimental |
| 74 | miaodd98/ITrans-Generative-Image-Inpainting-with-Transformers-ChinaMM-2023-Multimedia-Systems: ITrans: Generative Image Inpainting with Transformers, ChinaMM 2023,... | | Experimental |
| 75 | shayanamir0/Just-Image-Transformers: implementation of Just Image Transformer from the paper "Back to Basics: Let... | | Experimental |
| 76 | tthinking/SETFusion: [PR 2026] Official implementation of SETFusion: A Semantic Transformer for... | | Experimental |
| 77 | GregorKobsik/Octree-Transformer: Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically... | | Experimental |
| 78 | zwh0527/AGRNet: Code for "Mining Global Relativity Consistency without Neighborhood Modeling... | | Experimental |
| 79 | junayed-hasan/spontaneous-smile-recognition: A deep learning framework for distinguishing spontaneous from posed smiles... | | Experimental |
| 80 | aliebayani/TransGAN-DX: A Hybrid Transformer-GAN Approach for Cardiovascular Disease Diagnosis | | Experimental |
| 81 | botmahn/slowfast: An unofficial pytorch implementation of "Early Anticipation of Driving... | | Experimental |
| 82 | n1ghtf4l1/decipher-engine: Detect and Translate American Sign Language (ASL) fingerspelling into text. | | Experimental |
| 83 | codedmachine111/Image_generation_using_transformers_in_GANs: Image Generation using Transformers in GANs | | Experimental |