3D Vision Transformer Models

Tools for 3D computer vision tasks using transformers, including depth estimation, multi-view geometry, structure-from-motion, point cloud processing, 3D pose estimation, and novel view synthesis. Does NOT include general 2D vision tasks, 2D pose estimation, or 3D shape generation without vision inputs.

There are 83 3D vision transformer models tracked. 5 score above 50 (Established tier). The highest-rated is NVlabs/MambaVision at 69/100 with 2,060 stars. 1 of the top 10 is actively maintained.

Get all 83 projects as JSON (the sample request below sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=3d-vision-transformers&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
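
The same request can be made from Python. The following is a minimal sketch using only the standard library: the endpoint and query parameters are copied from the curl command above, while the helper names build_url and fetch_projects are hypothetical, and the shape of the returned JSON is not documented here, so the code only decodes it without assuming any fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="3d-vision-transformers", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the body as JSON.

    The response structure is an unknown here, so the decoded
    object is returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network request, so it is left commented out):
# data = fetch_projects(build_url())
# print(type(data))
```

With no key, each call counts against the 100 requests/day allowance, so it is worth caching the decoded result locally rather than re-fetching it.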

#	Model	Score	Tier	Stars	Language
1	NVlabs/MambaVision [CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid...	69	Established	2,060	Python
2	sign-language-translator/sign-language-translator Python library & framework to build custom translators for the...	64	Established	329	Python
3	kyegomez/Jamba PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"	56	Established	208	Python
4	fashn-AI/fashn-human-parser Human parsing model for fashion and virtual try-on applications	55	Established	24	Python
5	autonomousvision/transfuser [PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for...	55	Established	1,516	Python
6	kyegomez/MultiModalMamba A novel implementation of fusing ViT with Mamba into a fast, agile, and high...	49	Emerging	465	Python
7	dali92002/DocEnTR DocEnTr: An end-to-end document image enhancement transformer - ICPR 2022	47	Emerging	186	Jupyter Notebook
8	buaacyw/MeshAnything [ICLR 2025] From anything to mesh like human artists. Official impl. of...	44	Emerging	2,272	Python
9	buaacyw/MeshAnythingV2 [ICCV 2025] From anything to mesh like human artists. Official impl. of...	44	Emerging	970	Python
10	linjieli222/HERO Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for...	44	Emerging	236	Python
11	wgcban/HyperTransformer [CVPR'22] HyperTransformer: A Textural and Spectral Feature Fusion...	41	Emerging	140	Python
12	PediaMedAI/AggPose [IJCAI 2022] Official PyTorch implementation of AggPose: Deep Aggregation...	40	Emerging	30	Python
13	AllenXiangX/SnowflakeNet (TPAMI 2023) Snowflake Point Deconvolution for Point Cloud Completion and...	40	Emerging	200	Python
14	padeler/PE-former 2D Human Pose estimation using transformers. Implementation in Pytorch	39	Emerging	34	Python
15	AyushExel/trolo An SDK for Transformers + YOLO and other SSD family models	39	Emerging	64	Jupyter Notebook
16	ChenRocks/UNITER Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt...	39	Emerging	800	Python
17	xingyizhou/GTR Global Tracking Transformers, CVPR 2022	38	Emerging	379	Python
18	hasanirtiza/PedesFormer-Transformer-Networks-For-Pedestrian-Detection Transformer Networks for Pedestrian Detection	38	Emerging	43	Python
19	icon-lab/SLATER Official implementation of the paper: Unsupervised MRI Reconstruction via...	38	Emerging	41	Python
20	jhcho99/CoFormer [CVPR'22] Official PyTorch Implementation of "Collaborative Transformers for...	37	Emerging	50	Python
21	desaixie/zeroverse Official code for NeurIPS 2024 paper LRM-Zero: Training Large Reconstruction...	37	Emerging	153	Python
22	csiro-robotics/HOTFormerLoc [IEEE/CVF CVPR 2025] Hierarchical Octree Transformer for Versatile Lidar...	36	Emerging	26	Python
23	cgtuebingen/ua3dscancomp Latent Uncertainty-Aware Multi-View SDF Scan Completion	36	Emerging	2	Python
24	yihongXU/TransCenter This is the official implementation of TransCenter (TPAMI). The code and...	36	Emerging	118	—
25	snktshrma/ngps_flight Global vision positioning system for UAVs in outdoor GNSS-denied environments	34	Emerging	11	C++
26	jhcho99/GSRTR [BMVC'21] Official PyTorch Implementation of "Grounded Situation Recognition...	33	Emerging	27	Python
27	kyegomez/AudioMamba Implementation of the paper: "Audio Mamba: Bidirectional State Space Model...	33	Emerging	14	Shell
28	XunshanMan/MVGFormer This is the official implementation of the work presented at CVPR 2024,...	32	Emerging	68	Python
29	zubair-irshad/NeRF-MAE [ECCV 2024] Pytorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders...	32	Emerging	104	Python
30	xmartlabs/spoter-embeddings Create embeddings from sign pose videos using Transformers	32	Emerging	32	Python
31	kyegomez/MambaDecoderBlock MambaDecoderBlock is a novel decoder architecture that replaces traditional...	31	Emerging	5	Python
32	VachanVY/Transfusion.torch PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse...	30	Emerging	28	Python
33	hukenovs/slovo Slovo: Russian Sign Language Dataset and Models	30	Emerging	83	Python
34	eslambakr/LAR-Look-Around-and-Refer This is the official implementation for our paper;"LAR:Look Around and Refer".	29	Experimental	30	C++
35	tthinking/MATR [IEEE TIP 2022] Official implementation of MATR: Multimodal Medical Image...	29	Experimental	99	Python
36	sauradip/STALE [ECCV 2022] Official Pytorch Implementation of the paper : " Zero-Shot...	29	Experimental	113	Python
37	Warren-SJ/SLAM3R A study of the research paper SLAM3R:Real-Time Dense Scene Reconstruction...	28	Experimental	1	Python
38	DEV-D-GR8/SignSense This repository contains a transformer-based model for real-time American...	28	Experimental	12	Jupyter Notebook
39	sam575/axial-gan Code for "Simultaneous Face Hallucination and Translation for Thermal to...	28	Experimental	13	Python
40	kyegomez/VLM-Mamba We introduce VLM-Mamba, the first Vision-Language Model built entirely on...	26	Experimental	14	Python
41	kyegomez/SimpleMamba Implementation of a modular, high-performance, and simplistic mamba for...	26	Experimental	40	Python
42	AndrewBoessen/PerfectRep PerfectRep is a 3D pose estimation model tailored specifically for...	26	Experimental	7	Python
43	ShengcaiLiao/TransMatcher [NeurIPS 2021] TransMatcher: Deep Image Matching Through Transformers for...	25	Experimental	29	Python
44	Suvroneel/ToyKing A Python prototype that converts 2D photos or text prompts into 3D models...	24	Experimental	1	HTML
45	bhanuprathap2000/sign-language-recognition This repo contains the code for sign-language-recognition as part of our...	24	Experimental	3	Jupyter Notebook
46	Merterm/Modeling-Intensification-for-SLG Public repo for the paper: "Modeling Intensification for Sign Language...	24	Experimental	14	Python
47	NeurAI-Lab/MT-SfMLearner Official code for 'Transformers in Unsupervised Structure-from-Motion' and...	24	Experimental	14	Python
48	GregorKobsik/ImageTransformer This notebook shows a basic implementation of a transformer (decoder)...	23	Experimental	6	Jupyter Notebook
49	kyegomez/Simba A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified...	23	Experimental	28	Python
50	xiuqhou/DAPE [AAAI2026] Official implementation of the paper "DAPE: Harmonizing...	23	Experimental	6	Python
51	lamm-mit/FieldCompleter GAN/convolutional and Transformer models to predict missing mechanical...	22	Experimental	20	Python
52	loubnabnl/Sign-Segmentation-with-Transformers Detection of temporal boundaries in sign language videos, as part of the...	22	Experimental	9	Python
53	anupvna/street-view-geolocation Multi-view Deep Learning pipeline using PyTorch to predict global...	22	Experimental	—	Jupyter Notebook
54	HowieMa/PPT [ECCV 2022] "PPT: token-Pruned Pose Transformer for monocular and multi-view...	21	Experimental	63	Python
55	LookUpMark/dylem-grid DYLEM-GRID is a deep learning project for dynamic hand gesture recognition...	20	Experimental	1	Jupyter Notebook
56	sauradip/fewshotQAT [BMVC 2021]: Official PyTorch implementation of : "Few Shot Temporal Action...	19	Experimental	20	Python
57	arafathosense/Real-Time-Face-Glitch-Effect-Controlled-by-Hand-Gestures A real-time interactive computer vision art project using OpenCV. Control a...	19	Experimental	—	Python
58	Abdullah-Shah-26/Sign-Cast Real-time AI-powered voice-to-sign language translator. Converts speech to...	19	Experimental	—	TypeScript
59	freddxvill/Proyecto_Traductor_de_la_LSB Traductor de Lengua de Señas Boliviana (LSB) a texto utilizando redes...	18	Experimental	—	Jupyter Notebook
60	exitudio/GaitMixer Official repository for "GaitMixer: Skeleton-based Gait Representation...	18	Experimental	26	Python
61	icon-lab/TranSMS Official Implementation of Transformers for System Matrix Super-resolution (TranSMS)	17	Experimental	4	Python
62	albrateanu/KANT [Sensors 2025] Enhancing Low-Light Images with Kolmogorov–Arnold Networks in...	16	Experimental	9	Python
63	musialski-lab/LayoutEnhancer Source code for the Paper: Layout Enhancer	16	Experimental	4	Python
64	mabdn/feasible-interpretable-trajectory-prediction A Transformer neural network for autonomous driving to predict the future...	15	Experimental	6	Python
65	artem-gorodetskii/TransPix2Pix Rethinking the Pix2Pix architecture with attention mechanisms and transformers.	15	Experimental	21	Python
66	AshutoshKulkarni4998/AIDTransformer Inference code for "Aerial Image Dehazing with Attentive Deformable...	15	Experimental	21	Python
67	rukmini-17/scalable-sequence-modeling Comparative analysis of Mamba vs. Transformers trained from scratch....	15	Experimental	—	Jupyter Notebook
68	mustafa1728/Person-Re-ID Experiments on some existing Re-ID methods on a different dataset with...	15	Experimental	1	Jupyter Notebook
69	Suvroneel/Forma-3D-Vision-Engine Converts 2D photos into 3D meshes using monocular depth estimation and...	15	Experimental	1	Python
70	RisabBiswas/T2T-BinFormer SOTA Document Image Enhancement - T2T-BinFormer: Effective Document Image...	14	Experimental	24	Python
71	fabiosilva781/top-cvpr-2025-papers 🌟 Discover top CVPR 2025 papers for insightful research in computer vision,...	14	Experimental	—	—
72	Microsatellites-and-Space-Microsystems/pose_estimation_domain_gap Two methods for solving domain gap in satellite pose estimation in space...	14	Experimental	9	Jupyter Notebook
73	gmongaras/2Mamba2Furious Code for the paper "2Mamba2Furious: Linear in complexity, competitive in accuracy"	14	Experimental	3	Jupyter Notebook
74	miaodd98/ITrans-Generative-Image-Inpainting-with-Transformers-ChinaMM-2023-Multimedia-Systems ITrans: Generative Image Inpainting with Transformers, ChinaMM 2023,...	13	Experimental	7	Python
75	shayanamir0/Just-Image-Transformers implementation of Just Image Transformer from the paper "Back to Basics: Let...	13	Experimental	2	Python
76	tthinking/SETFusion [PR 2026] Official implementation of SETFusion: A Semantic Transformer for...	12	Experimental	1	—
77	GregorKobsik/Octree-Transformer Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically...	12	Experimental	18	Python
78	zwh0527/AGRNet Code for "Mining Global Relativity Consistency without Neighborhood Modeling...	12	Experimental	3	Python
79	junayed-hasan/spontaneous-smile-recognition A deep learning framework for distinguishing spontaneous from posed smiles...	12	Experimental	3	Python
80	aliebayani/TransGAN-DX A Hybrid Transformer-GAN Approach for Cardiovascular Disease Diagnosis	12	Experimental	3	Python
81	botmahn/slowfast An unofficial pytorch implementation of "Early Anticipation of Driving...	11	Experimental	—	Python
82	n1ghtf4l1/decipher-engine Detect and Translate American Sign Language (ASL) fingerspelling into text.	10	Experimental	1	Jupyter Notebook
83	codedmachine111/Image_generation_using_transformers_in_GANs Image Generation using Transformers in GANs	10	Experimental	1	Python
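
The tier labels in the table appear to follow score bands. The sketch below infers the cut-offs from the rows themselves (55 is Established, 49 and 30 are Emerging, 29 is Experimental); the exact boundaries are an assumption, since no listed project scores exactly 50, and the function name tier_for_score is hypothetical.

```python
from collections import Counter


def tier_for_score(score):
    """Map a 0-100 score to a tier label.

    Cut-offs are inferred from the table, not from a published rule:
    a score of exactly 50 is ambiguous and is treated as Established here.
    """
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Scores of the top 10 rows, copied from the table.
top_10_scores = [69, 64, 56, 55, 55, 49, 47, 44, 44, 44]
tier_counts = Counter(tier_for_score(s) for s in top_10_scores)
print(tier_counts)
```

Applied to the top 10 rows, this gives five Established and five Emerging projects, which agrees with the summary count of 5 projects scoring above 50.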

Comparisons in this category

MeshAnything and MeshAnythingV2 (44 vs 44)