lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Integrates multiple Vision Transformer variants (NaViT, CaiT, Token-to-Token, CrossFormer, MobileViT, etc.) with self-supervised learning methods such as masked autoencoders and DINO, enabling flexible research across architecture improvements and training paradigms. Supports variable-resolution batch processing through NaViT, 3D video inputs via ViViT, and knowledge distillation from convolutional teachers, all with clean PyTorch APIs for customizing patch size, depth, and attention heads.
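As a rough illustration of how patch size, depth, and attention heads are exposed, the sketch below follows the basic `ViT` constructor usage shown in the project's README. The hyperparameter values are only examples, and `patch_grid` and `build_vit` are hypothetical helper names introduced here, not part of the library. `patch_grid` just works out how many patches a given image and patch size produce.

```python
def patch_grid(image_size: int, patch_size: int, channels: int = 3):
    """Return (num_patches, patch_dim) for a square image cut into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    # Each patch is flattened to channels * patch_size * patch_size values.
    return per_side * per_side, channels * patch_size * patch_size


def build_vit():
    """Build an example classifier; assumes `pip install vit-pytorch` (which needs torch)."""
    # Imported lazily so patch_grid stays usable without torch installed.
    from vit_pytorch import ViT

    return ViT(
        image_size=256,    # input images are 256 x 256
        patch_size=32,     # gives an 8 x 8 grid of patches
        num_classes=1000,
        dim=1024,          # token embedding width
        depth=6,           # number of transformer blocks
        heads=16,          # attention heads per block
        mlp_dim=2048,      # hidden width of the feed-forward layers
    )


if __name__ == "__main__":
    print(patch_grid(256, 32))  # (64, 3072)
```

With the package installed, passing a tensor of shape `(1, 3, 256, 256)` to the model returned by `build_vit()` should yield class logits of shape `(1, 1000)`, per the README's basic example.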
24,988 stars. Used by 2 other packages. Actively maintained with 2 commits in the last 30 days. Available on PyPI.
Stars: 24,988
Forks: 3,479
Language: Python
License: MIT
Category: (not given)
Last pushed: Mar 27, 2026
Commits (30d): 2
Dependencies: 3
Reverse dependents: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/lucidrains/vit-pytorch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
notAI-tech/NudeNet
Lightweight nudity detection
levan92/deep_sort_realtime
A more real-time adaptation of Deep SORT
blakeblackshear/frigate
NVR with realtime local object detection for IP cameras
PaddlePaddle/PaddleDetection
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance...
withoutbg/withoutbg
Image Background Removal Toolkit - Open Source and API Models