lucidrains/vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch.

Score: 75 / 100 (Verified)

Integrates multiple Vision Transformer variants (NaViT, CaiT, Token-to-Token, CrossFormer, MobileViT, etc.) with self-supervised learning methods such as masked autoencoders and DINO, enabling flexible research across architecture improvements and training paradigms. Supports variable-resolution batch processing through NaViT, 3D video inputs via ViViT, and knowledge distillation from convolutional teachers, all with clean PyTorch APIs for customizing patch size, depth, and attention heads.

24,988 stars. Used by 2 other packages. Actively maintained with 2 commits in the last 30 days. Available on PyPI.

Maintenance: 16 / 25
Adoption: 12 / 25
Maturity: 25 / 25
Community: 22 / 25


Stars: 24,988
Forks: 3,479
Language: Python
License: MIT
Last pushed: Mar 27, 2026
Commits (30d): 2
Dependencies: 3
Reverse dependents: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/lucidrains/vit-pytorch"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
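The same request can be made from Python with the standard library. This is a sketch: the endpoint's response schema is not documented here, so the code only assumes a JSON body, and the helper names `quality_url` and `fetch_quality` are hypothetical:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Path layout follows the curl example: /quality/<category>/<owner>/<repo>
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Assumes the endpoint returns JSON; field names are not documented here.
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=10) as resp:
        return json.load(resp)

url = quality_url("computer-vision", "lucidrains", "vit-pytorch")
```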