lucidrains/vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch.

Score: 75 / 100 (Verified)

Integrates multiple Vision Transformer variants (NaViT, CaiT, Token-to-Token, CrossFormer, MobileViT, etc.) with self-supervised learning methods such as masked autoencoders and DINO, enabling flexible research across architecture improvements and training paradigms. Supports variable-resolution batch processing through NaViT, 3D video inputs via ViViT, and knowledge distillation from convolutional teachers, all with clean PyTorch APIs for customizing patch size, depth, and attention heads.

24,988 stars. Used by 2 other packages. Actively maintained with 2 commits in the last 30 days. Available on PyPI.

Maintenance: 16 / 25
Adoption: 12 / 25
Maturity: 25 / 25
Community: 22 / 25


Stars: 24,988
Forks: 3,479
Language: Python
License: MIT
Last pushed: Mar 27, 2026
Commits (30d): 2
Dependencies: 3
Reverse dependents: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/lucidrains/vit-pytorch"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
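The same request can be made from Python with the standard library. This is a sketch: the endpoint's response schema is not documented here, so the code only assumes a JSON body, and the helper names `quality_url` and `fetch_quality` are hypothetical:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Path layout follows the curl example: /quality/<category>/<owner>/<repo>
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Assumes the endpoint returns JSON; field names are not documented here.
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=10) as resp:
        return json.load(resp)

url = quality_url("computer-vision", "lucidrains", "vit-pytorch")
```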