ViTAE-Transformer/ViTPose
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
This tool helps researchers and engineers accurately identify and track human body keypoints in images and videos. You input visual media, and it outputs precise coordinates for body joints, enabling detailed analysis of human movement and posture. It's designed for professionals working in fields like sports science, animation, security, and healthcare.
Use this if you need to precisely locate and analyze human body poses from visual data for research, development, or application building.
Not ideal if your primary need is object detection or facial recognition, as this tool focuses specifically on human pose estimation.
Stars
1,957
Forks
243
Language
Python
License
Apache-2.0
Category
Computer Vision
Last pushed
Dec 25, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/ViTAE-Transformer/ViTPose"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
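The same endpoint can be called from Python. A minimal sketch, assuming only the URL pattern shown in the curl example above; the `quality_url` helper is hypothetical, and the JSON response schema is not documented here, so inspect the actual payload before depending on specific fields.

```python
# Minimal sketch of calling the quality API from Python.
# Only the URL pattern from the curl example above is assumed;
# the response schema is not documented on this page.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the API URL for a repo, mirroring the curl example."""
    return f"{BASE}/{category}/{owner}/{repo}"

url = quality_url("computer-vision", "ViTAE-Transformer", "ViTPose")
print(url)

# Uncomment to perform the request (no key needed, 100 requests/day):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```

Swap in your own category, owner, and repo to query other listings; add an API key once you need more than 100 requests per day.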
Related tools
NVlabs/FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with...
microsoft/CvT
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.
BR-IDL/PaddleViT
:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
gaohuang/MSDNet
Multi-Scale Dense Networks for Resource Efficient Image Classification (ICLR 2018 Oral)
pathak22/unsupervised-video
[CVPR 2017] Unsupervised deep learning using unlabelled videos on the web