fundamentalvision/BEVFormer
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
BEVFormer employs a spatiotemporal transformer architecture: spatial cross-attention aggregates features across multi-camera views, while temporal self-attention recurrently fuses BEV features from previous frames, yielding a unified representation for both 3D object detection and map segmentation. Built on the MMDetection3D framework, it ships multiple configuration variants (tiny to base) that trade memory for performance (6.5 GB to 28.5 GB) and includes BEVFormerV2 with improved backbone integration. BEVFormer reaches 56.9% NDS on nuScenes, roughly 9 points above prior camera-only methods.
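The core idea (BEV queries attending to flattened camera features) can be illustrated with a minimal single-head attention sketch in NumPy. This is a toy illustration of cross-attention in general, not the repository's actual deformable-attention implementation; all names and shapes here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: (Nq, d)  e.g. learnable BEV grid queries
    keys, values: (Nk, d)  e.g. flattened multi-camera image tokens
    Returns: (Nq, d) fused BEV features.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Nq, Nk) similarity
    weights = softmax(scores, axis=-1)      # each query's weights sum to 1
    return weights @ values

# Toy example: 4 BEV queries attend to 6 image tokens from the cameras.
rng = np.random.default_rng(0)
bev_queries = rng.normal(size=(4, 8))
cam_tokens = rng.normal(size=(6, 8))
fused = cross_attention(bev_queries, cam_tokens, cam_tokens)
```

In BEVFormer itself, the same pattern is applied twice per layer: spatially (BEV queries sample camera features at projected reference points) and temporally (BEV queries attend to the aligned BEV features of the previous frame).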
4,356 stars. No commits in the last 6 months.
Stars: 4,356
Forks: 709
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 15, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/fundamentalvision/BEVFormer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
changh95/visual-slam-roadmap
Roadmap to become a Visual-SLAM developer in 2026
coperception/coperception
An SDK for multi-agent collaborative perception.
w111liang222/lidar-slam-detection
LSD (LiDAR SLAM & Detection) is an open-source perception architecture for autonomous vehicles and robots
ika-rwth-aachen/Cam2BEV
TensorFlow Implementation for Computing a Semantically Segmented Bird's Eye View (BEV) Image...
adityamwagh/SuperSLAM
SuperSLAM: Open Source Framework for Deep Learning based Visual SLAM (Work in Progress)