fundamentalvision/BEVFormer
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
BEVFormer employs a spatiotemporal transformer architecture: spatial cross-attention aggregates features across multi-camera views, while temporal self-attention recurrently fuses BEV features from previous frames, yielding a unified representation for both 3D object detection and map segmentation. Built on the MMDetection3D framework, it ships multiple configuration variants (tiny to base) that trade memory for performance (6.5 GB to 28.5 GB) and includes BEVFormerV2 with improved backbone integration. BEVFormer reaches 56.9% NDS on nuScenes, roughly 9 points above prior camera-only methods.
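The core idea (BEV queries attending to flattened camera features) can be illustrated with a minimal single-head attention sketch in NumPy. This is a toy illustration of cross-attention in general, not the repository's actual deformable-attention implementation; all names and shapes here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: (Nq, d)  e.g. learnable BEV grid queries
    keys, values: (Nk, d)  e.g. flattened multi-camera image tokens
    Returns: (Nq, d) fused BEV features.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Nq, Nk) similarity
    weights = softmax(scores, axis=-1)      # each query's weights sum to 1
    return weights @ values

# Toy example: 4 BEV queries attend to 6 image tokens from the cameras.
rng = np.random.default_rng(0)
bev_queries = rng.normal(size=(4, 8))
cam_tokens = rng.normal(size=(6, 8))
fused = cross_attention(bev_queries, cam_tokens, cam_tokens)
```

In BEVFormer itself, the same pattern is applied twice per layer: spatially (BEV queries sample camera features at projected reference points) and temporally (BEV queries attend to the aligned BEV features of the previous frame).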
4,356 stars. No commits in the last 6 months.
Stars: 4,356
Forks: 709
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 15, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/fundamentalvision/BEVFormer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
changh95/visual-slam-roadmap
Roadmap to become a Visual-SLAM developer in 2026
coperception/coperception
An SDK for multi-agent collaborative perception.
w111liang222/lidar-slam-detection
LSD (LiDAR SLAM & Detection) is an open-source perception architecture for autonomous vehicles and robots
ika-rwth-aachen/Cam2BEV
TensorFlow Implementation for Computing a Semantically Segmented Bird's Eye View (BEV) Image...
adityamwagh/SuperSLAM
SuperSLAM: Open Source Framework for Deep Learning based Visual SLAM (Work in Progress)