peteanderson80/Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
Renders agents within 90 real indoor environments from densely sampled 360° RGB-D panoramas, with off-screen rendering on either GPU (EGL) or CPU (OSMesa) at up to ~1000 fps. Provides C++ and Python APIs with batched agent support and customizable camera parameters, and includes the Room-to-Room (R2R) navigation dataset for vision-and-language grounding tasks. The simulator is built on Matterport3D's real depth data rather than synthetic imagery, which enables embodied AI research where agents follow natural language instructions through previously unseen buildings.
683 stars. No commits in the last 6 months.
Stars
683
Forks
138
Language
C++
License
—
Category
Last pushed
Jul 12, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/peteanderson80/Matterport3DSimulator"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
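The same request can be made from Python. The sketch below is a minimal illustration, assuming the endpoint follows the category/owner/repository pattern seen in the curl example above; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and since the response format is not stated here, the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; the path layout beyond that
# single example is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository given as 'owner/name'."""
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(name)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body without assuming its schema."""
    with urlopen(build_quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to hit the network.
    print(build_quality_url("computer-vision",
                            "peteanderson80/Matterport3DSimulator"))
```

Keeping URL construction separate from the network call makes it easy to stay within the 100 requests/day keyless limit, for example by caching responses per URL.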
Related tools
daveredrum/ScanRefer
[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
TheShadow29/vognet-pytorch
[CVPR20] Video Object Grounding using Semantic Roles in Language Description...
jianghaojun/Awesome-3D-Vision-and-Language
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D...
clairecyq/whos-waldo
Who's Waldo? Linking People Across Text and Images. ICCV 2021.