Multimodal Visual Grounding Computer Vision Tools
Six multimodal visual grounding tools are tracked. The highest-rated is peteanderson80/Matterport3DSimulator, which scores 44/100 and has 683 stars.
Get all 6 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-visual-grounding&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
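The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented on this page, the decoded JSON is returned as-is rather than parsed into fields.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain, subcategory, limit=20):
    """Assemble the query URL from the three parameters used in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Send a GET request and return the decoded JSON body.

    Assumption: the endpoint returns JSON. Its structure is not documented
    here, so no particular keys are accessed.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example (performs a network request, so it is left commented out):
# data = fetch_projects(build_url("computer-vision", "multimodal-visual-grounding"))
# print(json.dumps(data, indent=2))
```

Building the URL with `urlencode` rather than string concatenation keeps the query correctly escaped if a different domain or subcategory value is substituted. Unauthenticated requests count against the 100 requests/day limit.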
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | peteanderson80/Matterport3DSimulator: AI Research Platform for Reinforcement Learning from Real Panoramic Images. | 44/100 | Emerging |
| 2 | daveredrum/ScanRefer: [ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | | Emerging |
| 3 | cambridgeltl/visual-spatial-reasoning: [TACL'23] VSR: A probing benchmark for spatial understanding of... | | Emerging |
| 4 | clairecyq/whos-waldo: Who's Waldo? Linking People Across Text and Images. ICCV 2021. | | Experimental |
| 5 | TheShadow29/vognet-pytorch: [CVPR20] Video Object Grounding using Semantic Roles in Language Description... | | Experimental |
| 6 | jianghaojun/Awesome-3D-Vision-and-Language: A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D... | | Experimental |