Multimodal Visual Grounding Computer Vision Tools

Six multimodal visual grounding tools are tracked. The highest-rated is peteanderson80/Matterport3DSimulator, which scores 44/100 and has 683 stars.

Get all 6 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-visual-grounding&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
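The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described here, the code returns the parsed JSON as-is without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="computer-vision",
              subcategory="multimodal-visual-grounding",
              limit=20):
    """Build the query URL (hypothetical helper, not part of the API)."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the parsed JSON body.

    The structure of the response is an unknown here, so nothing is
    assumed about its keys; inspect the result before relying on it.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_projects(build_url())
    # Pretty-print whatever came back so the schema can be inspected.
    print(json.dumps(data, indent=2))
```

Keyless access is limited to 100 requests/day, so a script that polls this endpoint should cache the result rather than re-fetch on every run.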

#  Tool                                         Score  Tier
1  peteanderson80/Matterport3DSimulator         44     Emerging
   AI Research Platform for Reinforcement Learning from Real Panoramic Images.
2  daveredrum/ScanRefer                         34     Emerging
   [ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
3  cambridgeltl/visual-spatial-reasoning        31     Emerging
   [TACL'23] VSR: A probing benchmark for spatial understanding of...
4  clairecyq/whos-waldo                         29     Experimental
   Who's Waldo? Linking People Across Text and Images. ICCV 2021.
5  TheShadow29/vognet-pytorch                   28     Experimental
   [CVPR20] Video Object Grounding using Semantic Roles in Language Description...
6  jianghaojun/Awesome-3D-Vision-and-Language   25     Experimental
   A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D...