Multimodal Visual Grounding Computer Vision Tools

Six multimodal visual grounding tools are tracked. The highest-rated is peteanderson80/Matterport3DSimulator, which scores 44/100 and has 683 stars.

Get all 6 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-visual-grounding&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
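
For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_url` and `fetch_projects` are illustrative and not part of any published client. The structure of the JSON response is not described on this page, so the sketch prints whatever comes back rather than assuming particular field names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="computer-vision",
              subcategory="multimodal-visual-grounding",
              limit=20):
    """Assemble the query URL (hypothetical helper, mirrors the curl call)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and decode the JSON body.

    The response schema is an unknown here, so the decoded object is
    returned as-is without picking out specific keys.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Each run makes one request, which counts toward the daily limit.
    data = fetch_projects(build_url())
    print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it easy to check the query string without spending a request against the 100/day keyless quota.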

#	Tool	Score	Tier	Stars	Language
1	peteanderson80/Matterport3DSimulator AI Research Platform for Reinforcement Learning from Real Panoramic Images.	44	Emerging	683	C++
2	daveredrum/ScanRefer [ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language	34	Emerging	295	Python
3	cambridgeltl/visual-spatial-reasoning [TACL'23] VSR: A probing benchmark for spatial understanding of...	31	Emerging	140	Python
4	clairecyq/whos-waldo Who's Waldo? Linking People Across Text and Images. ICCV 2021.	29	Experimental	13	Python
5	TheShadow29/vognet-pytorch [CVPR20] Video Object Grounding using Semantic Roles in Language Description...	28	Experimental	69	Python
6	jianghaojun/Awesome-3D-Vision-and-Language A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D...	25	Experimental	101	—