# Multimodal Vision-Language Computer Vision Tools
This page tracks 20 multimodal vision-language tools. The highest-rated is col14m/cadrille, scoring 43/100 with 110 stars.
Get all 20 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-vision-language&limit=20"
```

The endpoint is open to everyone at 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
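The curl call above can also be scripted. Below is a minimal Python sketch: it builds the same query URL and ranks entries from a decoded response. The JSON field names (`tool`, `score`) are assumptions, since the response schema is not documented here.

```python
# Minimal client sketch for the quality-dataset endpoint above.
# NOTE: the response field names "tool" and "score" are assumed, not documented.
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL with the same parameters as the curl example."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE}?{urlencode(params)}"

def top_tools(entries: list[dict], n: int = 5) -> list[dict]:
    """Return the n highest-scoring entries from a decoded JSON response.
    Assumes each entry carries a numeric 'score' key."""
    return sorted(entries, key=lambda e: e.get("score", 0), reverse=True)[:n]

url = build_url("computer-vision", "multimodal-vision-language")

# To fetch live data (requires network access and the endpoint being up):
#   import json, urllib.request
#   entries = json.load(urllib.request.urlopen(url))
#   for entry in top_tools(entries):
#       print(entry)
```

Keeping URL construction and ranking as separate functions makes the sketch testable without hitting the network.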
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | col14m/cadrille | [ICLR2026] cadrille: Multi-modal CAD Reconstruction with Online... | 43 | Emerging |
| 2 | filaPro/cad-recode | [ICCV2025] CAD-Recode: Reverse Engineering CAD Code from Point Clouds | | Emerging |
| 3 | pengsongyou/openscene | [CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies | | Emerging |
| 4 | cambrian-mllm/cambrian-s | Cambrian-S: Towards Spatial Supersensing in Video | | Emerging |
| 5 | worldbench/3EED | [NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D | | Emerging |
| 6 | Gorilla-Lab-SCUT/PaDT | [ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards... | | Emerging |
| 7 | InternLM/Spatial-SSRL | [CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial... | | Emerging |
| 8 | IDEA-Research/RexSeek | [ICCV2025] Referring any person or objects given a natural language... | | Emerging |
| 9 | Haochen-Wang409/TreeVGR | [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation... | | Emerging |
| 10 | TimeBlindness/time-blindness | [CVPR 2026 🔥] Time Blindness: Why Video-Language Models Can't See What Humans Can? | | Emerging |
| 11 | Davidyao99/uni4d | [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a... | | Emerging |
| 12 | bagh2178/UniGoal | [CVPR 2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation | | Emerging |
| 13 | ajzhai/NeRF2Physics | [CVPR 2024] Physical Property Understanding from Language-Embedded Feature Fields | | Emerging |
| 14 | taco-group/SparkVSR | SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation | | Experimental |
| 15 | Sid2697/HOI-Ref | Code implementation for paper titled "HOI-Ref: Hand-Object Interaction... | | Experimental |
| 16 | Haochen-Wang409/ross3d | [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness | | Experimental |
| 17 | Jiaxuan-Li/EVCap | [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name... | | Experimental |
| 18 | Hon-Wong/Elysium | [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM | | Experimental |
| 19 | sled-group/3D-GRAND | [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs | | Experimental |
| 20 | AIoT-MLSys-Lab/Famba-V | [ECCV 2024 Workshop Best Paper Award] Famba-V: Fast Vision Mamba with... | | Experimental |