Multimodal Vision Language Computer Vision Tools

This list tracks 20 multimodal vision-language tools. The highest-rated is col14m/cadrille, which scores 43/100 and has 110 stars.

Get all 20 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-vision-language&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
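For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint accepts exactly the three query parameters shown in the curl command above (`domain`, `subcategory`, `limit`) and returns a JSON body. The helper names `build_url` and `fetch_projects` are illustrative, and no particular response schema is assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="computer-vision",
              subcategory="multimodal-vision-language",
              limit=20):
    """Assemble the query URL from the three parameters seen in the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the decoded JSON as-is (no schema is assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs one request, which counts against the daily limit. Inspect the returned structure before relying on any field names, since the response layout is not documented here.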

# | Tool | Description | Score | Tier
1 | col14m/cadrille | [ICLR2026] cadrille: Multi-modal CAD Reconstruction with Online... | 43 | Emerging
2 | filaPro/cad-recode | [ICCV2025] CAD-Recode: Reverse Engineering CAD Code from Point Clouds | 41 | Emerging
3 | pengsongyou/openscene | [CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies | 36 | Emerging
4 | cambrian-mllm/cambrian-s | Cambrian-S: Towards Spatial Supersensing in Video | 36 | Emerging
5 | worldbench/3EED | [NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D | 36 | Emerging
6 | Gorilla-Lab-SCUT/PaDT | [ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards... | 36 | Emerging
7 | InternLM/Spatial-SSRL | [CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial... | 36 | Emerging
8 | IDEA-Research/RexSeek | [ICCV2025] Referring any person or objects given a natural language... | 35 | Emerging
9 | Haochen-Wang409/TreeVGR | [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation... | 32 | Emerging
10 | TimeBlindness/time-blindness | [CVPR 2026 🔥] Time Blindness: Why Video-Language Models Can't See What Humans Can? | 32 | Emerging
11 | Davidyao99/uni4d | [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a... | 32 | Emerging
12 | bagh2178/UniGoal | [CVPR 2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation | 31 | Emerging
13 | ajzhai/NeRF2Physics | [CVPR 2024] Physical Property Understanding from Language-Embedded Feature Fields | 31 | Emerging
14 | taco-group/SparkVSR | SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation | 29 | Experimental
15 | Sid2697/HOI-Ref | Code implementation for paper titled "HOI-Ref: Hand-Object Interaction... | 25 | Experimental
16 | Haochen-Wang409/ross3d | [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness | 22 | Experimental
17 | Jiaxuan-Li/EVCap | [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name... | 21 | Experimental
18 | Hon-Wong/Elysium | [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM | 17 | Experimental
19 | sled-group/3D-GRAND | [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs | 14 | Experimental
20 | AIoT-MLSys-Lab/Famba-V | [ECCV 2024 Workshop Best Paper Award] Famba-V: Fast Vision Mamba with... | 11 | Experimental