# Multimodal Vision-Language Computer Vision Tools
This page tracks 20 multimodal vision-language tools. The highest-rated is col14m/cadrille, scoring 43/100 with 110 stars.
Get all 20 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-vision-language&limit=20"
```

The endpoint is open to everyone at 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
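The curl call above can also be scripted. Below is a minimal Python sketch: it builds the same query URL and ranks entries from a decoded response. The JSON field names (`tool`, `score`) are assumptions, since the response schema is not documented here.

```python
# Minimal client sketch for the quality-dataset endpoint above.
# NOTE: the response field names "tool" and "score" are assumed, not documented.
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL with the same parameters as the curl example."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE}?{urlencode(params)}"

def top_tools(entries: list[dict], n: int = 5) -> list[dict]:
    """Return the n highest-scoring entries from a decoded JSON response.
    Assumes each entry carries a numeric 'score' key."""
    return sorted(entries, key=lambda e: e.get("score", 0), reverse=True)[:n]

url = build_url("computer-vision", "multimodal-vision-language")

# To fetch live data (requires network access and the endpoint being up):
#   import json, urllib.request
#   entries = json.load(urllib.request.urlopen(url))
#   for entry in top_tools(entries):
#       print(entry)
```

Keeping URL construction and ranking as separate functions makes the sketch testable without hitting the network.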
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | col14m/cadrille | [ICLR2026] cadrille: Multi-modal CAD Reconstruction with Online... | 43 | Emerging |
| 2 | filaPro/cad-recode | [ICCV2025] CAD-Recode: Reverse Engineering CAD Code from Point Clouds | | Emerging |
| 3 | pengsongyou/openscene | [CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies | | Emerging |
| 4 | cambrian-mllm/cambrian-s | Cambrian-S: Towards Spatial Supersensing in Video | | Emerging |
| 5 | worldbench/3EED | [NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D | | Emerging |
| 6 | Gorilla-Lab-SCUT/PaDT | [ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards... | | Emerging |
| 7 | InternLM/Spatial-SSRL | [CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial... | | Emerging |
| 8 | IDEA-Research/RexSeek | [ICCV2025] Referring any person or objects given a natural language... | | Emerging |
| 9 | Haochen-Wang409/TreeVGR | [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation... | | Emerging |
| 10 | TimeBlindness/time-blindness | [CVPR 2026 🔥] Time Blindness: Why Video-Language Models Can't See What Humans Can? | | Emerging |
| 11 | Davidyao99/uni4d | [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a... | | Emerging |
| 12 | bagh2178/UniGoal | [CVPR 2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation | | Emerging |
| 13 | ajzhai/NeRF2Physics | [CVPR 2024] Physical Property Understanding from Language-Embedded Feature Fields | | Emerging |
| 14 | taco-group/SparkVSR | SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation | | Experimental |
| 15 | Sid2697/HOI-Ref | Code implementation for paper titled "HOI-Ref: Hand-Object Interaction... | | Experimental |
| 16 | Haochen-Wang409/ross3d | [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness | | Experimental |
| 17 | Jiaxuan-Li/EVCap | [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name... | | Experimental |
| 18 | Hon-Wong/Elysium | [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM | | Experimental |
| 19 | sled-group/3D-GRAND | [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs | | Experimental |
| 20 | AIoT-MLSys-Lab/Famba-V | [ECCV 2024 Workshop Best Paper Award] Famba-V: Fast Vision Mamba with... | | Experimental |