Uncategorized Multimodal AI Tools
There are 39 uncategorized tools tracked. One scores above 70 (Verified tier). The highest-rated is starVLA/starVLA at 71/100 with 1,702 stars. Three of the top 10 are actively maintained.
Get all 39 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=multimodal&subcategory=uncategorized&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
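For scripted access, the same request can be made from Python. This is a minimal sketch using only the standard library; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the only query parameters used are the three that appear in the curl command. The shape of the returned JSON is not documented here, so the sketch returns the decoded payload untouched. The curl command passes `limit=20`; whether a larger value returns all 39 projects in one call is not stated on this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int) -> str:
    """Build the query URL from the three parameters shown in the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(domain: str, subcategory: str, limit: int = 20, timeout: float = 30.0):
    """Fetch and decode the JSON payload. No response schema is assumed."""
    url = build_quality_url(domain, subcategory, limit)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("multimodal", "uncategorized")` requests the same URL as the curl command. Each call counts against the 100 requests/day keyless allowance, so it is worth caching the result locally when iterating.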
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | starVLA/starVLA - StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing | 71 | Verified |
| 2 | vortex-data/vortex - An extensible, state-of-the-art framework for columnar compression, and the... | | Established |
| 3 | motis-project/motis - multimodal routing, geocoding, and map tiles | | Established |
| 4 | zai-org/GLM-V - GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with... | | Established |
| 5 | neka-nat/cad3dify - 2D to 3D CAD Conversion Using VLM | | Established |
| 6 | batmanlab/Mammo-CLIP - [MICCAI 2024, top 11%] Official Pytorch implementation of Mammo-CLIP: A... | | Established |
| 7 | opendatalab/mineru-vl-utils - A Python package for interacting with the MinerU Vision-Language Model. | | Established |
| 8 | EMob-Lab/MnMS - Agent-based Multimodal Urban Mobility Simulator resulting from the ERC MAGnUM project | | Established |
| 9 | GerrySant/multimodalhugs - MultimodalHugs is an extension of Hugging Face that offers a generalized... | | Established |
| 10 | withceleste/celeste-python - Open source, type-safe primitives for multi-modal AI. All modalities, all... | | Established |
| 11 | cloudglue/cloudglue-js - Official JavaScript / TypeScript SDK for Cloudglue API | | Emerging |
| 12 | EvolvingLMMs-Lab/LongVT - [CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling | | Emerging |
| 13 | om-ai-lab/GroundVLP - GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language... | | Emerging |
| 14 | Jinfeng-Xu/Awesome-Multimodal-Recommender-Systems - [TMM'26] Continuously Updated Awesome Multimodal Recommendation Paper List | | Emerging |
| 15 | anam-org/metaxy - Pluggable sample-level metadata versioning for incremental multimodal pipelines. | | Emerging |
| 16 | eduardosanzb/escribano - AI-powered session intelligence tool - transcribes Cap recordings with Whisper | | Emerging |
| 17 | yunncheng/MMRL - [CVPR 2025 & IJCV2026] Official PyTorch Code for "MMRL: Multi-Modal... | | Emerging |
| 18 | Mellow-Artificial-Intelligence/open-xtract - Extract structured data from documents, images, audio, and video using LLMs. | | Emerging |
| 19 | ComfyUI-Kelin/ComfyUI-LLMs-Toolkit - ComfyUI custom nodes for DeepSeek, Qwen, GPT, and other OpenAI-compatible... | | Emerging |
| 20 | MING-ZCH/CII-Bench - [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images? | | Emerging |
| 21 | mturan33/isaac-g1-ulc - Low Level RL Controller for G1 | | Emerging |
| 22 | Jinfeng-Xu/Multimodal-Recommendation-Library - A Continuously Updated Library for Advanced Models for Multimodal Recommendation | | Emerging |
| 23 | nguyennm1024/OSCaR - 🔥🔥🔥 Object State Description & Change Detection | | Emerging |
| 24 | AZIRARM/nodify - Nodify is a powerful and flexible headless content management system (CMS)... | | Emerging |
| 25 | winstxnhdw/telegroq - A serverless invite-only AI-powered chat bot on Telegram. | | Emerging |
| 26 | ai-akashic/Memorose - Next-generation self-evolving multimodal memory brain. | | Emerging |
| 27 | Henry-Who321/RAdaR - RAdaR is an RL-native adaptive reasoning framework for VLMs that dynamically... | | Experimental |
| 28 | video-db/skills - Server-side video workflows for agents: ingest, understand, search, edit, stream. | | Experimental |
| 29 | samletnorge/machine-core - A flexible agent framework for building AI agents with MCP (Model Context... | | Experimental |
| 30 | Air00100/domain-normalizer - 🌐 Normalize and parse domain names from messy input, cleaning errors and... | | Experimental |
| 31 | microsoft/AsgardBench - Visually grounded planning benchmark for multimodal agents | | Experimental |
| 32 | TLtanium/meta-lingo-electron - Meta-Lingo is a comprehensive desktop application designed for corpus... | | Experimental |
| 33 | mturan33/isaac-g1-vlm - VLM-RL Hierarchical Loco-Manipulation For Long-Horizon Tasks With G1 robot... | | Experimental |
| 34 | yc-cui/LLaRS - Multi-modal remote sensing image restoration and fusion foundation model... | | Experimental |
| 35 | iLearn-Lab/ACMMM24-AD-DRL - The PyTorch implementation of AD-DRL | | Experimental |
| 36 | Krisocer/FigureWeave - Generate editable scientific SVG figures from method text with local SAM3... | | Experimental |
| 37 | wendell0218/Awesome-Motion-Datasets - A curated list of motion-related datasets | | Experimental |
| 38 | Eganchiyu/Yuki-Chan-Bot - 🌸 An asynchronous AI assistant based on DeepSeek-V3: a digital little sister integrating a "lifelike" energy system, dual-pool RAG long-term memory, and multimodal visual perception | | Experimental |
| 39 | step-out/Multimodal-Model-Zoo - A curated collection of 100+ multimodal large language models | | Experimental |